The average event rate in the telemetry is expected to be 30 Hz, split between signal photons and background cosmic rays. Our current reconstruction algorithm consumes about 0.1 s per event on a 400 MHz Solaris processor. Assuming 4 GHz processors by launch time (a conservative estimate), this drops to 0.01 s/event, allowing a single processor to keep up with the incoming data on a daily basis. If we wished to turn a full day's downlink around within 4 hours, we would require perhaps 3-5 processors. The gist of the message is that disk and CPU time are not drivers for GLAST's Level 1 analysis.
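For the record, the arithmetic behind that estimate can be spelled out explicitly. The short script below is illustrative only; it simply restates the numbers quoted above and shows that the raw requirement works out to a bit under two processors, so 3-5 leaves a healthy margin.

    #!/usr/bin/perl -w
    use strict;

    # Back-of-the-envelope sizing, restating the numbers quoted above.
    my $event_rate  = 30;        # Hz, signal photons plus background
    my $sec_per_evt = 0.01;      # s/event, assuming ~4 GHz processors
    my $turnaround  = 4 * 3600;  # s: turn a full day's downlink around in 4 hours

    my $events_per_day = $event_rate * 86400;            # ~2.6 million events/day
    my $cpu_seconds    = $events_per_day * $sec_per_evt; # ~26,000 CPU-seconds
    my $processors     = $cpu_seconds / $turnaround;     # ~1.8 processors

    printf "events/day = %.2e, CPU-hours = %.1f, processors for 4 h turnaround = %.1f\n",
        $events_per_day, $cpu_seconds / 3600, $processors;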
These disk and CPU needs represent perhaps a percent of SLAC's Computing Center capacity, so even a gross under-estimation of rates and volumes is easily accommodated within the existing facility.
One can inflate these estimates by requiring the capacity to re-process data, and perhaps to generate Monte Carlo simulations, concurrently with prompt processing. An estimate of the maximum computation and storage capacity required is perhaps a few tens of processors and 50 TB of disk over the life of the mission. The SLAC Computing Center is committed to supplying these disk and CPU resources at no explicit expense to GLAST [needs confirming by Richard Mount].
The task at hand will be to have a sensible backup scheme for the data, and a well-designed database that can track the state of the processing (prompt processing, reprocessing, and MC generation alike) and describe the resulting datasets. The database will be the heart of the operation: driven by it, a fully automated server can handle the data processing with a minimum of human intervention.
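As an illustration only (the table and column names here are hypothetical, not the prototype's actual schema), the state-tracking role of the database might look something like this:

    #!/usr/bin/perl -w
    # Illustrative sketch only: hypothetical tables, not the prototype's real schema.
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Oracle:glastdev', 'pipeline', 'secret',
                           { RaiseError => 1, AutoCommit => 0 });

    # One row per dataset, recording what it is and where its processing stands.
    $dbh->do(q{
        CREATE TABLE dataset (
            dataset_id   NUMBER PRIMARY KEY,
            run_type     VARCHAR2(16),   -- 'prompt', 'reprocess' or 'mc'
            level0_file  VARCHAR2(256),
            status       VARCHAR2(16),   -- 'pending', 'running', 'ok', 'failed'
            code_version VARCHAR2(32)
        )
    });

    # Flexible key/value metadata attached to any dataset.
    $dbh->do(q{
        CREATE TABLE dataset_meta (
            dataset_id   NUMBER REFERENCES dataset(dataset_id),
            name         VARCHAR2(64),
            value        VARCHAR2(256)
        )
    });

    $dbh->commit;
    $dbh->disconnect;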
Such a database is being prototyped for GLAST use, based upon experience with a similar data pipeline used for the SLD experiment at SLAC. An entity-relationship diagram is shown here. This database is designed for use in the engineering model tests as well as for flight operations. The tables are divided into three categories:
The server's job is considerably simplified by having all datasets on disk all the time. There is no urgency for backups, and the server does not even need to be responsible for making them [does the database need to know where backups are?].
Questions of programming language and database technology are somewhat interconnected. A mainstream interpreted language like Perl is a good match to this kind of work. The database is assumed to be relational and SQL-based; SLAC has an Oracle site license, so Oracle seems like a natural choice. Perl has a good interface to Oracle, so the combination of Perl and Oracle is a good match to our needs.
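A minimal sketch of how the server might talk to Oracle from Perl, through the standard DBI interface; the connect string, query, and column names are assumptions carried over from the hypothetical tables sketched above:

    #!/usr/bin/perl -w
    # Sketch of the server polling the hypothetical dataset table for work.
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Oracle:glastdev', 'pipeline', 'secret',
                           { RaiseError => 1, AutoCommit => 1 });

    # Find Level 0 chunks that are waiting to be processed.
    my $sth = $dbh->prepare(
        q{SELECT dataset_id, level0_file FROM dataset WHERE status = 'pending'});
    $sth->execute;

    while (my ($id, $file) = $sth->fetchrow_array) {
        print "would dispatch dataset $id ($file)\n";
        $dbh->do(q{UPDATE dataset SET status = 'running' WHERE dataset_id = ?},
                 undef, $id);
    }

    $dbh->disconnect;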
Since datasets are independent, the server can use a load-balancing batch system (SLAC uses the LSF batch system) to dispatch the processing jobs. Assuming the Level 0 data are broken into small chunks (the Unix filesystem already limits files to 2 GB), the server can submit the chunks to separate processors to achieve parallel throughput. Each process can then communicate its results directly to the database, or to the server, which would perform the updates.
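Dispatching the chunks could then be as simple as wrapping the LSF submission command. A sketch, in which the queue name, job script, and chunk list are placeholders rather than real GLAST names:

    #!/usr/bin/perl -w
    # Sketch of parallel dispatch: one LSF job per Level 0 chunk.
    # Queue name, job script and chunk list are placeholders, not real GLAST names.
    use strict;

    my @chunks = ('level0_chunk_001.dat', 'level0_chunk_002.dat');  # from the database

    foreach my $chunk (@chunks) {
        # bsub is the standard LSF submission command; each job runs independently
        # and reports its results to the database (or to the server) when it finishes.
        my $status = system('bsub', '-q', 'glast_recon', 'run_recon.pl', $chunk);
        warn "submission failed for $chunk\n" if $status != 0;
    }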
We will also need web interfaces to the server, both for watching its progress and for interacting with it. These interactions will involve direct communication with the server (restarts, etc.) and with the database (e.g. altering the state of a dataset to set an OK flag to resume processing).
Examples are
The issue is to record the metadata unique to MC: the source generator, its parameters, and the configuration and parameters of the simulation package. These are readily handled by the flexible metadata scheme in the database. The code management system also makes code version identification unambiguous.
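In a key/value scheme, the MC-specific bookkeeping could reduce to a handful of rows per dataset. A sketch, reusing the hypothetical dataset_meta table from above; the metadata names and values here are illustrative, not the actual generator or package names:

    #!/usr/bin/perl -w
    # Sketch: recording MC-specific metadata as key/value rows (hypothetical table).
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Oracle:glastdev', 'pipeline', 'secret',
                           { RaiseError => 1 });

    my $dataset_id = 42;                              # assigned when the MC run is registered
    my %mc_meta = (
        source_generator  => 'gamma_point_source',    # illustrative values only
        generator_params  => 'E=100MeV..300GeV, index=2.1',
        simulation_config => 'full_geometry_v3',
        code_version      => 'recon-v4r2',            # from the code management system
    );

    my $sth = $dbh->prepare(
        q{INSERT INTO dataset_meta (dataset_id, name, value) VALUES (?, ?, ?)});
    while (my ($name, $value) = each %mc_meta) {
        $sth->execute($dataset_id, $name, $value);
    }

    $dbh->disconnect;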
As well as performing automated processing to Level 1, the data manager, in combination with the database, will provide the logic that allows users to access data sets with similar properties as a group. The data manager will work in tandem with the code management system to provide extensive version information on processing algorithms used for each stage of processing a given data set.
The current version of the prototype is able to generate MC data and run various versions of the reconstruction code on it, and it will soon have the capability of logging metadata on the simulation- and reconstruction-specific algorithms to the previously described Oracle database. A block diagram providing a more detailed view of the interaction of the Data Manager with various SAS components is here.
R.Dubois, K.Young Last Modified: 07/26/2001 15:10