Database Code Review (Part I) Minutes

We decided that part one of this review would focus on presentations, so that everyone involved would understand the purpose and background of the science tools package currently named Database.

Bob started with an overview of the history of D1, D2, U1, and U2 - without the aid of slides!  The Database is responsible for reconstructed event data from the LAT and GBM.  This data is to be made available to the public.  There are two types of data:

LAT Event Summary Database = D1
D1 actually refers to two databases:  the Photon database and the All Reconstructed Events database.
For DC1 we will only have photons.
Julie asked if this includes albedo photons - Bob explained that they will take whatever the LAT provides.  Richard chimed in that he was unaware of any albedo gamma cuts.

Spacecraft data = D2, which includes pointing, livetime, and mode history.

Each database has a data extractor, which is the interface to the database.  U1 provides web access.  U2 is a separate tool that can be run locally on the user's machine to fine-tune cuts.

The next question was how to maximize search performance.  Pat Nolan had performed a trade study and found that the fastest search used FITS files accessed through CFITSIO; this even beat quad-tree searches.  Pat's report is available on the Database Wiki.  Because we use FITS, we have to write our own search code.
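For reference, here is a minimal sketch of CFITSIO-style row filtering of the sort Pat's study points to.  This is not the actual Database code: the file name, extension name, column names, and cut expression are all illustrative.

/* Sketch only: filter an event table with CFITSIO's extended filename
 * syntax and write the selected rows to a new file. */
#include <stdio.h>
#include "fitsio.h"

int main(void)
{
    fitsfile *in = NULL, *out = NULL;
    int status = 0;                       /* CFITSIO status must start at 0 */

    /* The row filter in brackets is applied as the file is opened;
     * the EVENTS extension and the ENERGY/TIME columns are assumptions. */
    fits_open_file(&in, "events.fits[EVENTS][ENERGY > 100.0 && TIME >= 0.0]",
                   READONLY, &status);

    /* Copy the filtered result to a new output file. */
    fits_create_file(&out, "!selected.fits", &status);
    fits_copy_file(in, out, 1, 1, 1, &status);

    fits_close_file(in, &status);
    fits_close_file(out, &status);
    if (status) fits_report_error(stderr, status);
    return status;
}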

The QueueManager (QM) handles queries for both D1 and D2 and keeps track of their status.  All web access goes through the QM.

They defined a set of messages for passing between the various components:
http://glast.gsfc.nasa.gov/ssc/dev/db_utils/messages.html
[This page is likely out of date as far as denoting what is currently implemented]

Tom proceeded with his first talk.  He noted that in his diagram of the components, grey arrows denote paths that are not yet implemented.  How does it work?
The user enters a query; the QM queues it and, when its turn comes, sends it to the server.  The stager merges all of the data into a FITS file and stores it on the FTP disk.  The QM then informs the web client that the query is finished, and the web client tells the user.
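As a rough illustration of that life cycle (and nothing more - the names and the single-process loop below are a simplification for these minutes, not the real QM):

#include <stdio.h>

enum qstate { QUEUED, RUNNING, DONE };

struct query {
    int         id;        /* id under which the query is logged */
    double      ra, dec;   /* search center (degrees)            */
    double      radius;    /* search radius (degrees)            */
    enum qstate state;
};

/* Stand-ins for the extractor/stager and the web-client notification. */
static void run_extractor(struct query *q)
{
    q->state = RUNNING;    /* extractor runs, stager writes the FITS file */
}

static void notify_client(const struct query *q)
{
    printf("query %d finished; the FITS file is on the FTP disk\n", q->id);
}

int main(void)
{
    struct query queue[] = { { 1, 83.6, 22.01, 15.0, QUEUED } };
    const int nqueries = (int)(sizeof queue / sizeof queue[0]);

    /* One query at a time, in the order received. */
    for (int i = 0; i < nqueries; i++) {
        run_extractor(&queue[i]);
        queue[i].state = DONE;
        notify_client(&queue[i]);   /* QM tells the web client, which tells the user */
    }
    return 0;
}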

Here is the web interface, which is now live:
http://glast.gsfc.nasa.gov/cgi-bin/ssc/U1/D1WebQuery.cgi
Currently only simulated data is available, and only decimal values are accepted; they are waiting for the system administrators to install the module that handles integers.  The Database handles one search at a time.  Queries are logged under a query id, and the search criteria are stored in the FITS header of the output file.
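For illustration, recording the search criteria in the output header might look something like the following; the keyword names here are made up, not the Database's actual convention.

#include <stdio.h>
#include "fitsio.h"

int main(void)
{
    fitsfile *fptr = NULL;
    int status = 0;
    double ra = 83.6, dec = 22.01, radius = 15.0;
    long query_id = 42;                 /* id under which the query was logged */

    fits_open_file(&fptr, "selected.fits", READWRITE, &status);
    fits_movabs_hdu(fptr, 2, NULL, &status);   /* event table assumed to be HDU 2 */

    fits_write_key(fptr, TDOUBLE, "QRY_RA",  &ra,       "search center RA (deg)",  &status);
    fits_write_key(fptr, TDOUBLE, "QRY_DEC", &dec,      "search center Dec (deg)", &status);
    fits_write_key(fptr, TDOUBLE, "QRY_RAD", &radius,   "search radius (deg)",     &status);
    fits_write_key(fptr, TLONG,   "QRY_ID",  &query_id, "query id in the log",     &status);

    fits_close_file(fptr, &status);
    if (status) fits_report_error(stderr, status);
    return status;
}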

Julie asked whether the stager always creates one or two merged files, and whether it creates files when the data size is over 2 GB.  Tom answered that larger data sets have not yet been addressed.

We can try out the web interface - here is a suggested query:
Crab Nebula:  RA 83.6 DEC 22.01
The default search size is 15 degrees.  The start and end times are optional.
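For anyone curious what the cut amounts to, here is the great-circle test spelled out.  This assumes the 15-degree default is a cone radius around the query position; it is only a sketch, not how the extractor (with or without HTM) actually does it.

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define DEG2RAD (M_PI / 180.0)

/* Angular separation in degrees between two (RA, Dec) positions. */
static double ang_sep(double ra1, double dec1, double ra2, double dec2)
{
    double cossep = sin(dec1 * DEG2RAD) * sin(dec2 * DEG2RAD) +
                    cos(dec1 * DEG2RAD) * cos(dec2 * DEG2RAD) *
                    cos((ra1 - ra2) * DEG2RAD);
    if (cossep >  1.0) cossep =  1.0;   /* guard against rounding */
    if (cossep < -1.0) cossep = -1.0;
    return acos(cossep) / DEG2RAD;
}

int main(void)
{
    double ra0 = 83.6, dec0 = 22.01, radius = 15.0;  /* the Crab query above */
    double event_ra = 80.0, event_dec = 20.0;        /* an example event     */

    if (ang_sep(ra0, dec0, event_ra, event_dec) <= radius)
        printf("event is inside the search cone\n");
    return 0;
}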

Matt asked how the output FITS file names are created.  Tom answered that the names are cooked up from some combination of the date and time of the query.
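Something along these lines, presumably (the exact naming scheme was not given, so the pattern below is only a guess):

#include <stdio.h>
#include <time.h>

int main(void)
{
    char name[64];
    time_t now = time(NULL);
    struct tm *utc = gmtime(&now);

    /* e.g. "query_20040804_153947.fits" */
    strftime(name, sizeof name, "query_%Y%m%d_%H%M%S.fits", utc);
    printf("%s\n", name);
    return 0;
}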

Chunhui presented her talk concerning the QueueManager.  We went through the requirements that led to the QM's development.  Chunhui had also prepared code snippets for review, but we decided to hold off on those until the second part of the review, where we will actually focus on the code.

Tom then went through his second talk.

Joanne asked whether it was really necessary for the stager to return the events in time order, since that appears to be the main bottleneck.  Tom replied that Goodi currently expects to receive events in time order, which is the reason the stager provides it.  Bob quickly added that we really need to think about whether we need time-ordered events or not...[cannot recall the reason - but I'm sure it was good]
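For reference, the time ordering itself is just a sort over the merged chunks, along these lines (the event structure here is illustrative, not the stager's):

#include <stdio.h>
#include <stdlib.h>

struct event { double time; double energy; };

static int by_time(const void *a, const void *b)
{
    double ta = ((const struct event *)a)->time;
    double tb = ((const struct event *)b)->time;
    return (ta > tb) - (ta < tb);
}

int main(void)
{
    struct event ev[] = { {30.0, 250.0}, {10.0, 900.0}, {20.0, 120.0} };
    size_t n = sizeof ev / sizeof ev[0];

    /* Merged chunks arrive unordered; sort them by event time. */
    qsort(ev, n, sizeof ev[0], by_time);

    for (size_t i = 0; i < n; i++)
        printf("t=%.1f  E=%.1f\n", ev[i].time, ev[i].energy);
    return 0;
}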

Search times:
D2 takes about 7 seconds to sift through a half year of data.
D1 searching is a bit slower and depends on which part of the sky is being searched.  Using HTM (Hierarchical Triangular Mesh) indexing, searching one year of data takes 2 to 20 minutes: 2 minutes for a region high off the Galactic plane, 20 minutes for the Galactic center.  Tom is currently running time trials without HTM and is finding that the times are rather flat, at about 15 minutes.

Tom stated that it takes about 40 seconds to ingest a day's worth of data.  Richard offered a test set that would be ready by early next week, if they were interested in trying it out.

Joanne also asked about the dropping of duplicate GTI (Good Time Interval) entries.  Tom explained that these are an artifact of breaking up the data; it is then up to the stager to clean up any duplicate entries.  [again, there was more to this explanation that is missing from these minutes]
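A sketch of the kind of clean-up step this implies - merging a start-sorted GTI list so that duplicate or overlapping intervals collapse into one (not the actual stager code):

#include <stdio.h>

struct gti { double start; double stop; };

/* Merge a start-sorted GTI list in place; returns the new length. */
static int merge_gtis(struct gti *g, int n)
{
    int out = 0;
    for (int i = 1; i < n; i++) {
        if (g[i].start <= g[out].stop) {
            /* duplicate or overlapping interval: extend the current one */
            if (g[i].stop > g[out].stop) g[out].stop = g[i].stop;
        } else {
            g[++out] = g[i];
        }
    }
    return n ? out + 1 : 0;
}

int main(void)
{
    /* Two chunks that both carried the 100-200 interval. */
    struct gti g[] = { {0.0, 50.0}, {100.0, 200.0}, {100.0, 200.0}, {300.0, 400.0} };
    int n = merge_gtis(g, 4);

    for (int i = 0; i < n; i++)
        printf("GTI %d: %.1f - %.1f\n", i, g[i].start, g[i].stop);
    return 0;
}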

F. Lastname Last Modified:  2004-08-04 15:39:47 -0700