Database Code Review (Part II)

Attendees:  Joanne Bogart, Dave Davis, Richard Dubois, Yasushi Ikebe, Heather Kelly, Matt Langston, Julie McEnery, Chunhui Pan, James Peachey, Bob Schaefer, Tom Stephens

This is part II of the continuing saga that is the Database code review.

We started by going over a few comments Heather had sent out back in December:

Tom stated that he was unsure whether GlastPolicy sets cflags - but if it does not, it does not seem to be a big deal, as this appears to be the only example of straight C code in the repository. {Actually the OnboardFilter package also contains straight C - as Joanne later pointed out}  As for the package names - Tom mentioned that he has a new name (D1EventDb?) that he will introduce in an upcoming update.

Bob responded that the update schedule could depend on the LAT's plans for re-using this code.  If the LAT wants to install this code and use it, they would be open to using the SLAC CVS repository as the working repository for development.  Richard responded that for now there are no plans to use this code, and while some of our European collaborators may intend to install their own mirrors of the FITS file access, that is not imminent.  {ed. note.  So it would seem that installing interface packages for mpick and htmIndex is not a hot item}

We then proceeded to the code walkthroughs, starting with the main routine in the QueueManager - called D1QM in v1, now called QueueManager.  The code is located at:
http://glast.gsfc.nasa.gov/ssc/dev/QManager/QueueManager_8cxx.html

Chunhui walked us through the main routine, pointing out what was going on:
Line 45:  Creates a list of socket descriptors
Line 46:  Creates the ServerMap - maps the public attributes of the server:  server name, hostname
Line 47: Creates a list of query ids to be removed - when they are finished or failed
Lines 48 & 49:  Create the Message lists
Line 50: 
Lines 52-53:  Define the Priority Queues
Line 54:  Map of queries:  id, status, time
Lines 62-73:  Start up listener socket
Lines 75-122:  Server socket for D1&D2 stager and servers
Lines 132-372:  Read messages
Lines 143-150:  Handles new connection
Lines 151-231:  Queries status from servers
Lines 233-273:  Handles client messages
Lines 299-306:  Handle D1 query - store id and priority number
Lines 308-314:  Same thing for D2 queries
Lines 316-320:  Handles queries for both D1 and D2
Lines 326-330:  Handles invalid query
Lines 337-373:  Handle status messages
Lines 447-466:  Check for server timeout
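
The bookkeeping described above (priority queues plus a map of query id, status and time) could look roughly like the sketch below. This is only an illustration under stated assumptions: every name in it (QueryBook, PendingQuery, QueryStatus and so on) is hypothetical and not taken from QueueManager.cxx, and it assumes that a larger priority number means a more urgent query.

```cpp
#include <ctime>
#include <map>
#include <queue>
#include <vector>

// Hypothetical names throughout; none of these come from QueueManager.cxx.
enum class QueryStatus { Queued, Running, Finished, Failed };

struct QueryRecord {
    QueryStatus status;
    std::time_t submitted;  // time the query was stored
};

struct PendingQuery {
    int id;
    int priority;  // assumption: a larger number means more urgent
};

// Orders the heap so that the highest priority ends up on top.
struct ByPriority {
    bool operator()(const PendingQuery& a, const PendingQuery& b) const {
        return a.priority < b.priority;
    }
};

class QueryBook {
public:
    // Store the id and priority of a new query and record its status and time.
    void submit(int id, int priority, std::time_t now) {
        pending_.push(PendingQuery{id, priority});
        records_[id] = QueryRecord{QueryStatus::Queued, now};
    }

    // Pop the most urgent pending query and mark it as running.
    // Returns -1 when nothing is pending.
    int startNext() {
        if (pending_.empty()) return -1;
        const int id = pending_.top().id;
        pending_.pop();
        records_[id].status = QueryStatus::Running;
        return id;
    }

    // Throws std::out_of_range if the id was never submitted.
    QueryStatus status(int id) const { return records_.at(id).status; }

private:
    std::priority_queue<PendingQuery, std::vector<PendingQuery>, ByPriority> pending_;
    std::map<int, QueryRecord> records_;
};
```

Keeping the queue and the map behind one small class is one way to start breaking up a long main routine; the real code may well organize this differently.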

We stopped due to time and went around for comments.

Heather suggested that the main routine be broken into subroutines - a thought echoed by James, Joanne and Matt.  James pointed out that there were hard-coded constants - Chunhui is in the process of replacing those with symbolic constants.  Joanne mentioned that it is best not to name a data structure after its implementation - such as ServerMap, which describes the type - since one may later decide to change the implementation.  Matt stated that it was great to see exceptions used and suggested that the whole thing be enclosed in a try/catch block.  James wondered about places where an exception is caught but processing continues - is that the intended behavior?  If so, there should be a comment to that effect.
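
As a rough illustration of two of these suggestions (a symbolic constant instead of a literal, and one try/catch block around the whole routine), here is a minimal sketch. The names runQueueManager, guardedRun and kMaxConnections, and the value 16, are made up for the example and do not come from the reviewed code.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>

// Symbolic constant in place of a hard-coded literal (the value is illustrative).
constexpr int kMaxConnections = 16;

// Hypothetical stand-in for the body of the main routine.
void runQueueManager(int connections) {
    if (connections > kMaxConnections) {
        throw std::runtime_error("too many connections");
    }
    // ... the real work would go here ...
}

// Wraps the whole run in one try/catch so that no exception escapes unreported.
// Returns a process exit code: 0 on success, 1 on failure.
int guardedRun(int connections) {
    try {
        runQueueManager(connections);
        return 0;
    } catch (const std::exception& e) {
        std::cerr << "QueueManager stopped: " << e.what() << '\n';
        return 1;
    } catch (...) {
        std::cerr << "QueueManager stopped: unknown exception\n";
        return 1;
    }
}
```

A main() would then simply return guardedRun(...). Where an inner handler deliberately catches an exception and carries on, a one-line comment saying so addresses James's question.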

Chunhui mentioned that she is working on a class whose methods will split the routines of the main program into separate components.

We then went on to look at the D1 Server, starting with D1Server.c. http://glast.gsfc.nasa.gov/ssc/dev/databases/doxygen/   Tom stated that D1 and D2 servers are basically the same - though D1 is more developed.  The code is in C and can run in parallel using MPI.

Line 174:  Set up default directories
Line 197:  parse the config file to read in the ports
Line 200:  Can override the port with a command line parameter
Now in client/server mode
Line 207:  Checks to see if this is the control node
Control node code from 207-483.  Sets up the htmIndex - only done by control node.
216-234:  Read in metadata
246:  Open ports
Should also chop this main into subroutines, and there are plans to use Chunhui's socket class.
Line 334:  Set up ports
Lines 389-403:  Handle console input
Lines 404-426:  Handle new connections.  Multiple connections are possible, which allows for testing
Line 445: Process Messages
Starting from line 484 is the client side
Checks for ingest command, or re-ingest.  
Line 588: Does search
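
The port handling noted at lines 197-200 (read the port from the config file, then let a command line parameter override it) can be sketched as follows. This is a guess at the general shape only: the function name resolvePort is hypothetical, and it assumes the override is given as the first command line argument, which the notes do not state.

```cpp
#include <cstdlib>

// Returns the port to use: the command line value if one is given and valid,
// otherwise the value read from the config file.
// Assumption: the override, if present, is argv[1].
int resolvePort(int configPort, int argc, const char* const argv[]) {
    if (argc > 1) {
        char* end = nullptr;
        const long value = std::strtol(argv[1], &end, 10);
        // Accept only a fully numeric argument within the valid port range.
        const bool parsed = (end != argv[1]) && (*end == '\0');
        if (parsed && value > 0 && value <= 65535) {
            return static_cast<int>(value);
        }
    }
    return configPort;
}
```

Falling back to the config value on a malformed argument is a design choice made for this sketch; reporting an error instead would be equally reasonable.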

We then looked at the ProcessMsg routine in D1ServerFunctions, starting on line 263.  If this is a new query, check to see if the query is "good" or "bad".  If bad, send a message back to the QueueManager.  If good, send a query received message, then a query started message immediately after.  This assumes that nothing bad happens between the two.
Nothing currently happens for Query results received or not received messages.
Default case clears the socket, reads the message, gets the message size and reports that an unknown message was received.
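
The dispatch just described can be sketched as a switch over message types. Everything named here is an assumption made for illustration: the Msg codes, the reply strings and the isGoodQuery check are not taken from D1ServerFunctions, and the "report" in the default case is modeled as just another returned string.

```cpp
#include <string>
#include <vector>

// Hypothetical message codes; the real ones live in the D1 server sources.
enum class Msg { NewQuery, ResultsReceived, ResultsNotReceived, Unknown };

// Hypothetical validity check: here a query is "good" if it is non-empty.
bool isGoodQuery(const std::string& query) { return !query.empty(); }

// Returns the messages that would be sent or reported, in order.
std::vector<std::string> processMsg(Msg type, const std::string& body) {
    std::vector<std::string> replies;
    switch (type) {
    case Msg::NewQuery:
        if (!isGoodQuery(body)) {
            replies.push_back("QUERY_BAD");
            break;
        }
        // Received and started are sent back to back, so this assumes
        // that nothing can fail between the two.
        replies.push_back("QUERY_RECEIVED");
        replies.push_back("QUERY_STARTED");
        break;
    case Msg::ResultsReceived:
    case Msg::ResultsNotReceived:
        // Nothing happens for these messages yet.
        break;
    default:
        replies.push_back("UNKNOWN_MESSAGE");
        break;
    }
    return replies;
}
```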

Finally we peeked at the StartQuery routine which begins on line 735.
Lines 765-829 Builds the cfitsio row filter command - except for area searches.
Line 832 Broadcast to nodes
Line 834 Builds list of all files to be searched
Lines 840-854 Divides number of files to be searched among the processors running.
Lines 867-880:  Sends filenames to each node and then waits
Lines 885-897:  Receives results from clients.  The order in which clients return does not matter.  Counts the number of files processed to be sure all were done.
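
The division of files among processors (lines 840-854) was not shown in detail. One common way to do it, given here only as an assumed sketch and not as what StartQuery actually does, is to give every processor the same base count and hand the remainder out one file at a time. The function name divideFiles is hypothetical, and the MPI calls are left out so the example stands alone.

```cpp
#include <cstddef>
#include <vector>

// Returns how many files each of `workers` processors should search.
// The first (files % workers) processors take one extra file.
std::vector<std::size_t> divideFiles(std::size_t files, std::size_t workers) {
    std::vector<std::size_t> counts;
    if (workers == 0) return counts;  // nothing to assign to
    const std::size_t base = files / workers;
    const std::size_t extra = files % workers;
    for (std::size_t w = 0; w < workers; ++w) {
        counts.push_back(base + (w < extra ? 1 : 0));
    }
    return counts;
}
```

With 12 files and 5 processors this gives counts of 3, 3, 2, 2, 2. Summing the counts reported back by the nodes and comparing against the total is the kind of check the last step describes.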

We were then done walking through code... and it was comment time again.
It was asked whether the D1 and D2 servers could share more code, since they do pretty much the same thing.  Tom answered yes - the only reason they differ at all is the different keywords in the selectRows routine.  Stager and Servers currently share their common functionality in the generalFunctions file.

Tom mentioned his benchmarking results: running on 5 nodes, a search of a year's worth of data took less than a minute.

F. Lastname Last Modified:  2004-08-04 15:40:24 -0700