Constants Handling

Constants/Calibration Handling

Proposal: Use a relational database, in particular MySQL, for persistent storage of this non-event data.
Response summary: Appears to be a reasonably good fit to the requirements.

Characterization of data to be stored (structure)
Characterization of data to be stored (access)
Possible table layout
Features of MySQL
Interface to other software components
Why another persistent format?
Evolution
To do

Data to Be Stored (Structure)

Data types to be handled by this mechanism will include

calibration constants, such as such as pedestals and gains for the calorimeter
channel status (e.g. hot or dead strips)
(perhaps) alignment information

Other kinds of constants might be kept in this database if convenient.

All of these types of data share certain characteristics. The information typically results from some measurement procedure. The procedure could change over time. The measurements themselves may have associated errors or other quality indicators. A particular collection of constants is valid over some time interval only.

Roughly speaking, there are three levels of information. From most specific to most general, these are:

Per-calibrated-object information (e.g., pedestal and rms of measurement for calorimeter adc)
Per-measurement information (e.g., time interval of validity; detector configuration)
Per-procedure-type information (e.g., kind of object being calibrated)

Data to Be Stored (Access)

Our access requirements are relatively simple in at least two respects:

most clients will need only read access.
there will be little if any reason to change data once it is put into the database

Only those generating new calibration data sets will need write access. Ultimately there will be standard automated production programs/procedures for generating these data sets; "regular" users will not need write access. Once calibration data is stored in the database, assuming we don't run into resource shortage problems, there will be no reason to delete it. There will probably be a field somewhere with the meaning "this calibration is the official, blessed one for its period of validity" which might be modified if, for example, someone finds a bug in the procedure generating the calibration data.

Possible Table Layout

Note: This section is primarily for my own use as a working document, though it also does serve to demonstrate that MySQL (actually, any relational database) is adequate for our purposes. The layout described below is not necessarily optimal and not likely to be precisely what gets implemented. Anyone not interested in details of implementation (probably almost everyone) should feel free to skip it.

There will be three primary table types. The first is a small, relatively static table with one row per calibration procedure. For example, there would be a row for a procedure to compute calorimeter pedestals. If, over time, we develop a new procedure for this function, a new row would be added to the table. Such a table might look like this (Don't take entries too literally; they don't yet correspond to real calibration procedures.)

Ix	Function	Method	Det	Scope	DataPer	Cuts	Status	Version
1	Pedestal	Mean	T1	CAL	ADC	TKRHits	superseded	1.0
2	Pedestal	Gaussian	T1	CAL	ADC	TKRHits	production	1.0
3	Elec_Gain	Quadr2	T1	CAL	ADC	?	production	1.0
4	CrystalMap	?	T1	CAL	Crystal	?	production	1.0
5	Status_Hot	Threshold	T1	TKR	Strip	Energy	superseded	1.0
6	Status_Hot	Threshold	T1	TKR	Strip	Energy	production	1.1

The results of a particular calibration would be kept in corresponding rows of two other tables (a set for each distinct Function/Method/Det triple above), one for summary information and another for the actual values. The idea is that most clients would want to search for calibration constants appropriate for some event data, so they need a quick way to look it up by timestamp, and perhaps also by procedure type or a few other as-yet-to-be-determined keys. The summary table would be designed to facilitate this process and would have an index to the appropriate row of the parallel data table. For pedestals computed by fitting a gaussian, the summary table could look something like this:

Ix	ValidStart	ValidEnd	Input	NInput	ProcIx	DataIx	Status
1	1/3/00/10:17	1/3/00/12:15	Raw300	10132	2	1	0x401
2	0/0/00/00:00	0/0/00/00:00	Raw302	0	2	-1	0x000
3	1/3/00/15:01	1/3/00/17:21	Raw302	14705	2	2	0x401

The corresponding data table would have very long rows which would store the output of the procedure and very little else, perhaps starting like this:

Ix	SummIx	Ped1	Chi1	Ped2	Chi2	Ped3	Chi3
1	1	195.0	1.11	203.4	1.13	184.3	1.11
2	3	195.1	1.10	199.1	1.89	184.0	1.12

No doubt there will be a need for other types of tables, for example look-aside tables mapping strings to indices to improve access efficiency and a table to contain fundamental parameters for different detector models.

Features of MySQL

What is it? Quoting from the manual,

"MySQL is a true multi-user, multi-threaded SQL database server. SQL (structured Query Language) is the most popular and standardized database language in the world. MySQL is a client/server implementation that consists of a server daemon mysqld and many different client programs and libraries....

"The main goals of MySQL are speed, robustness and ease of use. MySQL was originally developed because we needed a SQL server that could handle very large databases an order of magnitude faster than what any database vendor could offer to us on inexpensive hardware....

For more information see the website http://www.mysql.com.

Platforms: In order to be useful to us, MySQL needs to run on a variety of platforms, including at a minimum Solaris, Linux and NT. In fact MySQL not only runs on all these platforms and on many others, but these three appear to be the most popular and best-supported.

Cost: It's free!

Documentation: Copious. The 500 page manual is available in a variety of formats and is reasonably well-written.

APIs: In addition to a standard SQL interactive interface MySQL comes with a c-callable library which may be linked into client programs. APIs for C++, Perl and Java, among others, may also be obtained separately.

Support: Appendix A of the manual is a list of some MySQL users. It's long and impressive; I don't think we need to worry about the product disappearing any time soon.

Standards Compatibility: There are some more or less standard features (e.g., sub SELECTs) not implemented in MySQL. The most interesting ones are on their to-do list, but in any case I don't believe there are any in this category that we can't live without. MySQL also includes a number of extensions; we should probably avoid using them if we comfortably can.

More to go for this section?

Interface to Other Software Components

The constants data probably needs to be accessible from multiple environments including Root, Gaudi and IDL. In Gaudi language this means providing one or more converters to handle the translation between database reprentation of a calibration and the in-memory representation needed for, e.g., an analysis program. Most of the work will be in coming up with a suitable design for the in-memory classes. If this is done correctly the same classes should be usable from both Root and Gaudi;any further translation needed for IDL should be relatively straightforward.

Why Another Persistent Format?

We already are committed to using at least Root IO, FITS, and XML for various phases of the ground software, so we should think twice about supporting yet another persistent format. However, it does appear that none of these is as good a fit for constants data as a relational database. Using MySQL for constants provides

a reasonably natural way to structure the data
tools for metadata (data description)
data integrity and access control
efficiency of access and storage

Evolution

The database structure and program interface, such as converters should be designed with some care in order to support constants for multiple detector designs (testbeam prototype, one or more simulation models,...) and to handle changes in procedures, parameters, etc., over time.

More TBW?

To do

Finish first pass at table design for 1 or 2 kinds of constants.
Test client access from Linux, Solaris and NT; server is (for now) installed on Linux.
Get some feeling for resources required: disk space, cpu horsepower,...
Where should the production server live? Do we want mirrors?
Design c++ classes, suitable for use in Gaudi or Root environment, for different sorts of constants data. Write converters.
Investigate interface for IDL.
Test dump and restore as a way to copy constants from the "real" database to a private database on somebody's laptop so that code development and data analysis can be done without network access to the main server.
Is the list of kinds of things to be stored in the first section complete?

More TBW?