
DataSource

A data source for the HippoDraw analysis package is typically a table of numbers with a limited number of columns and probably a large number of rows.

In some circles it is commonly called an NTuple. A computer scientist, however, would call each row an n-tuple. In addition to the data, the NTuple has a title and a label for each column. To uniquely identify an NTuple object in an application, it also has a unique name. The name will be a filename with the full path, if associated with a file, or some unique string if the NTuple is only in memory.

The Data Source Classes

A data source or NTuple can be represented internally in HippoDraw in a number of different ways. Each is implemented in a C++ class derived from an abstract base class. They are described in the following sections.

The DataSource class

The DataSource abstract class provides the interface to the data as well as the title, labels, and name. Several derived classes of DataSource manage the access to the data. They differ in how they store the column data. All derived classes of ProjectorBase use the DataSource interface to create a projection of the data for plotting.

NTuple class

The NTuple class is a derived class of DataSource that manages the data by containing a vector of double precision floating point numbers for each column. This class provides the most efficient access to the data. However, all the data is always held in memory, so a data set with a very large number of columns and rows could consume a lot of memory and cause the computer to do a lot of swapping.

If the contents of a column are changed, the changes will automatically be reflected in any displays using that column. In some cases, re-displaying with every change might be too often, such as in a data acquisition system. One can use the interval counter feature of the NTuple class to set the updating to every n-th change.
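
The idea of updating only on every n-th change can be sketched in a few lines of Python. This is an illustration of the concept, not the HippoDraw implementation: the class name IntervalNotifier, its add_row method, and the on_update callback are all hypothetical names invented for this sketch.

```python
class IntervalNotifier:
    """Hypothetical stand-in for a data source that refreshes every n-th change."""

    def __init__(self, interval, on_update):
        self.interval = interval      # refresh once per this many changes
        self.on_update = on_update    # callback standing in for a display refresh
        self.rows = []
        self._pending = 0             # changes seen since the last refresh

    def add_row(self, row):
        self.rows.append(list(row))
        self._pending += 1
        if self._pending >= self.interval:
            self._pending = 0
            self.on_update(len(self.rows))


refreshes = []  # records the row count at each simulated refresh
source = IntervalNotifier(interval=3, on_update=refreshes.append)
for i in range(7):
    source.add_row([float(i), float(i) ** 2])
```

With an interval of 3 and seven added rows, the callback fires after the third and sixth rows only; the seventh row is stored but waits for the next interval.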

CircularBuffer class

The CircularBuffer class is a derived class of NTuple that works like a circular buffer. That is, one sets a size for the maximum number of rows, then fills the buffer by adding rows. When the maximum size is reached, the first row is replaced, then the second, etc., until the last row is reached. Then the process repeats itself.
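
The replacement order described above can be sketched as follows. The class CircularRows and its members are hypothetical names used only for illustration; they are not the HippoDraw CircularBuffer interface.

```python
class CircularRows:
    """Hypothetical fixed-capacity row store that overwrites the oldest row first."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.rows = []
        self._next = 0  # index of the row to overwrite once the buffer is full

    def add_row(self, row):
        if len(self.rows) < self.max_rows:
            # Still filling: simply append.
            self.rows.append(list(row))
        else:
            # Full: replace the first row, then the second, and so on, wrapping around.
            self.rows[self._next] = list(row)
            self._next = (self._next + 1) % self.max_rows


buf = CircularRows(max_rows=3)
for i in range(5):
    buf.add_row([i])
```

After five rows are added to a buffer of size three, the first two slots hold the fourth and fifth rows and the third slot still holds the third row.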

ListTuple class

The ListTuple is a derived class of DataSource that manages the data by containing references to Python list objects. No copy of the data is made. An empty ListTuple can be created from Python and columns of data can be added.

If the data contained in a Python list changes, the change will be reflected in any displays using that column once HippoDraw has been notified that changes have been made. This is not automatic, since Python list objects do not emit any notification message.
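
The difference between holding a reference and being notified of a change can be seen in a small Python sketch. The class ListBackedSource and its methods add_column and notify_changed are hypothetical names for illustration only and do not reproduce the ListTuple interface.

```python
class ListBackedSource:
    """Hypothetical source that keeps references to Python lists without copying."""

    def __init__(self):
        self.columns = {}
        self.redraws = 0  # counts simulated display refreshes

    def add_column(self, label, values):
        self.columns[label] = values  # store the reference, not a copy

    def notify_changed(self):
        # Stand-in for telling the display that the data has changed.
        self.redraws += 1


energy = [90.74, 91.06]
src = ListBackedSource()
src.add_column("Energy", energy)

energy.append(91.43)                   # the source sees the new value immediately ...
seen_before_notify = src.redraws       # ... but nothing has been redrawn yet
src.notify_changed()                   # explicit notification triggers the refresh
```

Because the list itself has no way to announce the append, the refresh happens only when notify_changed is called.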

NumArrayTuple class

The NumArrayTuple is a derived class of DataSource that manages the data by containing references to Python numarray array objects. No copy of the data is made. With this release, only rank 1 array objects are supported. As with the ListTuple class, HippoDraw needs to be notified if the data changes before the change will be reflected in any displays using it.

RootNTuple class

The RootNTuple class is a derived class of DataSource that manages its data by using ROOT to read data from a file. This class is only available if HippoDraw was configured to build with ROOT support.

Only ROOT files whose TTree objects contain TBranch objects with only one TLeaf are supported. This is a fairly common practice. If more than one TTree is in the file, then a dialog will appear in which one can select the desired TTree.

When the ROOT file is opened, the names of the TBranch objects are used as column labels, but no data is read. When a column is first used, a copy of the data for that column is made.
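
This read-on-first-use behavior can be sketched in Python without ROOT. The class LazyColumns and the loader callable are hypothetical stand-ins: a real reader would fetch the branch from the file, whereas fake_loader here just returns fixed numbers and records that it was called.

```python
class LazyColumns:
    """Hypothetical source that knows its labels up front but reads data on demand."""

    def __init__(self, labels, loader):
        self.labels = list(labels)
        self._loader = loader   # assumed callable: label -> sequence of numbers
        self._cache = {}        # columns that have already been copied into memory

    def column(self, label):
        if label not in self._cache:
            # First use of this column: read it once and keep the copy.
            self._cache[label] = list(self._loader(label))
        return self._cache[label]


reads = []  # records which columns were actually read


def fake_loader(label):
    reads.append(label)
    return [1.0, 2.0, 3.0]


src = LazyColumns(["Energy", "Sigma"], fake_loader)
first = src.column("Energy")
second = src.column("Energy")
```

Opening the source reads nothing, the first request for "Energy" reads it once, and the second request is served from the copy already in memory. "Sigma" is never read because it is never used.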

If the TLeaf is a multiple dimension array, a new NTuple is created, with each column representing an element of the array.

HippoDraw also provides a Python interface to the ROOT files. See Using ROOT files for an example.

FitsNTuple class

The FitsNTuple is a class derived from DataSource that manages its data by reading a FITS file with ASCII or binary tables as well as images. One can also read a FITS file by using the FitsController from the Canvas Window or from Python. One can also use the PyFITS Python extension module in conjunction with numarray.

The DataArray Python class

The DataArray class is not a derived class of DataSource, but it appears as one to Python. It is implemented as the DataSource C++ class. The DataArray class wraps any of the concrete DataSource derived classes and provides a direct interface for using numeric Python arrays for both input and output. In Python, a DataArray behaves like a Python list when used with an integer index, and like a Python dictionary when used with column labels.
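
The dual list-like and dictionary-like indexing can be illustrated with a small pure-Python class. This is a sketch of the indexing idea only: LabeledColumns is a hypothetical name, plain lists stand in for numeric arrays, and the choice that an integer index selects a column by position is an assumption of this sketch rather than a statement about DataArray.

```python
class LabeledColumns:
    """Hypothetical container indexable by position (int) or by column label (str)."""

    def __init__(self, labels, columns):
        self.labels = list(labels)
        self.columns = [list(c) for c in columns]

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.columns[key]                     # list-like access (assumed: by column)
        return self.columns[self.labels.index(key)]      # dictionary-like access by label

    def __setitem__(self, key, values):
        if isinstance(key, int):
            self.columns[key] = list(values)
        elif key in self.labels:
            self.columns[self.labels.index(key)] = list(values)
        else:
            # Unknown label: add a new column, as a dictionary would add a new key.
            self.labels.append(key)
            self.columns.append(list(values))


data = LabeledColumns(["Energy", "Sigma"], [[90.74, 91.06], [29.0, 30.0]])
by_index = data[1]
by_label = data["Sigma"]
data["error"] = [5.9, 3.15]
```

Both data[1] and data["Sigma"] return the same column, and assigning to a label that does not exist yet adds a new column.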

Reading and writing NTuple data sets

For application writers, the NTupleController class provides methods for reading and writing files in the HippoDraw ASCII format. See NTupleController::createNTuple( const std::string & ) and NTupleController::writeNTupleToFile( NTuple *, const std::string & ). On writing, any of the forms of DataSource data sets can be used. On reading, the NTuple class is used.

These functions are used by menu items on the CanvasWindow.

They are also available to the Python extension module, although it is easy enough to handle this format directly in Python. From Python one can also add new columns and replace the data in existing columns. In each case, a copy of the data is made. One can also add to the data set by adding a row.

ASCII file format for NTuple

The HippoDraw application can read and write NTuple data to a plain text file. The format is quite simple. Here are the contents of such a file.

Mark II Z0 scan
Energy	Sigma	binsize	error
 90.73999786  29  0.25999999  5.9000001 
 91.05999756  30  0.23  3.1500001 
 91.43000031  28.39999962  0.25999999  3 
 91.5  28.79999924  0.28999999  5.80000019 
 92.16000366  21.95000076  0.22  7.9000001 
 92.22000122  22.89999962  0.25  3.0999999 
 92.95999908  13.5  0.20999999  4.5999999 
 89.23999786  4.5  0.28  3.5 
 89.98000336  10.80000019  0.27000001  4.5999999 
 90.34999847  24.20000076  0.25999999  3.5999999 

The first line is the title. It can contain any number of spaces and is terminated by the newline character. The second line contains the labels. Although it is not visible here, the labels are separated by the tab character, which allows blanks to be in the labels. The remaining lines are the data, row by row. Any white space can separate the data.
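
Since the format is this simple, it can be handled directly in Python. The following is a minimal sketch based only on the description above; the function names read_ntuple_text and write_ntuple_text are hypothetical and are not part of the HippoDraw Python module.

```python
import io


def read_ntuple_text(stream):
    """Read title, tab-separated labels, and whitespace-separated rows of numbers."""
    title = stream.readline().rstrip("\r\n")
    labels = stream.readline().rstrip("\r\n").split("\t")
    # Remaining non-blank lines are data rows; any white space separates values.
    rows = [[float(x) for x in line.split()] for line in stream if line.strip()]
    return title, labels, rows


def write_ntuple_text(stream, title, labels, rows):
    """Write the same three-part layout: title, labels, then one row per line."""
    stream.write(title + "\n")
    stream.write("\t".join(labels) + "\n")
    for row in rows:
        stream.write(" ".join(repr(float(v)) for v in row) + "\n")


sample = (
    "Mark II Z0 scan\n"
    "Energy\tSigma\tbinsize\terror\n"
    " 90.73999786  29  0.25999999  5.9000001 \n"
    " 91.05999756  30  0.23  3.1500001 \n"
)
title, labels, rows = read_ntuple_text(io.StringIO(sample))

# Round trip: write to an in-memory file and read it back.
out = io.StringIO()
write_ntuple_text(out, title, labels, rows)
title2, labels2, rows2 = read_ntuple_text(io.StringIO(out.getvalue()))
```

Splitting the label line on tabs only, while splitting data lines on any white space, is what lets a label such as "bin size" contain a blank.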


Generated on Wed Sep 7 14:52:01 2005 for SiHippo by  doxygen 1.4.3