There are two critical areas in our operation: simulation/reconstruction and the Data Processing Facility. Without these two, data will fall on the floor (though not be lost). The other areas - Science Tools and communication of data/algorithms to the SSC, and user support and basic analysis tools - are important but can be handled as level of effort. All of our activities are driven by available manpower.
While there is still work to do on the simulation and reconstruction, we believe that an early start has given us a mature system for which we know the directions to take, and we expect to have them implemented within a year from now. There are no real issues in this area. [mention G4 here?]
The Data Processing Facility builds on the experience of previous particle physics experiments at SLAC and is, in fact, a simpler problem because of the relatively small data rates and the availability of inexpensive computing hardware. Again, no technical difficulties are foreseen in this area. It will be manpower driven, and we are aiming to have a good working prototype in time for CDR preparations.
[need to see section on calibration first!]
Issues
The major issues involving the Level 1 data, which will be the input to the higher-level analyses, are:
the form of storage of the data, and how it will perform for the expected types of data requests (temporal and spatial)
the format of the data delivered to the SSC: this is not really an open issue, as NASA will mandate FITS format; the open question is the formatting of the data within the LAT team (see the reading sketch below)
access to these data: is it expected that most of the data will be accessed from a central location, like SLAC, or copied to all or some home institutions?
The disposition of the expected vast volume of MC simulations is still to be determined. MC runs of GlastSim will be used extensively to define the high-level calibration of the LAT. MC runs might also be used for specific scientific studies of flight data. The simulations should be preserved but are not likely to be widely accessed.
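For concreteness, a FITS photon event list would likely be a binary-table extension with one row per photon. The sketch below is purely illustrative: the file name and column names are guesses, not a defined LAT format, and the present-day astropy package is used only as an example of how such a file could be read and queried spatially and temporally.

    from astropy.io import fits
    import numpy as np

    # Hypothetical Level 1 photon file; the file name and column names are
    # placeholders, not a defined LAT format.
    with fits.open("lat_photons_example.fits") as hdul:
        events = hdul[1].data                  # first binary-table extension
        energy = np.array(events["ENERGY"])    # MeV, say
        ra     = np.array(events["RA"])        # degrees
        dec    = np.array(events["DEC"])       # degrees
        time   = np.array(events["TIME"])      # mission elapsed time, seconds

    # A typical spatial/temporal request: photons within 5 degrees of a
    # direction of interest during a given time interval.
    sep = np.hypot((ra - 266.4) * np.cos(np.radians(-28.9)), dec + 28.9)
    sel = (sep < 5.0) & (time > 1.0e7) & (time < 2.0e7)
    print(f"{sel.sum()} of {len(energy)} photons selected")

How well requests like this perform against the full mission dataset is exactly the storage/access question raised above.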
Mitigation
Issues
Currently, two analysis packages are being supported: IDL and Root. IDL is a commercial package in wide use in the astronomical community; Root is a new product out of HEP and is becoming the standard there.
Attribute                                      IDL                       Root
Cost                                           expensive (O($2k)/seat)   free
User community                                 astro                     HEP
Suitable for Level 0, 1 data analysis          not designed for it       designed for it
Suitable for envisioned Level 2 data analysis  designed for it           not designed for it

There are three main problems with supporting two packages:
extra effort in maintaining two systems
division within the collaboration when developing useful tools and analysis macros: these cannot be directly shared between packages
suitability of Root for the Level 2 analysis: will the bulk of this analysis, which in time will be the lion's share of all analysis, have to be done in IDL?
The factors in favour of two packages are:
it seems unlikely that either user group will abandon their favoured package
use of Root is mushrooming and starting to be noticed in the astro community. It is possible that Root will become a standard there by the time of launch, in which case it will have been good for us to have stayed current with it. It is also plausible that Root will acquire the functionality needed for Level 2 analysis in the not too distant future, as more astronomers get on board and import that functionality.
Mitigation
This issue has two aspects. The first is the level of detail with which we will specify the instrument response functions (IRFs): potentially the effective area Aeff, energy resolution, and point-spread function (PSF) could be described as functions of energy, azimuth, inclination, plane of conversion in the TKR or layer of conversion in the CAL, tower of conversion, etc.
We need to find out both the practical limit for determining the IRFs from Monte Carlo and the point of diminishing returns in terms of science analysis. For example, we may find that no practical scientific gain would be realized by having the instrument response functions defined separately for each tower.
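As a purely illustrative sketch (the grids, binning, and function names below are assumptions, not a chosen design), an IRF such as Aeff might be tabulated from Monte Carlo on a grid in energy and inclination and interpolated at analysis time. Each added dimension (azimuth, conversion layer, tower) multiplies both the table size and the MC statistics needed to fill it, which is the trade-off described above.

    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    # Hypothetical Aeff table: axes are log10(E/MeV) and cos(inclination).
    log_e_grid     = np.linspace(1.5, 5.5, 17)   # ~30 MeV to ~300 GeV
    cos_theta_grid = np.linspace(0.2, 1.0, 9)    # ~78 deg off-axis to on-axis
    aeff_table = np.zeros((len(log_e_grid), len(cos_theta_grid)))  # to be filled from MC

    aeff = RegularGridInterpolator((log_e_grid, cos_theta_grid), aeff_table,
                                   bounds_error=False, fill_value=0.0)

    def aeff_cm2(energy_mev, cos_theta):
        """Interpolated effective area for a photon of given energy and inclination."""
        return float(aeff([[np.log10(energy_mev), cos_theta]])[0])

    # Cost of finer IRF granularity: defining the table separately per tower
    # multiplies the number of bins (and the MC events needed per bin) by 16.
    n_bins           = len(log_e_grid) * len(cos_theta_grid)
    n_bins_per_tower = n_bins * 16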
The other aspect of this issue is determining the 'standard' background rejection/PSF enhancement cuts that will be used to select events for high-level analysis. More than one set of cuts will be used, depending on the particular science analysis goal. At a minimum we would probably have three sets - one useful for GRBs (maximizes Aeff, with background rejection and PSF not so important), another for low-latitude point sources (PSF tails minimized, Aeff and background rejection not as important), and a third for general analysis (background rejection important, PSF and Aeff optimized in a reasonable compromise). For special applications, like very high energy-resolution spectroscopy using wide-angle events in the calorimeter, we may even define additional sets of cuts. We will undoubtedly refine the cuts for each set after launch, but a core set needs to be defined in advance from ground-based calibration and MC simulations. Deriving IRFs for a given set of cuts is a lot of work, and the cuts must be selected carefully. [Where does this go in the schedule? It is an issue that could belong both to instrument simulation and science analysis.]
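A minimal illustration of what named 'sets of cuts' could look like in practice follows; the event-record fields and thresholds are invented placeholders, not proposed values, and the real cuts would come from ground calibration and MC studies.

    # Hypothetical event fields and thresholds, for illustration only.
    def grb_cuts(evt):
        # Maximize Aeff: keep nearly everything that reconstructs at all.
        return evt["recon_ok"]

    def point_source_cuts(evt):
        # Minimize PSF tails: demand a well-measured direction.
        return evt["recon_ok"] and evt["psf_core_prob"] > 0.9

    def general_cuts(evt):
        # Compromise: strong background rejection with reasonable Aeff and PSF.
        return (evt["recon_ok"] and evt["bkg_rejection_prob"] > 0.99
                and evt["psf_core_prob"] > 0.5)

    CUT_SETS = {"grb": grb_cuts, "point_source": point_source_cuts,
                "general": general_cuts}

    def select(events, cut_set):
        """Apply one of the standard cut sets; IRFs must be derived per set."""
        return [e for e in events if CUT_SETS[cut_set](e)]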
As described in the Science Tools section, the analysis of
high-energy gamma-ray astronomy data is fundamentally model fitting, owing to
the limited numbers of photons and the limited angular resolution of the
measurements. Model fitting can be
used to detect point sources or analyze source spectra or extended sources,
depending on how the model is defined. The
likelihood function, which defines the likelihood of the data given the model,
may be used to determine confidence ranges for parameters and to distinguish
between different source models. For
EGRET and earlier missions in high-energy gamma-ray astronomy, the likelihood
function was evaluated by binning the photon data (on the sky and in energy) and
comparing the number of gamma rays observed to the number predicted in each bin.
The coarser the binning, the less discriminating the likelihood function
can be, because in evaluating the predicted numbers of photons the instrument
response functions are effectively averaged over the bin.
In principle, the unbinned limit (for which the bins are so small that
they contain at most 1 photon) maximizes the information usage from the data.
In practice, though, unbinned analysis has not been applied extensively
because it is more computationally intensive and less stable numerically.
The issue is to decide whether to use unbinned or binned likelihood functions for the routine analysis, and if binned, then what binning. For the LAT, which will be changing its pointing much of the time, binning (in instrument coordinates) must be done very carefully to avoid loss of information from mixing photons with near-axis arrival directions with those far off axis, which generally have less sensitive instrument response. Preliminary indications are that binned analysis can fairly rapidly approach the sensitivity of unbinned analysis if the binning is judicious (e.g., with bins small enough that the instrument response functions do not vary appreciably within any bin). Ultimately, after the relative performance of the analysis using the two likelihood functions has been established, perhaps both forms of the likelihood function will be implemented, one for speed and the other for maximum sensitivity.
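For reference, the two forms of the Poisson likelihood differ only in whether the model is evaluated per photon or per bin. The toy sketch below assumes a one-dimensional model and invented function names; it is meant only to make the binned/unbinned distinction concrete.

    import numpy as np

    def unbinned_loglike(photon_coords, model_rate, total_predicted):
        """log L = sum_i log M(x_i) - N_pred, with the model rate M evaluated
        at each photon's coordinates (direction, energy, time, ...)."""
        return np.sum(np.log(model_rate(photon_coords))) - total_predicted

    def binned_loglike(counts, predicted_counts):
        """Binned Poisson log L = sum_j [ n_j log m_j - m_j ]
        (the constant log n_j! term is dropped); m_j must be > 0 wherever n_j > 0.
        The model is averaged over each bin, so coarse bins wash out the IRFs."""
        n = np.asarray(counts, dtype=float)
        m = np.asarray(predicted_counts, dtype=float)
        return np.sum(n * np.log(m) - m)

In the limit of bins small enough to contain at most one photon, the binned form reduces to the unbinned one, which is why judicious binning can approach the unbinned sensitivity at lower computational cost.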
Another important issue is how we will quickly decide whether a transient (AGN flare or non-triggered GRB) is captured in the most recent data dumps. This is for transients that are not bright enough or brief enough to be noticed onboard. The algorithms to trigger an alert (or followup analysis) need to be defined. They may be traditional likelihood analysis as described above, or perhaps something faster will be needed, such as a search for clusters of photons in direction and time or a wavelet filtering of the data to reveal the positions of potential point sources. A related question is how detections in the current sky map are matched against the accumulating point source catalog to decide whether a source is newly detected and/or flaring.
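One possible fast screen of the kind mentioned above is a simple photon-cluster test: count photons within a small cone and time window and compare with the expected background via a Poisson tail probability. The sketch below is only a sketch; the cone radius, time window, and background rate are placeholders.

    import numpy as np
    from scipy.stats import poisson

    def cluster_significance(n_observed, background_rate, solid_angle_sr, duration_s):
        """Probability of seeing >= n_observed photons from background alone."""
        expected = background_rate * solid_angle_sr * duration_s
        return poisson.sf(n_observed - 1, expected)

    # Example: 6 photons within a 1-degree-radius cone over 1000 s, against an
    # assumed background of 1e-2 photons / sr / s.
    cone_sr = 2 * np.pi * (1 - np.cos(np.radians(1.0)))
    p_value = cluster_significance(6, 1e-2, cone_sr, 1000.0)

Candidate clusters passing such a screen would then be followed up with the full likelihood analysis and matched against the accumulating point source catalog.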
The issue is which environment to adopt.
At a minimum the environment provides a shell for accessing the
data and running the high-level analysis software.
It should have GUI and command line interfaces.
It should be scriptable and closely coupled with an image display and plotting package. Existing environments under consideration are Root and the core of CIAO (the Chandra Interactive Analysis of Observations environment).
Both are widely used, although with different constituencies.
They are well supported and freely distributable.
A related issue is the communication between the analysis environment and the Analysis Interface layer, which serves data for higher-level analysis. The analysis environment will query the Analysis Interface for data, exposure, calibration, and interstellar emission model information. The form of the server's responses needs to be established, along with the practical limits of the system in terms of retrieval speed by the Analysis Interface layer and the volume of data transferred.
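As an illustration only, the kind of query the analysis environment might issue to the Analysis Interface could look like the following; the class and method names are invented, not a defined interface, and the transport (direct database access, network service, etc.) is left open.

    from dataclasses import dataclass

    @dataclass
    class EventQuery:
        """Hypothetical request passed to the Analysis Interface layer."""
        ra_deg: float
        dec_deg: float
        radius_deg: float
        tstart_met: float         # mission elapsed time, start
        tstop_met: float          # mission elapsed time, stop
        emin_mev: float
        emax_mev: float
        cut_set: str = "general"  # which standard cut set to apply

    def fetch_for_analysis(server, query: EventQuery):
        """Return photons plus matching exposure and calibration information.
        'server' stands in for whatever transport is chosen; the return
        structure is equally notional."""
        return (server.query_events(query),
                server.query_exposure(query),
                server.query_calibration(query))

The practical limits mentioned above (retrieval speed, transfer volume) would be measured against requests of exactly this shape.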
During the GI phase of the mission (years 2 and beyond),
data awarded to GIs will be proprietary to the GIs for 3 months.
During this time the SSC will have to restrict access to these data
(primarily by region of the sky and time range).
The LAT team will have some ongoing processing rights to the entire
dataset (e.g., to search for transients and compile a source catalog).
For other uses, though, the LAT team will have to respect the proprietary
data rights of the GIs. The open
issues regarding this include how the proprietary data protections are
implemented. The SSC is nominally
responsible for scheduling observations, and may be responsible for tracking
data rights. However, this would
imply that database mirroring would be two-way (SAS-SSC and SSC-SAS).
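A notional sketch of how the proprietary protections could be enforced is given below; the Award description, field names, and the simple circular-region test are all placeholder assumptions, with only the 3-month (90-day) period taken from the text.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Award:
        """Placeholder description of a GI award: a circular sky region and a time range."""
        ra_deg: float
        dec_deg: float
        radius_deg: float
        tstart_mjd: float
        tstop_mjd: float

        def contains(self, ra_deg, dec_deg):
            # Crude angular-distance test, adequate for a sketch.
            d_ra = np.radians(ra_deg - self.ra_deg) * np.cos(np.radians(self.dec_deg))
            d_dec = np.radians(dec_deg - self.dec_deg)
            return np.degrees(np.hypot(d_ra, d_dec)) <= self.radius_deg

    PROPRIETARY_DAYS = 90.0   # "3 months", expressed in days

    def is_public(photon, awards, current_mjd):
        """True unless the photon falls in an awarded region/time window whose
        proprietary period (counted here from the end of the awarded interval)
        has not yet expired."""
        for a in awards:
            if (a.contains(photon["ra"], photon["dec"])
                    and a.tstart_mjd <= photon["mjd"] <= a.tstop_mjd
                    and current_mjd < a.tstop_mjd + PROPRIETARY_DAYS):
                return False
        return True

Whether this filtering is applied on the SAS side, the SSC side, or both is part of the open issue about where data rights are tracked.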
Should the Level 0 data be delivered to the SSC by the IOC
(where it first arrives at the LAT team) or by the SAS (where it is turned into
Level 1 and higher-level data)?
Issue
We do not have sufficient manpower at present to maintain adequate user documentation and to answer questions. This is usually an area that is under-funded. Answering user queries (especially when documentation is scanty) can pose a serious drain on the members of the software team.
Mitigation
The SLD experiment pioneered a "User Workbook", a concerted effort to lead a new user progressively through all the tools and techniques needed to work with the system. The BABAR experiment followed up on this idea and created such a workbook as well. It required one dedicated person for about a year to set up the ideas and recruit a couple of assistants (often graduate students and post-docs) to write the documentation. Ongoing effort is needed as the system evolves and new features are added or old ones changed, but experience showed this amounted to perhaps 1/4 FTE after the initial work was complete. The workbook made a huge difference to the SLD software team, significantly lowering their interrupt rate, and made it much easier for new users to come up to speed.
SLD: http://www-sld.slac.stanford.edu/sldwww/workbook/workbook_prod.html
BABAR: http://www.slac.stanford.edu/BFROOT/www/doc/workbook/workbook.html
Issues
Eventually some 25 FTEs will be required, with the bulk of the effort going into the Science Tools. Clearly this will involve a build-up of staffing from our current levels. This build-up is indicated in this figure [needs fixing], which shows SLAC + non-assigned effort. SLAC is supplying about 5 FTEs now.
Mitigation
The Science Tools are the least critical: descoping, i.e., delay, in this area would be our response. If no explicit programmer budget appears, we will have to tap the scientific personnel in the collaboration. A second strategy would be to delay the DPF if, for example, the NASA money at HEPL does not come through.
R.Dubois Last Modified: 07/29/2001 19:21