Draft, 13 July 2001
Design of high-level analysis software for the LAT
The design for the high-level
analysis software, i.e., the software for analysis of gamma-ray data after
reconstruction and background filtering of events, is driven by three principal
considerations:
1. High-level analysis of GLAST
data will be fundamentally statistical; limited numbers of photons and
relatively poor angular resolution mean that quantitative analysis will be via
model fitting.
2. The characterization of the LAT
is complicated, and the difficulty is compounded by a scanning observation mode, a large FOV, and
the need to reject the intense background of cosmic rays as well as albedo
gamma rays from the earth's limb. The
PSF, effective areas, and energy resolution depend on energy and direction of
the gamma ray, location of the conversion in the LAT, and on the background
rejection cuts employed.
3. After events are reconstructed,
data access will be principally by direction on the sky and time range. (For cosmic rays used in monitoring
calibration, access will be principally by direction in instrument coordinates
and time range; see Calibration section.)
The data analysis system must support efficient spatial access.
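To make consideration (3) concrete, the following is a minimal sketch of a selection by direction on the sky and time range. The event fields (`ra`, `dec`, `time`) and the function names are hypothetical, and the linear scan is for illustration only; a production system would presumably use a spatial index to make this access efficient.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle angle between two sky directions, all in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form stays accurate for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def select_events(events, ra0, dec0, radius_deg, t_start, t_stop):
    """Return events inside a cone around (ra0, dec0) with t_start <= time < t_stop."""
    return [e for e in events
            if t_start <= e["time"] < t_stop
            and angular_separation_deg(e["ra"], e["dec"], ra0, dec0) <= radius_deg]

# Hypothetical events: sky position in degrees, time in seconds.
events = [
    {"ra": 83.6, "dec": 22.0, "time": 100.0},
    {"ra": 120.0, "dec": -10.0, "time": 150.0},
    {"ra": 84.0, "dec": 21.5, "time": 900.0},
]
selected = select_events(events, ra0=83.63, dec0=22.01, radius_deg=10.0,
                         t_start=0.0, t_stop=500.0)
```

Only the first event survives both cuts here: the second is outside the cone and the third is outside the time range.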
Interstellar emission model
Consideration (1) implies that a
model of the direction and energy dependence of the interstellar emission of
the Milky Way is needed for the high-level analysis. This emission is from cosmic-ray collisions with interstellar gas
and photons. It is present in every direction on the sky, although it is most intense
toward directions close to the plane of the Galaxy. More than 60% of the celestial gamma rays detected by EGRET were
interstellar emission. The emission can
have structure on fine angular scales, and an accurate model will be useful for
distinguishing low-latitude point sources from unresolved diffuse emission and
for accurately determining positions and fluxes of point sources. (In addition, an accurate model of
interstellar emission at high latitudes will be essential for estimating the
truly diffuse extragalactic emission.)
The development of models of
interstellar emission is fairly well understood after more than two decades of
application in high-energy gamma-ray astronomy. Advances that we will take advantage of for the LAT include
higher-resolution surveys of the interstellar medium than were available for
EGRET analysis, and modern calculations of cosmic-ray production and
propagation that include constraints from cosmic-ray isotope abundance ratios
and other local measurements.
Likelihood analysis
Also regarding (1), model fitting
in high-energy gamma-ray astronomy has long used the likelihood function as the
measure of goodness of fit (e.g., Pollock et al. 1981). Variations of the likelihood function (which
defines the likelihood of the data given the model) with respect to the various
parameters of the model can be used to quantitatively determine confidence
ranges.
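As an illustration of how variations of the likelihood yield a confidence range, the sketch below scans one parameter (a source scale factor) of a binned Poisson model and reports the range over which ln L lies within 0.5 of its maximum, the usual approximate 68% range for a single parameter. The counts, templates, and function names are hypothetical toy values, not LAT or EGRET data.

```python
import math

def poisson_log_likelihood(observed, predicted):
    """ln L for independent Poisson bins, omitting the model-independent ln(n!) term."""
    return sum(n * math.log(m) - m for n, m in zip(observed, predicted))

def profile_interval(observed, source_shape, background, scales, delta=0.5):
    """Scan the source scale; return (best, low, high) where ln L is within delta of its maximum."""
    log_l = []
    for s in scales:
        predicted = [s * t + b for t, b in zip(source_shape, background)]
        log_l.append(poisson_log_likelihood(observed, predicted))
    best = max(range(len(scales)), key=lambda k: log_l[k])
    # Assumes ln L has a single maximum over the scanned range.
    inside = [s for s, value in zip(scales, log_l) if log_l[best] - value <= delta]
    return scales[best], min(inside), max(inside)

# Toy inputs (hypothetical): observed counts, source counts per unit scale, background counts.
observed = [12, 25, 14]
source_shape = [2.0, 10.0, 3.0]
background = [10.0, 10.0, 10.0]
scales = [0.01 * k for k in range(501)]
best_fit, lower, upper = profile_interval(observed, source_shape, background, scales)
```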
For EGRET and preceding missions,
the likelihood analysis was based on binned maps of photons, i.e., by comparing
the predicted and observed numbers of photons in bins of energy and region of
the sky. Information is lost in binning, and in principle the most sensitive
analyses can be performed using unbinned implementations of the likelihood
function, where the contribution to the likelihood function of each photon is
treated individually, using the response functions that apply to that
photon. Unbinned analysis is much more
computationally intensive, and is less well-behaved numerically, with results
often being the small difference between two large numbers.
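The per-photon treatment can be sketched with the standard extended unbinned form, ln L = sum over photons of ln M(x_i) minus the predicted total counts. The example below is a one-dimensional toy, assuming a Gaussian PSF whose width may differ from photon to photon, a flat background, and a PSF fully contained in the analysis window; all names and numbers are hypothetical.

```python
import math

def gaussian_pdf(x, sigma):
    """Normalized 1-D Gaussian, standing in for the PSF of one photon."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def unbinned_log_likelihood(offsets, sigmas, n_source, bkg_density, window_width):
    """Extended unbinned ln L = sum_i ln M(x_i) - N_pred for a 1-D toy model."""
    # Each photon is evaluated with the response (here, PSF width) that applies to it.
    log_sum = sum(math.log(n_source * gaussian_pdf(x, s) + bkg_density)
                  for x, s in zip(offsets, sigmas))
    # Predicted total counts; assumes the PSF is fully contained in the window.
    n_pred = n_source + bkg_density * window_width
    return log_sum - n_pred

# Toy photon list (hypothetical): offsets from the source in degrees, per-photon PSF widths.
offsets = [-0.2, 0.1, 0.3, -0.1, 0.05, 2.5, -3.1, 4.0]
sigmas = [0.3, 0.3, 0.5, 0.3, 0.2, 0.5, 0.5, 0.3]
with_source = unbinned_log_likelihood(offsets, sigmas, 5.0, 0.3, 10.0)
without_source = unbinned_log_likelihood(offsets, sigmas, 0.0, 0.3, 10.0)
```

In this toy case the model with a source is strongly preferred (`with_source` exceeds `without_source`), because five photons cluster within the PSF.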
Regarding consideration (2), for
GLAST, we intend to perform a trade study on the acceptable degree of binning
(or possibly to implement both analysis options).
For binned likelihood analysis, fine binning in energy and
inclination angle is likely most important for limiting the loss of
information.
Exposure calculation
The calculation of instrumental
exposure is fundamental to obtaining calibrated fluxes and spectra. The exposure is a function of time range,
energy and direction on the sky. It
also depends on the spacecraft position and orientation, because directions
near and below the earth's limb must be excluded. Exposure calculations, complicated as they are, must be performed
rapidly in order to support the multiple all-sky analyses that will be
undertaken daily. Our implementation of
the LAT analysis software includes an optimized algorithm that can quickly and
accurately generate exposure matrices by factoring the problem. Much of the calculation is in accumulating
livetimes, and is independent of instrumental response functions. These accumulations can be made quickly on a
predefined (sufficiently fine) grid on the sky.
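The factoring described above can be sketched as two steps: accumulate livetime per sky pixel and inclination bin from the pointing history, then fold that accumulation with the effective area. This is a minimal illustration and not the optimized algorithm itself; the energy dependence and the exclusion of directions near the earth's limb are omitted, and the effective-area values and function names are hypothetical.

```python
import math

def unit_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def accumulate_livetime(timeline, pixels, n_theta_bins, max_theta_deg=90.0):
    """Sum livetime per sky pixel and inclination bin; no response functions needed."""
    cube = [[0.0] * n_theta_bins for _ in pixels]
    bin_width = max_theta_deg / n_theta_bins
    pixel_vectors = [unit_vector(ra, dec) for ra, dec in pixels]
    for axis_ra, axis_dec, livetime in timeline:
        axis = unit_vector(axis_ra, axis_dec)
        for i, pix in enumerate(pixel_vectors):
            # Clamp to guard acos against rounding just outside [-1, 1].
            cos_theta = max(-1.0, min(1.0, sum(a * p for a, p in zip(axis, pix))))
            theta = math.degrees(math.acos(cos_theta))
            if theta < max_theta_deg:
                cube[i][int(theta / bin_width)] += livetime
    return cube

def exposure_from_livetime(cube, aeff_by_theta):
    """Fold livetime (s) with effective area (cm^2) per inclination bin, giving cm^2 s."""
    return [sum(t * a for t, a in zip(row, aeff_by_theta)) for row in cube]

# Hypothetical timeline: (instrument axis RA, Dec in degrees, livetime in s).
timeline = [(0.0, 0.0, 30.0), (0.0, 0.0, 30.0), (125.0, 0.0, 30.0)]
pixels = [(0.0, 0.0), (100.0, 0.0)]
aeff_by_theta = [8000, 7800, 7000, 6000, 4500, 3000, 1500, 500, 100]  # made-up values
cube = accumulate_livetime(timeline, pixels, n_theta_bins=9)
exposure = exposure_from_livetime(cube, aeff_by_theta)
```

Because the livetime accumulation does not involve the response functions, it would not need to be redone when the response functions or event cuts change; only the final fold is repeated.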
The volume of Level 1 data will be
too great (1–2 Tbyte/yr), and searching the data too computationally intensive,
for the entire dataset to be distributed to each LAT investigator or guest
investigator. The Level 1 and
associated databases for high-level analysis (see below) will be accessed via
server computers at a few sites. These
sites are envisioned to be the LAT IOC, remote analysis sites of coinvestigator
institutions, and the GLAST SSC. High-level
analysis modules will be run on client computers, not necessarily colocated
with the servers, that query the servers for data. This division obviates the need to distribute the whole LAT data
set as part of the analysis environment, spreads the overall computational load
for analysis, and enables a single analysis environment to be supported across
the collaboration and within the SSC.
(The LAT team is required to produce an analysis environment that can be
used by outside investigators supported by the SSC.)
The flow of data from Level 0
through the highest levels of processing is diagrammed in Figures 1 and 2. The databases and processing steps for Level
1, i.e., the Event database and higher processing, are described in the
subsections below.
Figure 1 - Level 0 to Level 1 processing
Figure 2 - Post Level 1 processing
The analysis interface layer
outlined in Figure 2 extracts data, calibration, and emission model information
from the databases and passes it to the higher-level analysis modules. The passing is done via FITS files (TBR).
As an example, consider analyzing
a year’s worth of data for a point source.
Upon receipt of a request for the data for a region of the sky around
the source (for a set of background/PSF cuts, energy range and zenith angle
cuts), the Event Extractor would retrieve high-level information for the
photons. The high-level information
passed back to the client would comprise the energy, direction, inclination,
azimuth, plane of conversion, quality flags, and time for each photon. At about 40
bytes minimum per photon and approximately 1 million photons (for a 10°
radius selection region), this amounts to roughly 40 Mbyte. The corresponding
exposure matrix produced by the Exposure Generator would have exposure
tabulated for a grid of energy, direction, inclination, azimuth, plane of
conversion. This could be fairly large,
approx 1000 (ra,dec) x 10 (inclination) x 10 (azimuth) x 10 (energy) = 1
million entries. So 50 Mbyte or so
would have to pass from the server to the client before analysis began. The appropriate instrument response
functions for the time range and event classes selected would be generated by
the IRF server. (TBR. The IRF server would have response functions
for a predefined set of background rejection/PSF enhancement cuts; new cuts
would require new response functions to be generated from the calibration Monte
Carlo events.) The interstellar
emission model for the corresponding region of the sky would probably have to
be requested from the Emission Model Server as well (specifying, e.g., the
coordinate system and binning), but this would be much smaller. In addition, the point-source catalog should
probably also be queried to assist in defining the overall (background +
sources) emission model for the region surrounding the source under study.
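The transfer estimate in this example can be checked with a few lines of arithmetic. The photon count, bytes per photon, and grid dimensions are taken from the text; the 8 bytes per exposure entry is an assumption (the text does not state a storage size), chosen here as a double-precision value.

```python
import math

PHOTON_COUNT = 1_000_000        # from the text, 10 degree radius selection
BYTES_PER_PHOTON = 40           # from the text, minimum per photon
EXPOSURE_GRID = {"ra_dec": 1000, "inclination": 10, "azimuth": 10, "energy": 10}
BYTES_PER_EXPOSURE_ENTRY = 8    # assumption: one double-precision value per entry

event_bytes = PHOTON_COUNT * BYTES_PER_PHOTON
exposure_entries = math.prod(EXPOSURE_GRID.values())
exposure_bytes = exposure_entries * BYTES_PER_EXPOSURE_ENTRY
total_bytes = event_bytes + exposure_bytes

print(f"events: {event_bytes / 1e6:.0f} Mbyte, "
      f"exposure: {exposure_bytes / 1e6:.0f} Mbyte, "
      f"total: {total_bytes / 1e6:.0f} Mbyte")
```

Under these assumptions the total is 48 Mbyte, consistent with the "50 Mbyte or so" quoted above; the event list dominates the transfer.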
The high-level likelihood analysis
of LAT data will have interactive (graphical user interface) and batch (command
line or script driven) modes. Much of
the LAT team's routine analysis of the gamma-ray data will not be
interactive. For example, all-sky
searches for point sources (to flag sources that are flaring) will be made for
short time scales (typically hours), and so will be run many times per day.
The infrastructure of the high-level
science analysis system includes the Analysis Interface Layer described above
(see Fig. 2) and the software and databases needed to provide the services of the
Analysis Interface Layer. In
particular, Exposure calculation, Event Summary generation, High-level
calibration database, and the Diffuse Emission Model are part of the
infrastructure. These modules and
services are the core of the high-level analysis system.
Not explicitly discussed
elsewhere, but essential to the high-level analysis system, is a tool for map
generation and for displaying images and plots. Maps can be generated, e.g., from a list of photons or from an
exposure matrix. Images can be
displayed with full coordinate information, with reprojections if necessary,
and overlays.
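As a minimal sketch of map generation from a list of photons, the function below bins photon directions into a counts map on a simple rectangular (RA, Dec) grid. The function name and the grid are hypothetical, and no sky projection or reprojection is attempted; a real tool would carry full coordinate information with the image.

```python
def bin_photons(photons, ra_min, ra_max, dec_min, dec_max, pixel_deg):
    """Return counts[row][col] for photons on a rectangular (RA, Dec) grid in degrees."""
    n_ra = int(round((ra_max - ra_min) / pixel_deg))
    n_dec = int(round((dec_max - dec_min) / pixel_deg))
    counts = [[0] * n_ra for _ in range(n_dec)]
    for ra, dec in photons:
        if ra_min <= ra < ra_max and dec_min <= dec < dec_max:
            # min() guards against rounding pushing an edge photon past the last pixel.
            col = min(int((ra - ra_min) / pixel_deg), n_ra - 1)
            row = min(int((dec - dec_min) / pixel_deg), n_dec - 1)
            counts[row][col] += 1
    return counts

# Hypothetical photon directions (RA, Dec) in degrees; the last lies outside the map.
photons = [(10.2, 20.1), (10.3, 20.4), (11.7, 21.9), (50.0, 50.0)]
counts_map = bin_photons(photons, 10.0, 12.0, 20.0, 22.0, pixel_deg=0.5)
```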
All processing steps that produce
image or tabular output will have the capability to write the output in FITS
format (other formats TBD) to facilitate subsequent display or processing outside
of the LAT SAS system.
[What can be reused from Chandra?
Sherpa for model fitting, CALDB, ChIPs for plotting and image
display? DS9? How about HEASARC Xanadu?
Perhaps we should state that we will be able to provide input to Xanadu in the
format that it expects. Gaudi - a good
idea for implementing the SAS? Even for
relatively standalone tools? Root
environment? What can be reused from
EGRET? Little directly; e.g.,
algorithms for pulsar barycenter corrections.
Skyview?]
Database | Contents | Access Criteria | Used by |
Event | full info. for each event, including reconstruction (Level 1 database) | time or event number | Event Summ. constructor, event display, low-level calib monitoring |
Event summary | energy, direction (celestial and instr. coords), time, plane/tower/log layer of conversion, event id and bkgnd rej/quality flags | energy, direction, time range, event flags, event ID | high-level map generation and analysis, CR event selection |
High-level calib | instrument response functions as functions of energy, angles, plane, time,... | energy, angles, time, ... | Exposure gen, high-level analysis |
Exposure history (timeline) | S/C position, orientation, LAT mode, and livetime for regular ~30s time intervals | time range | Exposure gen. |
Source sim. | Monte Carlo equivalent of Level 0 data, perhaps already as ‘digis’, with truth info, and run/config. ID | ? | Recon |
Pt. Src. Detection | Position, flux, spectral hardness and associated uncertainties, time range | coordinates, time range | Transient Src search, Pt. Src. Catalog Gen. |
Pt. Src. Catalog | Summary of Pt. Src. Detection, flux histories and candidate source IDs | coordinates, spectral hardness, variability index,.... | Catalog access interface? |
Pulsar Ephem | (radio) Timing parameters for known pulsars, contemp. with GLAST mission | pulsar name | Barycenter corrector |
GRB | ? | ? | ? |
Table 1 - High-level databases.
High-level analysis tasks
The high-level analysis tasks
planned for development are described in Table 2. Most of them derive their inputs from the Analysis Interface
Layer, i.e., all of the inputs that they require are in the Level 1 and
associated data (see Fig. 2). Other
tasks require Level 2 data, i.e., the output of another high-level task. Some of the tasks are related to ancillary
science goals for the LAT and will be developed as level of effort
undertakings.
Name | Function | Inputs | Outputs |
Point-source detection | Analyzing a given region of the sky for point sources | Analysis interface layer | locations, fluxes, significances, spectrum or spectral hardness |
Point-source spectroscopy | model fitting with flexible definition of spectral models; possibly developed as part of the general likelihood analysis capability described below (Extended sources and confused regions) | Analysis interface layer | Model coeffs and uncertainties |
Source variability | Flare detection (short term, for issuing alerts), pt. source vs. extended source determination (longer term, for quantifying variability) | Point source detection database | Flux histories, estimates of variability |
Extended sources and confused regions | ‘custom’ model fitting. Interactive analysis largely will be model fitting (parametric), allowing flexible specification of source – multiple point sources, spectral models, arbitrary extended sources | Analysis interface layer | Model parameters, confidence ranges |
GRB time profiles | Constructs time profiles for user-defined event selection criteria | Analysis interface layer (Event Summary) | Time profile histograms (perhaps normalized by IRFs, with periods outside FOV indicated), tables of events associated with a burst |
Source identification | Quantitatively defining probabilities of associations of LAT pt. srcs. with srcs. in other astronomical catalogs | Point source catalog | Point source catalog |
Pulsar phase calculation | Assign pulsar phases to a set of photons based on the timing params for the pulsar, to allow phase-resolved analysis for most of the analysis tasks, like spectral meas., and phase binning - for histograms and maps. | Analysis interface, Pulsar Ephemerides | Phase assignments by event number (?) |
Pulsar periodicity searches | Searches for pulsations in data for a point source | Analysis interface* | Ideally, position, period, period derivative,... |
High-resolution spectroscopy | for narrow-line emission at high energies | Analysis interface | Line energy, flux, or upper limits |
Inflight calibration | monitoring effective area via fluxes of pulsars, monitoring PSF via phase-selected photon distributions around pulsars.** | Analysis interface | Flux histories, PSF profile plots, tables |
* Also may need a tool to display times when target was in
FOV to select intervals with greatest continuous coverage.
** Gains, alignments, hot/dead strips, etc., are part of
the lower-level calibration monitoring described in the Calibration section.
Table 2 - High-level
science analysis tasks.
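For the pulsar phase calculation task in Table 2, a minimal sketch of phase assignment is given below. It uses the standard Taylor expansion of rotational phase about a reference epoch and assumes the photon times have already been corrected to the solar system barycenter. The function name and the timing values are illustrative only and do not represent a real ephemeris.

```python
import math

def pulsar_phase(t, t0, f0, f1=0.0, phi0=0.0):
    """Rotational phase in [0, 1) at barycentered time t (s).

    t0: reference epoch (s); f0: frequency (Hz); f1: frequency derivative (Hz/s);
    phi0: phase at the reference epoch.
    """
    dt = t - t0
    phase = phi0 + f0 * dt + 0.5 * f1 * dt * dt
    return phase - math.floor(phase)

# Illustrative values only: a 2 Hz rotator with no spin-down.
photon_times = [0.25, 1.0, 1000.125]
phases = [pulsar_phase(t, t0=0.0, f0=2.0) for t in photon_times]
```

The resulting phases can then be used to select photons for phase-resolved analysis or to fill phase histograms.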
Other potential analysis tasks
(potentially level of effort):
· Multiple-gamma events - this may be a lower-level analysis
issue - after reconstruction need to define a flag or a set of flags that
indicate multiple pairs of tracks may be present. What would be most interesting is multiple pairs of tracks with
the same apparent arrival direction.
[What would be the approximate rate of multiple gamma events of any kind
- just from closely-spaced arrival times of otherwise unrelated photons? 2.5 Hz avg rate, 20 µs separation?]
· Nonparametric algorithms for detection of point sources and
extended sources without models (either for point sources or interstellar
emission). This includes wavelet
analysis - application for quick detection of transients.
· Polarization of point sources - the measurement will be
hard (possible?), need to measure the plane of the e+/e- pair
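The question raised above about the rate of accidental multiple-gamma events can be estimated roughly, assuming unrelated photons arrive as a Poisson process at a constant mean rate. The 2.5 Hz rate and 20 µs separation are the tentative values from the note above; the function name is hypothetical, and the real rate will vary around the orbit.

```python
import math

def accidental_pair_rate(mean_rate_hz, window_s):
    """Rate (Hz) of events followed by another unrelated event within window_s."""
    # For Poisson arrivals, P(next event within window) = 1 - exp(-rate * window).
    return mean_rate_hz * (1.0 - math.exp(-mean_rate_hz * window_s))

pairs_per_second = accidental_pair_rate(2.5, 20e-6)
pairs_per_day = pairs_per_second * 86400.0
print(f"{pairs_per_second:.3e} Hz, about {pairs_per_day:.1f} accidental pairs per day")
```

Under these assumptions the accidental rate is about 1.25e-4 Hz, or roughly 11 pairs per day from anywhere in the field of view; pairs that also share the same apparent arrival direction would be far rarer.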
Interstellar emission model
The interstellar emission model
will be refined, most likely iteratively, based on LAT observations during the
sky survey. The models for cosmic-ray
production and propagation in particular are most constrained by the gamma-ray
observations themselves. Some aspects
of the EGRET findings, in particular the ‘GeV excess,’ need to be verified and
investigated in more detail with LAT data.
Also, in special directions, the 3-dimensional distribution of
interstellar gas is especially difficult to determine from spectral line
surveys of H I and CO, and models for different distributions consistent with
the radio/mm observations may have to be tested against LAT data.
No particular tool has been
identified for validating and refining the model. The most useful input would likely be a point-source subtracted
map of the sky.
For LAT data analysis, the model
will be precomputed on a grid of directions and energies finer than
the angular and energy resolution.
There’s no particular advantage to generating the model on the fly for
arbitrary directions and energies. The
nature of the calculation (line of sight integration of the products of
cosmic-ray and interstellar gas or photon densities) makes precomputing the
maps straightforward and efficient.
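The line-of-sight integration can be sketched as a simple numerical quadrature, precomputed once per grid direction. The density and emissivity profiles below are toy functions in arbitrary units, chosen only so that the result can be checked analytically; they are not a physical model of the Galaxy, and the energy dimension is omitted.

```python
import math

def line_of_sight_intensity(emissivity, gas_density, max_distance, n_steps=1000):
    """Midpoint-rule integral of emissivity(s) * gas_density(s) ds along one direction."""
    ds = max_distance / n_steps
    total = 0.0
    for k in range(n_steps):
        s = (k + 0.5) * ds
        total += emissivity(s) * gas_density(s)
    return total * ds

def toy_gas_density(latitude_deg, scale_height=0.1):
    """Toy gas layer (hypothetical): density falls exponentially with height above the plane."""
    sin_b = math.sin(math.radians(latitude_deg))
    return lambda s: math.exp(-abs(s * sin_b) / scale_height)

def uniform_emissivity(s):
    """Toy assumption: cosmic-ray emissivity per gas atom is the same everywhere."""
    return 1.0

# Precompute the model once on a (coarse, illustrative) grid of latitudes.
latitudes = [0.0, 30.0, 60.0, 90.0]
model_grid = {b: line_of_sight_intensity(uniform_emissivity, toy_gas_density(b), 10.0)
              for b in latitudes}
```

Analysis tasks would then look up or interpolate values in `model_grid` rather than repeating the integration for each request.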
Observation simulators
Two observation simulators are needed: low-level (generates events that are passed
through Recon and Bkgnd Rej) and high-level (based on instrument response
functions and the exposure calculator).
The former will be important for developing and testing the SAS system
(mock data challenges) and the latter will be a proposal preparation and
observation planning tool.
Other considerations
The high-level analysis software
for the LAT is to be validated using Monte Carlo simulations of
observations. Also useful for
validation, and for scientific analysis, would be the EGRET data imported into
the LAT analysis environment. The
mapping of the EGRET summary database files into the approximate LAT equivalent
of the event summary database would be straightforward. Translation of the timeline files into the
LAT equivalent for calculating exposures would not be quite as straightforward,
but could be done. The complication is
that the trigger modes (and hence the effective area matrices) were changed (to
limit the number of triggers from earth albedo gamma rays) as the earth entered
and left EGRET’s field of view during every orbit.
Low-level processing (event
reconstruction and initial identification) is to be done at the LAT IOC, but
all data, Level 0 and higher, are to be provided to the SSC. In our proposal, this was planned to be done
via database mirroring. The SSC and LAT
teams agree that this is desirable, and a workable implementation is being
sought. Such a system would also permit
establishment of internal LAT-team mirror sites. The database system will have to be implemented in some way to
protect proprietary data rights.
Although the LAT team will monitor the data for transient sources and to
maintain calibration, access for other purposes must be restricted during the
3-month (TBR) validation period that guest observers (and LAT team members with
winning proposals) will have for their data.