Draft, 13 July 2001

Design of high-level analysis software for the LAT

The design for the high-level analysis software, i.e., the software for analysis of gamma-ray data after reconstruction and background filtering of events, is driven by three principal considerations:

1. High-level analysis of GLAST data will be fundamentally statistical; limited numbers of photons and relatively poor angular resolution mean that quantitative analysis will be via model fitting.

2. The characterization of the LAT is complicated, and the complication is compounded by a scanning observation mode, a large FOV, and the need to reject the intense background of cosmic rays as well as albedo gamma rays from the earth's limb.  The PSF, effective area, and energy resolution depend on the energy and direction of the gamma ray, on the location of the conversion in the LAT, and on the background rejection cuts employed.  (A sketch of one possible parameterization of the response functions follows this list.)

3. After events are reconstructed, data access will be principally by direction on the sky and time range.  (For cosmic rays used in monitoring calibration, access will be principally by direction in instrument coordinates and time range; see Calibration section.)  The data analysis system must support efficient spatial access.
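
To make consideration (2) concrete, the sketch below shows one way the response functions might be parameterized in the analysis software.  The class and method names are hypothetical, not a committed interface; only the dependencies listed in item 2 are taken from the considerations above.

    # Hypothetical parameterization of the LAT instrument response
    # functions; names and signatures are illustrative only.

    class ResponseFunctions:
        """Response functions for one event class, i.e., one set of
        background rejection / PSF enhancement cuts."""

        def __init__(self, event_class):
            self.event_class = event_class  # identifies the cuts applied

        def effective_area(self, energy, inclination, azimuth, plane):
            """Effective area (cm2) for a gamma ray of the given energy
            arriving at (inclination, azimuth) in instrument coordinates
            and converting in the given tracker plane."""
            raise NotImplementedError

        def psf(self, separation, energy, inclination, azimuth, plane):
            """Probability density (per sr) of reconstructing a direction
            at the given angular separation from the true direction."""
            raise NotImplementedError

        def energy_dispersion(self, measured_energy, true_energy,
                              inclination, azimuth, plane):
            """Probability density of measuring measured_energy given
            true_energy and the arrival geometry."""
            raise NotImplementedError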

Interstellar emission model

Consideration (1) implies that a model of the direction and energy dependence of the interstellar emission of the Milky Way is needed for the high-level analysis.  This emission arises from cosmic-ray interactions with interstellar gas and photons; it is present in every direction on the sky, although it is most intense close to the plane of the Galaxy.  More than 60% of the celestial gamma rays detected by EGRET were interstellar emission.  The emission can have structure on fine angular scales, and an accurate model will be useful for distinguishing low-latitude point sources from unresolved diffuse emission and for accurately determining positions and fluxes of point sources.  (In addition, an accurate model of interstellar emission at high latitudes will be essential for estimating the truly diffuse extragalactic emission.)

The modeling of interstellar emission is fairly well developed after more than two decades of application in high-energy gamma-ray astronomy.  Advances that we will take advantage of for the LAT include higher-resolution surveys of the interstellar medium than were available for EGRET analysis, and modern calculations of cosmic-ray production and propagation that include constraints from cosmic-ray isotope abundance ratios and other local measurements.

Likelihood analysis

Also regarding (1), model fitting in high-energy gamma-ray astronomy has long used the likelihood function as the measure of goodness of fit (e.g., Pollock et al. 1981).  Variations of the likelihood function (the probability of the observed data given the model) with respect to the various parameters of the model can be used to determine quantitative confidence ranges.

For EGRET and preceding missions, the likelihood analysis was based on binned maps of photons, i.e., on comparing the predicted and observed numbers of photons in bins of energy and region of the sky.  Information is lost in binning, and in principle the most sensitive analyses can be performed with unbinned implementations of the likelihood function, in which the contribution of each photon to the likelihood function is treated individually, using the response functions that apply to that photon.  Unbinned analysis is much more computationally intensive, however, and is less well behaved numerically, with results often being the small difference between two large numbers.
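
To illustrate the unbinned alternative, the sketch below implements the unbinned Poisson log-likelihood, log L = sum_i log m(x_i) - Npred, where m is the model counts density folded through the response functions and Npred is the total predicted number of counts in the selection region.  The function names are illustrative, not part of a defined interface.

    import numpy as np

    def unbinned_log_likelihood(photons, model_density, n_predicted):
        """Unbinned Poisson log-likelihood:

            log L = sum_i log m(x_i) - Npred

        photons       : sequence of photon records (energy, direction, ...)
        model_density : callable giving the model counts density m, folded
                        through the response functions, evaluated at one
                        photon's measured quantities
        n_predicted   : total predicted counts in the selection region
                        (the model integrated over the exposure)
        """
        log_terms = np.array([np.log(model_density(ph)) for ph in photons])
        return log_terms.sum() - n_predicted

A binned implementation replaces the per-photon sum with a sum over bins of n_k log mu_k - mu_k, trading information for speed and numerical robustness.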

Regarding consideration (2), for GLAST we intend to perform a trade study on the degree of binning that is acceptable (or perhaps to implement both analysis options).  For binned likelihood analysis, fine binning in energy and inclination angle is likely most important for limiting the loss of information.

Exposure calculation

The calculation of instrumental exposure is fundamental to obtaining calibrated fluxes and spectra.  The exposure is a function of time range, energy, and direction on the sky.  It also depends on the spacecraft position and orientation, because directions near and below the earth's limb must be excluded.  Exposure calculations, complicated as they are, must be performed rapidly in order to support the multiple all-sky analyses that will be undertaken daily.  Our implementation of the LAT analysis software includes an optimized algorithm that can generate exposure matrices quickly and accurately by factoring the problem: much of the calculation is the accumulation of livetimes, which is independent of the instrument response functions, and these accumulations can be made quickly on a predefined (sufficiently fine) grid on the sky.
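
The factorization can be sketched as follows: livetime is first accumulated as a function of sky pixel and inclination from the pointing history, with no reference to the response functions; the exposure then follows by folding the livetime table with the effective area for the chosen event class.  Grid sizes and names here are illustrative.

    import numpy as np

    def accumulate_livetime(pointing_history, sky_grid, n_inc_bins=10):
        """Accumulate livetime (s) on a (sky pixel, inclination bin) grid.
        Depends only on the spacecraft pointing history, not on the
        response functions, so it is done once per time range.

        pointing_history : iterable of (z-axis unit vector, livetime)
                           entries, one per ~30 s timeline interval
        sky_grid         : (n_pix, 3) array of unit vectors of sky pixels
        """
        livetime = np.zeros((len(sky_grid), n_inc_bins))
        # bin edges in -cos(inclination), ascending, covering 0 to 90 deg
        edges = np.linspace(-1.0, 0.0, n_inc_bins + 1)
        for z_axis, dt in pointing_history:
            cos_inc = sky_grid @ z_axis        # cosine of inclination per pixel
            idx = np.digitize(-cos_inc, edges) - 1
            in_fov = (idx >= 0) & (idx < n_inc_bins)   # above instrument horizon
            livetime[np.nonzero(in_fov)[0], idx[in_fov]] += dt
        return livetime

    def exposure(livetime, aeff, energies, inc_centers):
        """Fold livetime with effective area aeff(E, inclination) to get
        exposure (cm2 s) per (sky pixel, energy)."""
        a = np.array([[aeff(E, inc) for inc in inc_centers] for E in energies])
        return livetime @ a.T   # (n_pix, n_inc) x (n_inc, n_E)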

Technical assumptions

The volume of Level 1 data will be too great (1–2 Tbyte/yr), and searching the data too computationally intensive, for the entire dataset to be distributed to each LAT investigator or guest investigator.  The Level 1 and associated databases for high-level analysis (see below) will be accessed via server computers at a few sites.  These sites are envisioned to be the LAT IOC, remote analysis sites of coinvestigator institutions, and the GLAST SSC.  High-level analysis modules will be run on client computers, not necessarily colocated with the servers, that query the servers for data.  This division obviates the need to distribute the whole LAT data set as part of the analysis environment, spreads the overall computational load for analysis, and enables a single analysis environment to be supported across the collaboration and within the SSC.  (The LAT team is required to produce an analysis environment that can be used by outside investigators supported by the SSC.)
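
A hypothetical sketch of the client side of this division of labor follows; the class, method, and parameter names are invented for illustration and are not a committed interface.

    # Hypothetical client-side query interface to a LAT data server.
    # Names and parameters are illustrative only.

    class EventServerClient:
        def __init__(self, server_address):
            self.server_address = server_address

        def query_photons(self, ra, dec, radius_deg, tstart, tstop,
                          emin, emax, event_class):
            """Request event-summary records for a cone on the sky and a
            time range.  The server performs the spatial search and returns
            only the selected photons (e.g., as a FITS table), so the full
            Level 1 dataset never leaves the server site."""
            raise NotImplementedError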

Data Flow

The flow of data from Level 0 through the highest levels of processing is diagrammed in Figures 1 and 2.  The databases and processing steps for Level 1 (i.e., the Event database) and for the higher levels of processing are described in the subsections below.

Figure 1 - Level 0 to Level 1 processing

Figure 2 - Post Level 1 processing

The analysis interface layer outlined in Figure 2 extracts data, calibration, and emission model information from the databases and passes it to the higher-level analysis modules.  The passing is done via FITS files (TBR).

Estimated file sizes for a typical analysis

As an example, consider analyzing a year’s worth of data for a point source.  Upon receipt of a request for the data for a region of the sky around the source (for a given set of background/PSF cuts and energy and zenith angle cuts), the Event Extractor would retrieve the high-level information for the photons.  The information passed back to the client would include the energy, direction, inclination, azimuth, plane of conversion, quality flags, and time for each photon: about 40 bytes minimum per photon, for approximately 1 million photons (for a 10° radius selection region).  The corresponding exposure matrix produced by the Exposure Generator would have the exposure tabulated on a grid of energy, direction, inclination, azimuth, and plane of conversion.  This could be fairly large, approximately 1000 (RA, Dec) x 10 (inclination) x 10 (azimuth) x 10 (energy) = 1 million entries.  So 50 Mbyte or so would have to pass from the server to the client before the analysis began.  The appropriate instrument response functions for the time range and event classes selected would be generated by the IRF server.  (TBR.  The IRF server would have response functions for a predefined set of background rejection/PSF enhancement cuts; new cuts would require new response functions to be generated from the calibration Monte Carlo events.)  The interstellar emission model for the corresponding region of the sky would probably have to be requested from the Emission Model Server as well (specifying, e.g., the coordinate system and binning), but this would be much smaller.  In addition, the point-source catalog should probably also be queried, to assist in defining the overall (background + sources) emission model for the region surrounding the source under study.
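
The quoted volume can be checked with a back-of-the-envelope calculation, using the record counts and sizes assumed above (the 4 bytes per exposure entry is an additional assumption, i.e., single-precision floats):

    # Back-of-the-envelope check of the server-to-client data volume.
    n_photons = 1000000            # ~1e6 photons, 10 deg radius, 1 yr
    bytes_per_photon = 40          # energy, direction, angles, flags, time

    n_exposure = 1000 * 10 * 10 * 10   # (ra,dec) x incl. x azimuth x energy
    bytes_per_entry = 4                # single-precision float (assumed)

    total = n_photons * bytes_per_photon + n_exposure * bytes_per_entry
    print("%.0f Mbyte from server to client" % (total / 1e6))   # ~44 Mbyte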

Analysis Environment

The high-level likelihood analysis of LAT data will have interactive (graphical user interface) and batch (command line or script driven) modes.  Much of the LAT team's routine analysis of the gamma-ray data will not be interactive.  For example, all-sky searches for point sources (to flag sources that are flaring) will be made for short time scales (typically hours), and so will be run many times per day.

Infrastructure

The infrastructure of the high-level science analysis system includes the Analysis Interface Layer described above (see Fig. 2) and the software and databases needed to provide its services.  In particular, the Exposure calculation, Event Summary generation, High-level calibration database, and Diffuse Emission Model are part of the infrastructure.  These modules and services are the core of the high-level analysis system.

Not explicitly discussed elsewhere, but essential to the high-level analysis system, is a tool for map generation and for displaying images and plots.  Maps can be generated, e.g., from a list of photons or from an exposure matrix.  Images can be displayed with full coordinate information, with reprojections if necessary, and with overlays.
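
A minimal sketch of counts-map generation from a photon list, using a simple cartesian binning (a production tool would use proper sky projections and attach full coordinate information, e.g., in FITS):

    import numpy as np

    def counts_map(ra, dec, ra_range, dec_range, pixel_deg=0.5):
        """Bin a photon list into a counts map.

        ra, dec   : arrays of photon coordinates (deg)
        ra_range  : (ra_min, ra_max) of the map (deg)
        dec_range : (dec_min, dec_max) of the map (deg)
        """
        ra_edges = np.arange(ra_range[0], ra_range[1] + pixel_deg, pixel_deg)
        dec_edges = np.arange(dec_range[0], dec_range[1] + pixel_deg, pixel_deg)
        image, _, _ = np.histogram2d(dec, ra, bins=[dec_edges, ra_edges])
        return image   # rows: declination, columns: right ascension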

Data Export

All processing steps that produce image or tabular output will have the capability to write the output in FITS format (other formats TBD) to facilitate subsequent display or processing outside of the LAT SAS system.

Software Reuse

[What can be reused from Chandra?  Sherpa for model fitting, CALDB, ChIPS for plotting and image display?  DS9?  How about HEASARC's Xanadu?  Perhaps we should state that we will be able to provide input to Xanadu in the format that it expects.  Is Gaudi a good idea for implementing the SAS, even for relatively standalone tools?  The Root environment?  What can be reused from EGRET?  Little directly; e.g., algorithms for pulsar barycenter corrections.  SkyView?]

 

Database | Contents | Access criteria | Used by
-------- | -------- | --------------- | -------
Event | full info. for each event, including reconstruction (Level 1 database) | time or event number | Event Summary constructor, event display, low-level calib. monitoring
Event summary | energy, direction (celestial and instr. coords), time, plane/tower/log layer of conversion, event ID, and bkgnd rej./quality flags | energy, direction, time range, event flags, event ID | high-level map generation and analysis, CR event selection
High-level calib. | instrument response functions as functions of energy, angles, plane, time, ... | energy, angles, time, ... | Exposure gen., high-level analysis
Exposure history (timeline) | S/C position, orientation, LAT mode, and livetime for regular ~30 s time intervals | time range | Exposure gen.
Source sim. | Monte Carlo equivalent of Level 0 data, perhaps already as ‘digis’, with truth info and run/config. ID | ? | Recon
Pt. Src. Detection | position, flux, spectral hardness, and associated uncertainties; time range | coordinates, time range | Transient Src. search, Pt. Src. Catalog gen.
Pt. Src. Catalog | summary of Pt. Src. Detection, flux histories, and candidate source IDs | coordinates, spectral hardness, variability index, ... | Catalog access interface?
Pulsar Ephem. | (radio) timing parameters for known pulsars, contemporaneous with the GLAST mission | pulsar name | Barycenter corrector
GRB | ? | ? | ?

Table 1 - High-level databases.

 

High-level analysis tasks

The high-level analysis tasks planned for development are described in Table 2.  Most of them derive their inputs from the Analysis Interface Layer, i.e., all of the inputs that they require are in the Level 1 and associated data (see Fig. 2).  Other tasks require Level 2 data, i.e., the output of another high-level task.  Some of the tasks are related to ancillary science goals for the LAT and will be developed as level-of-effort undertakings.

Name | Function | Inputs | Outputs
---- | -------- | ------ | -------
Point-source detection | analyzing a given region of the sky for point sources | Analysis interface layer | locations, fluxes, significances, spectrum or spectral hardness
Point-source spectroscopy | model fitting with flexible definition of spectral models; possibly developed as part of the general likelihood analysis capability described below (Extended sources and confused regions) | Analysis interface layer | model coefficients and uncertainties
Source variability | flare detection (short term, for issuing alerts); point-source vs. extended-source determination (longer term, for quantifying variability) | Point-source detection database | flux histories, estimates of variability
Extended sources and confused regions | ‘custom’ model fitting; interactive analysis largely will be parametric model fitting, allowing flexible specification of sources - multiple point sources, spectral models, arbitrary extended sources | Analysis interface layer | model parameters, confidence ranges
GRB time profiles | constructs time profiles for user-defined event selection criteria | Analysis interface layer (Event summary) | time-profile histograms (perhaps normalized by IRFs, with periods outside the FOV indicated), tables of events associated with a burst
Source identification | quantitatively defining probabilities of associations of LAT pt. srcs. with srcs. in other astronomical catalogs | Point-source catalog | Point-source catalog
Pulsar phase calculation | assign pulsar phases to a set of photons based on the timing parameters for the pulsar, to allow phase-resolved analysis for most of the analysis tasks (e.g., spectral measurements) and phase binning for histograms and maps | Analysis interface, Pulsar ephemerides | phase assignments by event number (?)
Pulsar periodicity searches | searches for pulsations in data for a point source | Analysis interface* | ideally position, period, period derivative, ...
High-resolution spectroscopy | searches for narrow-line emission at high energies | Analysis interface | line energy and flux, or upper limits
Inflight calibration | monitoring effective area via fluxes of pulsars; monitoring the PSF via phase-selected photon distributions around pulsars** | Analysis interface | flux histories, PSF profile plots, tables

* Also may need a tool to display times when the target was in the FOV, to select intervals with the greatest continuous coverage.

** Gains, alignments, hot/dead strips, etc., are part of the lower-level calibration monitoring described in the Calibration section.

Table 2 - High-level science analysis tasks.
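
As a concrete illustration of one of the tasks in Table 2, the sketch below assigns pulsar phases from a simple spin-down timing model, assuming the event times have already been barycenter corrected; the parameter names follow standard pulsar-timing usage.

    import numpy as np

    def pulsar_phase(t_bary, t0, f, fdot=0.0, fddot=0.0):
        """Assign rotational phases in [0, 1) to barycentered event times,
        from a Taylor-expansion timing model:

            phi(t) = f*(t-t0) + fdot*(t-t0)**2/2 + fddot*(t-t0)**3/6

        t_bary : array of barycenter-corrected event times (s)
        t0     : epoch of the timing parameters (s)
        f      : spin frequency at t0 (Hz); fdot, fddot its derivatives
        """
        dt = np.asarray(t_bary) - t0
        phi = f * dt + fdot * dt**2 / 2.0 + fddot * dt**3 / 6.0
        return np.mod(phi, 1.0)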

 

Other potential analysis tasks (potentially level of effort):

· Multiple-gamma events - this may be a lower-level analysis issue: after reconstruction, a flag or set of flags must be defined to indicate that multiple pairs of tracks may be present.  What would be most interesting is multiple pairs of tracks with the same apparent arrival direction.  [What would be the approximate rate of multiple-gamma events of any kind, just from closely spaced arrival times of otherwise unrelated photons?  2.5 Hz avg. rate, 20 µs separation?  See the estimate sketched after this list.]

· Nonparametric algorithms for detection of point sources and extended sources without models (either for the point sources or for the interstellar emission).  This includes wavelet analysis, with application to quick detection of transients.

· Polarization of point sources - the measurement will be hard (perhaps impossible?); it requires measuring the plane of the e+/e- pair.
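
Regarding the bracketed question in the first item, a rough estimate of the chance-coincidence rate, assuming Poisson photon arrivals at the quoted average rate:

    # Chance coincidences of otherwise unrelated photons, assuming
    # Poisson arrivals at the quoted average rate.
    rate = 2.5       # Hz, average photon rate
    window = 20e-6   # s, coincidence window

    # For rate*window << 1, the rate of a second photon arriving within
    # the window of a given photon is approximately rate * (rate * window).
    coincidences = rate * (rate * window)              # per second
    print("%.1e /s, ~%.0f per day" % (coincidences, coincidences * 86400))
    # -> 1.2e-04 /s, ~11 per day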

Interstellar emission model

The interstellar emission model will be refined, most likely iteratively, based on LAT observations during the sky survey.  The models for cosmic-ray production and propagation in particular are most constrained by the gamma-ray observations themselves.  Some aspects of the EGRET findings, in particular the ‘GeV excess,’ need to be verified and investigated in more detail with LAT data.  Also, in special directions, the 3-dimensional distribution of interstellar gas is especially difficult to determine from spectral line surveys of H I and CO, and models for different distributions consistent with the radio/mm observations may have to be tested against LAT data.

No particular tool has been identified for validating and refining the model.  The most useful input would likely be a point-source-subtracted map of the sky.

For LAT data analysis, the model will be precomputed on a grid of directions and energies finer than the angular and energy resolution of the LAT.  There is no particular advantage to generating the model on the fly for arbitrary directions and energies.  The nature of the calculation (line-of-sight integration of the products of cosmic-ray and interstellar gas or photon densities) makes precomputing the maps straightforward and efficient.
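
A minimal sketch of the line-of-sight integration, with placeholder functions for the model components (the cosmic-ray emissivity and the interstellar gas density) and an assumed solar galactocentric distance:

    import numpy as np

    KPC_CM = 3.086e21   # cm per kpc

    def interstellar_intensity(direction, energy, emissivity, gas_density,
                               s_max_kpc=30.0, n_steps=300):
        """Line-of-sight integral for one direction and energy:

            I(l, b, E) = integral ds  q(E, r(s)) n(r(s))

        direction   : unit vector from the Sun toward (l, b)
        emissivity  : q(E, r), gamma rays per H atom per s per sr per MeV,
                      from the cosmic-ray model (placeholder)
        gas_density : n(r), interstellar gas density in cm-3 (placeholder)
        """
        s = np.linspace(0.0, s_max_kpc, n_steps + 1)    # path length, kpc
        sun = np.array([8.5, 0.0, 0.0])                 # Sun at 8.5 kpc (assumed)
        points = sun + s[:, None] * direction           # sample points, kpc
        integrand = np.array([emissivity(energy, p) * gas_density(p)
                              for p in points])
        return np.trapz(integrand, s) * KPC_CM   # photons cm-2 s-1 sr-1 MeV-1

Maps for the whole grid of directions and energies can be filled by looping this integral over the grid, which is why precomputation is straightforward.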

Observation simulators

Two observation simulators are needed: a low-level simulator, which generates events that are passed through Recon and Bkgnd Rej, and a high-level simulator, based on the instrument response functions and the exposure calculator.  The former will be important for developing and testing the SAS system (e.g., in mock data challenges); the latter will be a proposal preparation and observation planning tool.
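
The core step of the high-level simulator can be sketched as follows: the expected counts per (sky pixel, energy) bin are the model flux times the exposure, and the simulated observation is a Poisson realization of that.  For clarity this omits folding with the PSF and energy dispersion, which the real tool would include.

    import numpy as np

    def simulate_counts(model_flux, exposure, rng=None):
        """High-level observation simulation (PSF and energy dispersion
        omitted for clarity).

        model_flux : array, photons cm-2 s-1 per (sky pixel, energy) bin
        exposure   : array of the same shape, cm2 s
        """
        rng = rng or np.random.default_rng()
        expected = model_flux * exposure
        return rng.poisson(expected)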

Other considerations

The high-level analysis software for the LAT is to be validated using Monte Carlo simulations of observations.  Also useful for validation, and for scientific analysis, would be the EGRET data imported into the LAT analysis environment.  The mapping of the EGRET summary database files into the approximate LAT equivalent of the event summary database would be straightforward.  Translation of the timeline files into the LAT equivalent for calculating exposures would not be quite as straightforward, but could be done.  The complication is that the trigger modes (and hence the effective area matrices) were changed (to limit the number of triggers from earth albedo gamma rays) as the earth entered and left EGRET’s field of view during every orbit.

Low-level processing (event reconstruction and initial identification) is to be done at the LAT IOC, but all data, Level 0 and higher, are to be provided to the SSC.  In our proposal, this was planned to be done via database mirroring.  The SSC and LAT teams agree that this is desirable, and a workable implementation is being sought.  Such a system would also permit establishment of internal LAT-team mirror sites.  The database system will have to be implemented in a way that protects proprietary data rights.  Although the LAT team will monitor the data for transient sources and to maintain calibration, access for other purposes must be restricted during the 3-month (TBR) validation period that guest observers (and LAT team members with winning proposals) will have for their data.