Post-mortem of 4M background event generation

The first big run at SLAC using GlastRelease v2r1 was done this past weekend, April 25-27. The goal was to generate upwards of 10M requested backgroundavgpdr events.

The output is at

/nfs/farm/g/glast/u09/PerfEval2003Spring/backgndavgpdr/

with the individual runs in IndividualRuns/. Failures are kept in the IndividualRuns/failed/ directory.

In the end, we got 4M of the requested events (this translated to 193k events after the conservative cuts Ntrks > 0 && AcdActiveDist < -20). Here is the tale of what happened to the others:
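
The conservative selection above amounts to a simple per-event cut. A minimal sketch in Python, assuming the ntuple rows have been read out as records with Ntrks and AcdActiveDist fields (the field names come from the cut expression; the record layout and sample values are illustrative, not from the real merit ntuple):

```python
# Sketch: apply the conservative cuts Ntrks > 0 && AcdActiveDist < -20
# to a list of event records. The record layout is assumed, not taken
# from the actual ntuple schema.

def passes_cuts(event):
    """Return True if the event survives the conservative selection."""
    return event["Ntrks"] > 0 and event["AcdActiveDist"] < -20

events = [
    {"Ntrks": 2, "AcdActiveDist": -35.0},  # passes both cuts
    {"Ntrks": 0, "AcdActiveDist": -50.0},  # fails the Ntrks cut
    {"Ntrks": 1, "AcdActiveDist": -5.0},   # fails the AcdActiveDist cut
]

kept = [e for e in events if passes_cuts(e)]
print(len(kept))  # -> 1
```

In a ROOT-based workflow the same selection would normally be handed to the ntuple machinery as the cut string itself rather than applied by hand.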

The list of bad runs and their ntuple file sizes is shown here. Navid wrote a Perl script to identify the bad runs, move them into their own directory for further examination, and clean up the main directory.

The sizes group into 4 categories:

Observations

We need to stress test the u05 fileserver holding the releases. We'll be pushing a lot harder for DC1.

Why were jobs killed, and what caused the long loop? Tracy thinks it is still the propagator.

We are now dependent on access to MySQL. How do we debug failures like the ones we saw? And are there ways to bypass it when we don't really need constants? This dependency came in with the CAL calibration constants.


R.Dubois Last Modified: 2010-06-01 15:48:18 -0700