Bulk Processing prior to OPUS

Bulk MC is being produced on the SLAC Linux farm for PSF and background rejection analysis going into DC1.

Running Jobs

Bulk submission of jobs is performed using the RunGleam.pl Perl script, available from cvs. As of this writing, we are targeting the /nfs/farm/g/glast/u09 and u10 partitions for this work.

The directories are laid out as follows:

Usage:

 ./RunGleam.pl GleamParam.txt

 As each individual job is submitted, you will see the one-line response from the LSF batch daemon echoed to the terminal.

 Some tips for using the SLAC batch farm are available from the Gleam cookbook.

You must supply a basic jobOptions file for the task to run. Generally these are set up to output MC, Digi, Recon plus the AnalysisNtuple and Merit ntuples.
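
For orientation, the output-related portion of such a jobOptions file looks roughly like the following. The algorithm and property names here are assumptions based on typical GlastRelease RootIo / ntuple-writer usage; check them against a working options file rather than copying this verbatim.

// Sketch only - algorithm and property names are assumptions
mcRootWriterAlg.mcRootFile       = "mc.root";
digiRootWriterAlg.digiRootFile   = "digi.root";
reconRootWriterAlg.reconRootFile = "recon.root";
RootTupleSvc.filename            = "ntuple.root";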

Here is an annotated setup file (found in the config/ directory):

# Run parameters

# Put a list of sources you want to run here,
# preceded by the word source. The fields
# are: sourcename (# of events) (# of runs)

# e.g:

source backgndavgpdr 10000 250

# This variable (startrun) allows you to pick a starting run number.
# This can be useful, e.g. if you run a couple of tests and want
# to use those results, i.e. not have to regenerate them. It can
# also be useful if you want to run things incrementally, looking
# at intermediate results along the way.

startrun 3001

# if you want a tuple file, uncomment the word tuple here

tuple merit

# Put the word batch here if you want sources to run in parallel
# on the batch system (delete it or comment it out if you just want
# to run on the local system); append a queue name if you want a
# particular queue, e.g. "batch short" - note that all jobs will be
# submitted to that queue. The default queue for jobs is extralong.
# You can run bqueues from the command line on any machine from which
# you can submit batch jobs to get a list of legal queue names - if
# you give a bad one, the script lets it through and has the batch
# system scold you.

batch xlong

# put the location of gleamApp here


gleamApp /nfs/farm/g/glast/u10/builds/GlastRelease/GlastRelease-v3r2/Gleam/v5r2/rh72_gcc2953/Gleam.exe

# Set cmtpath
cmtpath :/nfs/farm/g/glast/u10/builds/GlastRelease/GlastRelease-v3r2/:

# Specify the location of the base gleamApp options file here:
# (note that the DataManager will only use this as its template and
# will generate its own job options file for each run; make sure
# that all basic information other than source, number of events,
# and input and output file names is correctly set in the
# generic file you specify here)
#
# pdrOptions - options for doing event generation plus reconstruction


gleamOptions backgndavg.txt


# put the location you want output files to go here:
# (include trailing slash on directory name)

putfiles /nfs/farm/g/glast/u09/PerfEval2003Spring/backgndavgpdr-v3r2-50M/IndividualRuns/
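
For the setup file above, RunGleam.pl conceptually loops over the requested runs, writes a per-run jobOptions file from the template, and submits one LSF job per run - very roughly like this (a sketch of the idea only, not the actual script; the per-run options file name is made up):

# Sketch: 250 runs of 10000 events each, starting at run 3001,
# each submitted to the queue named in the setup file
gleamApp=/nfs/farm/g/glast/u10/builds/GlastRelease/GlastRelease-v3r2/Gleam/v5r2/rh72_gcc2953/Gleam.exe
for run in `seq 3001 3250`; do
    bsub -q xlong $gleamApp jobOptions_${run}.txt
done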

Cleaning Up

Sadly, jobs do fail. Assuming the failures are irreparable (i.e. not fileserver outages etc.), we run batchsort.pl to move the failed jobs into their own directory, called failed/. This moves out all output files from any given failed run (a run is judged failed by the absence of the successful-completion string at the end of its batch log). After that, one can proceed with pruning.
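
The idea is roughly as follows (a sketch only, not batchsort.pl itself; the log-file naming and the completion string are placeholders):

# Sketch: move the outputs of any run whose batch log lacks the completion string
mkdir -p failed
for log in *.log; do
    run=`basename $log .log`
    if ! grep -q "successfully completed" $log; then
        mv ${run}* failed/
    fi
done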

It is also useful to be able to prune incrementally as a task is being run. The prunePrep.pl script prepares for this by gathering up the runs completed so far: it makes symlinks to their ntuple files and builds lists of their MC, Digi and Recon files. One can then run the ntuple pruner from the symlinks or the tree pruner from the lists of files.

Pruning Root Trees

The output Root files are a bit large to deal with, so we apply conservative cuts and concatenate the surviving events into a small number of files. This is done using PruneTrees, available in cvs. See the code for the applied cuts, but at the time of writing, we apply two cuts:

Ntrk > 0 && activeDistance < -20.

We have had problems with Root's TChain class not sorting files reliably, so we provide lists of MC, Digi and Recon files to process, specifying the order ourselves (single column giving the full filename).
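
Each list is just a plain text file with one full filename per line, for example (the filenames here are purely illustrative):

/nfs/farm/g/glast/u09/PerfEval2003Spring/backgndavgpdr-v3r2-50M/IndividualRuns/mc_3001.root
/nfs/farm/g/glast/u09/PerfEval2003Spring/backgndavgpdr-v3r2-50M/IndividualRuns/mc_3002.root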

The pruneTrees executable is installed in:
$GLASTROOT/ground/farmTools/RootAnalysis/v4r2p0/rh72_gcc2953

Usage:

cd $GLASTROOT/ground/farmTools
glastpack.pl login
glastpack.pl run RootAnalysis pruneTrees.exe mcfiles reconfiles digifiles
OR
source $GLASTROOT/ground/farmTools/RootAnalysis/v4r2p0/cmt/setup.sh
pruneTrees mcfiles reconfiles digifiles

Pruning Root Ntuples

This is done using a Root macro, PruneTuple.cxx. See the code for the applied cuts, but at the time of writing, we apply two cuts:

Ntrk > 0 && activeDistance < -20.

Usage (from the task top directory):

root> .L PruneTuple.cxx
root> PruneTuple("IndividualRuns/mer*.root");

There is an optional second parameter with a default value of 200000; it limits each output ntuple to 200k events, so the macro may produce more than one output file.

The macro produces a file called ntuple-prune.root, containing all the events that pass the cuts; you will want to rename it. If more than 200k events pass the cuts, a series of ntuples is written instead, named ntuple-prune0.root, ntuple-prune1.root, etc., each containing no more than 200k events.
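
For example, to raise the limit to 500k events per output file (the number is just an illustration), pass it as the second argument:

root> .L PruneTuple.cxx
root> PruneTuple("IndividualRuns/mer*.root", 500000);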

Peeling Events

After studying a set of events, users may want to look more closely at a subset of them. This subset may be substantially smaller than the full set of events available in the full ROOT files. In this case, we can extract a specific set of events from a set of ROOT files. First the user must provide an ASCII file of run and event ids:
runNum eventId
runNum2 eventId2
etc.

peeler.pl
To reduce the processing time, we first determine which files contain the desired run/event pairs. This is done using a new script called peeler.pl, located in $GLASTROOT/ground/scripts. This script is called as follows:
peeler.pl [input file] (directory)
where the input file is an ASCII file containing the run/event pairs
and the directory is the directory containing the MC, Recon, and Digi ROOT files.
The output from this script is a set of 3 ASCII files giving the full path and name of the ROOT files that contain these run/event pairs (as determined from the names of the ROOT files).
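
For example, using the output directory from the setup file above (the file and directory names are illustrative only):

peeler.pl peel.txt /nfs/farm/g/glast/u09/PerfEval2003Spring/backgndavgpdr-v3r2-50M/IndividualRuns/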

peelTrees.exe
Now we have 3 ASCII files listing the ROOT files that contain the run/event pairs we want to extract. Next we run another ROOT executable called peelTrees.exe. This program is installed at:
$GLASTROOT/ground/farmTools/RootAnalysis/v4r2p0/rh72_gcc2953

It is called via:
peelTrees.exe mcfiles reconfiles digifiles peel.txt
Where: 
mcfiles is an ASCII file of MC ROOT file names
reconfiles is an ASCII file of Recon ROOT file names
digifiles is an ASCII file of Digi ROOT file names
peel.txt is an ASCII file of run/event pairs (separated by a space and each pair on its own line)

Reprocessing Root Trees

We have been in the habit of rerunning AnalysisNtuple as the analysis code is iterated. This can be done by reading the Root trees back in and rerunning AnalysisNtuple, as per this jobOptions file, in which one supplies the output ntuple file and the input chains of Root trees. That example only produces the AnalysisNtuple (note to self: add an example for the merit ntuple!).
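
A very rough sketch of the idea follows; the reader algorithm and property names here are assumptions (patterned on the RootIo reader algorithms and the ntuple-writer service) and should be checked against a real GlastRelease options file rather than used as-is:

// Sketch only - algorithm and property names are assumptions
ApplicationMgr.TopAlg = { "mcRootReaderAlg", "digiRootReaderAlg",
                          "reconRootReaderAlg", "AnalysisNtupleAlg" };

// Input chains of Root trees from the bulk task
mcRootReaderAlg.mcRootFileList       = { "IndividualRuns/mc*.root" };
digiRootReaderAlg.digiRootFileList   = { "IndividualRuns/digi*.root" };
reconRootReaderAlg.reconRootFileList = { "IndividualRuns/recon*.root" };

// Output ntuple file
RootTupleSvc.filename = "ntuple.root";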

R.Dubois Last Modified: 2004-08-04 15:42:08 -0700