April priorities: (Richard) They are all listed in Confluence. Some of the highlights:
OBF filter bits (Heather) She has something that compiles, but it is not complete: she still needs to handle a new class that Bryson introduced.
OBF config on the fly (Tracy) The code for updates to the OnboardFilter has all been checked into CVS. This affects OnboardFilter, OnboardFilterTds, digiRootData, RootConvert, RootIo and AnalysisNtuple, and a tiny fix to the GCR code will also be needed. A major change is the move away from the FilterStatus TDS class. That class contained not only the result of the Gamma Filter and the "best" track information (which was then output to the ntuple in ObfValsTool), but also a complete snapshot of the filter's event reconstruction for the Tracker, Cal and ACD. This was problematic: it is a large amount of memory that is generally sparsely filled, and it is not cleared on an event-by-event basis. Rather than clearing it, the filter code relies on internal status words that indicate which pieces are valid. Root in particular handles this badly, because the data often contain invalid pointers.
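To illustrate why a snapshot that is never cleared is hard to persist, here is a minimal C++ sketch of the status-word pattern. Everything in it (ReconSnapshot, the bit names, the fields) is hypothetical and only mirrors the description above; it is not the OnboardFilter code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments; the real OnboardFilter status words differ.
enum ReconPiece : std::uint32_t {
    kTkrValid = 1u << 0,  // Tracker reconstruction filled for this event
    kCalValid = 1u << 1,  // Cal energy sums filled for this event
    kAcdValid = 1u << 2,  // ACD hit information filled for this event
};

struct Track { double slope; };

// A persistent snapshot that is NOT cleared between events. Only the bits
// in `status` say which members were written for the current event.
struct ReconSnapshot {
    std::uint32_t status = 0;
    double calEnergy = 0.0;            // may still hold a previous event's value
    const Track* bestTrack = nullptr;  // may dangle after the previous event
};

// The filter's own code consults the status word before reading a piece.
inline bool isValid(const ReconSnapshot& s, std::uint32_t piece) {
    assert(piece != 0 && (piece & (piece - 1)) == 0);  // exactly one bit
    return (s.status & piece) != 0;
}

// Start-of-event bookkeeping: only the status word is reset, not the data.
inline void beginEvent(ReconSnapshot& s) { s.status = 0; }
```

The filter never reads a piece whose bit is clear, so stale values are harmless inside the filter. A generic persistency layer such as Root's I/O knows nothing about the status word, though: it would try to write out every member, including `bestTrack`, which may point at memory left over from an earlier event.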
The new scheme uses two main TDS classes: ObfFilterStatus, which contains status information for each of the filters (including the "status byte" and any other output the filter may have, for example the filter energy from the gamma filter), and ObfFilterTrack, which holds the "GRB" track information for processing with the ntuple (and which we will soon be able to display in FRED).
The scheme for recovering the details of the filter reconstruction is in the works. It is intended to be optional, so that people doing detailed filter studies can turn on the output of this information in special runs; it won't be there for normal running.
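A rough C++ model of the split into per-filter results and a separate track record follows. The names (FilterStatusModel, FilterResult, GrbTrackModel, FilterId and the track fields) are hypothetical; the actual ObfFilterStatus and ObfFilterTrack interfaces in OnboardFilterTds are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified model: NOT the real ObfFilterStatus/ObfFilterTrack API.
enum class FilterId { Gamma, Hip, Mip, Dgn };  // illustrative filter list

// One result per filter: the status byte plus any filter-specific output.
struct FilterResult {
    std::uint8_t statusByte = 0;
    std::map<std::string, double> extras;  // e.g. {"energy", ...} for the gamma filter
};

// Per-event container in the spirit of ObfFilterStatus: only filters that
// actually ran have an entry, and clear() runs at the start of every event,
// so nothing stale survives from earlier events.
struct FilterStatusModel {
    std::map<FilterId, FilterResult> results;

    void clear() { results.clear(); }
    void set(FilterId id, FilterResult r) { results[id] = std::move(r); }

    const FilterResult* get(FilterId id) const {
        auto it = results.find(id);
        return it == results.end() ? nullptr : &it->second;
    }
    const FilterResult& require(FilterId id) const {
        const FilterResult* r = get(id);
        assert(r != nullptr && "filter did not run this event");
        return *r;
    }
};

// In the spirit of ObfFilterTrack: plain values and no pointers, so a
// generic persistency layer can write it out safely.
struct GrbTrackModel {
    double x0 = 0, y0 = 0, z0 = 0;  // hypothetical fields: track start point
    double dx = 0, dy = 0, dz = 1;  // direction cosines
};
```

After the gamma filter runs, `status.set(FilterId::Gamma, result)` records its status byte and energy; calling `clear()` at the start of each event means a filter that did not run has no entry at all, rather than a stale one.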
[Thanks to Tracy for this report. ed.]
Science Tools report: Jim went through the Science Tools Update for April 29th.
Mirroring data reports: (Richard) Eric G. has asked that data reports be mirrored to a non-SLAC location in case of network problems. Since LLR already mirrors the Workbook, it seems like a good candidate. (Berrie) is willing, and (Anders) confirms this would not be hard to do on the SLAC end. The reports are just PDF files, generated once every 8 hours.
Data handling: (Dan) The mail backlog for L1Proc is down from more than 20 min to less than 20 s. This means our statistics for running jobs are much more accurate, and submission of new jobs is more responsive because the pipeline has a better count of what's actually on the farm.
The time for a process state transition in L1Proc is down from more than 1 s to about 0.1 s, roughly the same as for an MC task.
All of this has been accomplished by carefully rewriting the core scheduling and dependency-evaluation code in the pipeline server module and the stored procedures module which runs inside the Oracle server.
We spent several days testing and tuning while running a home-brewed pipeline task in TEST. Warren is now running L1Proc in TEST, and we are getting ready to move the new code into DEV and then into PROD, possibly within the next few days.
[thanks to Dan for writing up the full story]
Documentation: (Chuck) Focus continues to be How-to-fix. There are three sections: Infrastructure, Critical Applications (those which must be up and running for data to make its way through the Pipeline) and Non-critical Applications. Recently he's been working on Infrastructure, concentrating on xrootd, data catalog and Pipeline topics. (Richard) Has access to experts been adequate? (Chuck) Yes and no. They're very busy. As long as he does a good job of formulating specific questions, turn-around is ok.
Riccardo returns?! (maybe) (Richard) As of yesterday he has been in email conversation with Riccardo about a possible DataMind contract to convert both Fred and MRStudio to C++ and modify the latter for SCons. Riccardo estimates the work would take about 2 man-months and has provided a figure for per-month cost. Richard is now looking for money to fund this effort for 3 months (just in case). The plan would be to have DataMind write the new versions of both applications and then hand them over to Joanne for support.
Power outage fallout The last remaining problem, now fixed, was that glast-ground came back up behind the firewall. It's not entirely clear what happened (one of those "how did it ever work?" situations).
Skimmer (David C.) Soon after the meeting he will be making a release (v6) of the Skimmer which will accept CELs (Common Event Lists) as input. There have been many changes internally and to the interface to achieve this. It has been tested, but there is still a chance of more bugs, given the extent of the changes.
[Post meeting. Here are some details from David on the changes for v6r0:]
Due to the introduction of CEL support, some of the SK_* variables have changed. Typically, any variable defining the name of a textual parameter file has been replaced by two variables, an SK_INPUT_* one and an SK_OUTPUT_* one. The corresponding SK_FORCE_* and SK_SKIP_* variables have disappeared. Please read the new Confluence User Guide.
Instead of providing the list of input files through the parameter file defined with SK_INPUT_FILE_LIST, one can now provide an input ROOT CEL file, defined through SK_INPUT_CEL.
NOTE: defining SK_TASK was previously mandatory, because it was used in the default values of other variables. From now on, it only needs to be defined for pipeline I data, when one wants to get the list of files from the corresponding catalog.
NOTE: from now on, to get a merge job it should be enough to leave both SK_TCUT and SK_INPUT_EVENT_LIST undefined.
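The input and job-type rules above can be summarized in a small C++ sketch. It assumes the SK_* settings are read as environment variables, and that SK_INPUT_CEL takes precedence over SK_INPUT_FILE_LIST, which in turn takes precedence over the SK_TASK catalog lookup. That ordering and all function and type names are assumptions for illustration, not the Skimmer's actual implementation; the Confluence User Guide is the authority.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>

// Hypothetical sketch of the v6 rules; not the real Skimmer code.
// A Lookup returns a variable's value, or nullopt if it is undefined.
using Lookup = std::function<std::optional<std::string>(const std::string&)>;

// Default lookup (assumption): read the variable from the process environment.
inline std::optional<std::string> fromEnv(const std::string& name) {
    assert(!name.empty());
    const char* v = std::getenv(name.c_str());
    if (v == nullptr) return std::nullopt;
    return std::string(v);
}

enum class InputSource { CelFile, FileList, PipelineICatalog, None };
enum class JobKind { Merge, Skim };

// Assumed precedence: CEL file, then parameter-file list, then catalog.
inline InputSource chooseInput(const Lookup& get) {
    if (get("SK_INPUT_CEL")) return InputSource::CelFile;         // new in v6
    if (get("SK_INPUT_FILE_LIST")) return InputSource::FileList;  // list of input files
    if (get("SK_TASK")) return InputSource::PipelineICatalog;     // pipeline I data only
    return InputSource::None;
}

// Merge job: neither a time cut nor an input event list is defined.
inline JobKind chooseJob(const Lookup& get) {
    bool merge = !get("SK_TCUT") && !get("SK_INPUT_EVENT_LIST");
    return merge ? JobKind::Merge : JobKind::Skim;
}
```

In a real run one would call `chooseJob(fromEnv)`; passing a different Lookup makes it easy to check the rules without touching the environment.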
New RM, SCons (Navid) There is a problem with my new RM code and SCons. The compile command gets through 99% (time-wise) of the compile, including all of the SCons compile section, and then gets stuck in the last 1%. That section just updates some database tables for the RM web page. When stuck, the job uses no CPU at all and never finishes. I suspect that a C++ exception is being thrown that I don't handle. Windows, in its infinite wisdom, decides it is a good idea to display a message box informing the user of the exception/crash. Because this is executed through LSF, I can't see this message box or click OK on it.
I think I've narrowed the problem down to an SQL command that is more than 1 MB long. This is probably causing an SQL exception that I don't catch. I have modified the code to split that SQL command into multiple shorter commands. This means a longer execution time, but hopefully it will get rid of the exception. I've also added more exception handling to keep the compile processing from getting stuck. A crash is preferable to a job that runs forever.
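The two fixes described above, splitting one oversized SQL command into several smaller ones and catching exceptions so the job fails instead of hanging, can be sketched as follows. This is a hypothetical C++ illustration: the RM code's language, database API and size threshold are not given in the report, so statements go to a caller-supplied `execute` callback, and batchStatements and runBatches are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Group individual statements into batches whose size stays at or below
// maxBytes. A statement longer than maxBytes on its own becomes one batch.
inline std::vector<std::string> batchStatements(const std::vector<std::string>& stmts,
                                                std::size_t maxBytes) {
    assert(maxBytes > 0);
    std::vector<std::string> batches;
    std::string current;
    for (const auto& s : stmts) {
        // +1 accounts for the ';' separator appended after each statement.
        if (!current.empty() && current.size() + s.size() + 1 > maxBytes) {
            batches.push_back(std::move(current));
            current.clear();  // moved-from string: reset to a known empty state
        }
        current += s;
        current += ';';
    }
    if (!current.empty()) batches.push_back(std::move(current));
    return batches;
}

// Run every batch; any exception is caught here and turned into a nonzero
// status, so the job ends with an error instead of an unhandled exception.
inline int runBatches(const std::vector<std::string>& batches,
                      const std::function<void(const std::string&)>& execute) {
    try {
        for (const auto& b : batches) execute(b);
        return 0;
    } catch (const std::exception&) {
        return 1;  // caller should log the failure and exit with this status
    } catch (...) {
        return 2;  // non-standard exception type
    }
}
```

For example, `runBatches(batchStatements(stmts, 512 * 1024), execute)` keeps each command at or below 512 KB unless a single statement is larger than that; the caller then exits with the returned code, so the LSF job finishes with a failure status.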
On the unit-test side, I have committed changes to flux and related packages so they no longer use $FLUX_XML to obtain the XML path for flux. All packages have now been tagged, and the next SCons build should confirm that most of these packages now run their unit tests successfully. The remaining unit tests will probably need input from package owners, because the crashes are either not descriptive or the unit tests are testing CMT-specific properties.
There is no news yet on building ST on other platforms (64-bit Linux or Mac).
[Yet another thank you to Navid for this report. ed.]