Batch Processing

If you have time-consuming tasks, many simple tasks, or a large number of files to process, you can use the batch system on the NICA cluster (or on other clusters, such as HybriLIT with SLURM or MPD-Tier1 with Torque) to speed up your work substantially. The batch part of the NICA cluster provides 252 logical processor cores for distributed task processing under the Sun Grid Engine (SGE) scheduling system.

Figure 1. The current structure of the NICA cluster

If you are familiar with the Sun Grid Engine system (see the user guide), you can use the qsub command on the NICA cluster to parallelize data processing. A simple example of a user job for SGE (as well as examples for Torque and SLURM) can be found in the ‘macro/mpd_scheduler/examples/test’ directory of our software. Otherwise, you can use MPD-Scheduler, which was developed to simplify running user tasks in parallel.

MPD-Scheduler is a module of the MpdRoot and BmnRoot software. It uses an existing batch system (SGE, SLURM and Torque are supported) to distribute user jobs on the cluster, so that jobs can be run in parallel without knowledge of the batch system itself, such as SGE and its qsub command. Jobs for distributed execution are described in an XML file, which is passed to MPD-Scheduler:
$ mpd-scheduler my_job.xml

Example of an MPD-Scheduler job:

<job name="reco_job">
   <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
   <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
   <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
   <file sim_input="energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
   <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>

The XML description of a job to be run in the batch system starts with the <job> tag and ends with the closing </job> tag. The ‘name’ attribute sets the job name, which is used to identify the job.

The <macro> tag describes the macro to be executed by MpdRoot or BmnRoot:

  • name – path to the ROOT macro for distributed execution. This argument is required.
    Important! Since an MPD job can be started by the scheduler on any machine of the cluster, the path must point to shared space (e.g. /nica/mpd volumes on the NICA cluster).
  • start_event – number of the first event to process, applied to all input files (passed as the macro’s argument after the input and output file paths, if present). This argument is optional.
  • count_event – number of events to process, applied to all input files (passed as the macro’s argument after the start_event argument, if present). This argument is optional.
  • add_args – additional trailing arguments of the ROOT macro, if required.

The <file> tag describes the input and output files to be processed by the macro above:

  • input – path to one input file, or to a set of files if a regular expression (?,*,+) is used.
  • file_input – path to a text file containing the list of input files, one per line.
  • job_input – name of a previous job whose result files are the input files for the current job.
  • sim_input – string selecting a list of input simulation files from the Unified database.
  • exp_input – string selecting a list of input experimental (raw) files from the Unified database.
  • output – path to the result files.
    Important! Since a user job can be started by MPD-Scheduler on any machine of the cluster, the path must point to shared space, e.g. a Gluster volume.
  • start_event – number of the first event to process for this file. This argument is optional.
  • count_event – number of events to process for this file. This argument is optional.
    Important! If start_event (or count_event) is set in both the <macro> and <file> tags, the value from the <file> tag is used.
  • parallel_mode – number of processors used to process the events of this input file in parallel. This argument is optional.
  • merge – whether to merge the partial result files produced in parallel_mode. The default value is “true”; possible values of the attribute are “false”, “true”, “nodel” and “chain”.
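The precedence rule for start_event and count_event can be illustrated with a short sketch. It reuses the macro and file paths from the example above; the job name and the event numbers are arbitrary values chosen for illustration.

```xml
<job name="reco_override">
   <!-- defaults for all files: 1000 events starting from event 0 -->
   <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000"/>
   <!-- no per-file values: the <macro> defaults apply -->
   <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
   <!-- per-file values take precedence: 500 events starting from event 100 -->
   <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root" start_event="100" count_event="500"/>
   <run mode="global" count="2" config="~/mpdroot/build/config.sh"/>
</job>
```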

An example of a sim_input string is “energy=3,gen=urqmd”, which selects events with a collision energy of 3 GeV produced by the UrQMD event generator. All possible parameters are described in the database utility (console) section. Since a single <file> tag can refer to many input files, special variables can be used in the output argument to define the list of output files:

${counter} = counter of the regular input file being processed; it starts at 1 and increases by 1 for each subsequent input file.
${input} = absolute path of the regular input file.
${file_name} = name of the regular input file without extension.
${file_name_with_ext} = name of the regular input file with extension.

You can also strip leading characters (‘~’ symbol after the colon) or trailing characters (‘-’ symbol after the colon) from the special variables above. For example, ${file_name_with_ext:~N} gives the input file name with extension but without its first N characters, and ${file_name:-N} gives the input file name without its last N characters.
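A minimal sketch of how these variables might be combined is shown below. It assumes that the pattern in the input argument matches the files evetest1.root and evetest2.root from the earlier example; the job name and the output file names are illustrative only.

```xml
<job name="reco_many">
   <macro name="$VMCWORKDIR/macro/mpd/reco.C"/>
   <!-- For evetest1.root: ${file_name} is "evetest1", so ${file_name:~3}
        (first 3 characters removed) should expand to "test1" and the
        output file to mpddst_test1.root; likewise mpddst_test2.root
        for evetest2.root. -->
   <file input="$VMCWORKDIR/macro/mpd/evetest*.root" output="$VMCWORKDIR/macro/mpd/mpddst_${file_name:~3}.root"/>
   <run mode="global" count="2" config="~/mpdroot/build/config.sh"/>
</job>
```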

The <run> tag describes the run parameters and the resources allocated for the job. It can contain the following arguments:

  • mode – execution mode. The value ‘global’ selects distributed processing on a cluster with a batch system; ‘local’ selects multi-threaded execution on the user’s multi-core computer. The default value is ‘local’.
  • count – maximum number of processors allocated for this job. If this number is greater than the number of files to process, the number of allocated processors is reduced to the number of files. The default value is 1. If count is set to 0, all cores are used in local mode, and all processors of the batch system are requested in global mode.
  • config – path to a bash file that sets environment variables (including ROOT environment variables) and is executed before the macro. This argument is optional.
  • logs – path to the log file for multi-threaded (local) mode.
  • priority – priority of the job (an integer in the range -1023 to 1024; the default value is 0).
  • queue – name of the batch system queue used to run the job, if it is not the default one, e.g. “mpd@bfsrv.jinr-t1.ru” for the MPD queue of the LIT T1 cluster.
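As an illustration of the arguments listed above, the following sketch describes a global job that requests at most 20 processors, lowers its priority and runs in a non-default queue. The job name, the priority value -10 and the output file names are arbitrary choices for illustration; the queue name is the one quoted above and will differ on other clusters.

```xml
<job name="reco_lowprio">
   <macro name="$VMCWORKDIR/macro/mpd/reco.C"/>
   <file input="$VMCWORKDIR/macro/mpd/evetest*.root" output="$VMCWORKDIR/macro/mpd/mpddst_${counter}.root"/>
   <!-- at most 20 processors, lowered priority, explicit queue -->
   <run mode="global" count="20" priority="-10" queue="mpd@bfsrv.jinr-t1.ru" config="~/mpdroot/build/config.sh"/>
</job>
```

If fewer than 20 input files match the pattern, the number of allocated processors is reduced to the number of files, as described for the count argument.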

If you want MPD-Scheduler to run an arbitrary (non-ROOT) command, use the <command> tag instead of <macro>. Its ‘line’ argument holds the command line to be executed in distributed mode by the scheduling system. Example of an MPD job with the <command> tag:

<job>
   <command line="get_mpd_prod energy=5-9"/>
   <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

 
Another example, a job for local multi-threaded execution:

<job>
   <macro name="~/mpdroot/macro/mpd/reco.C"/>
   <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root" start_event="0" count_event="0"/>
   <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root" start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
   <run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>
</job>

An XML file for MPD-Scheduler can contain more than one job, i.e. a list of user jobs for distributed execution. In this case the <job> tags are enclosed in a common <jobs> tag. Dependencies can be set between jobs, so that a job depending on another job does not start until the latter has finished. To set a dependency, assign the name of the other job to the ‘dependency’ attribute of the <job> tag.
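A minimal sketch of such a file with two dependent jobs might look as follows. The first job reuses the reconstruction example above; the second job, its macro my_analysis.C and its output file ana1.root are hypothetical names introduced only for illustration.

```xml
<jobs>
   <!-- first job: reconstruction -->
   <job name="reco_job">
      <macro name="$VMCWORKDIR/macro/mpd/reco.C"/>
      <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
      <run mode="global" count="1" config="~/mpdroot/build/config.sh"/>
   </job>
   <!-- second job: starts only after reco_job has finished -->
   <job name="ana_job" dependency="reco_job">
      <!-- my_analysis.C and ana1.root are hypothetical names -->
      <macro name="$VMCWORKDIR/macro/mpd/my_analysis.C"/>
      <file input="$VMCWORKDIR/macro/mpd/mpddst1.root" output="$VMCWORKDIR/macro/mpd/ana1.root"/>
      <run mode="global" count="1" config="~/mpdroot/build/config.sh"/>
   </job>
</jobs>
```

Here the second job names the output of the first job explicitly in its input argument; the job_input argument of the <file> tag may serve the same purpose without repeating the path.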
 
The “macro/mpd_scheduler/examples” directory of the software contains various XML examples for MPD-Scheduler.

Attention! MPD-Scheduler uses the Sun Grid Engine batch system by default. To use another batch system, change the corresponding (second) line of the ‘macro/mpd_scheduler/src/batch_select.h’ file and recompile the software.

MPD-Scheduler is also described in an MS PowerPoint presentation. If you have any questions about MPD-Scheduler, please email gertsen@jinr.ru.