Task queue

  • Send your tasks to the queue immediately, even if there are hundreds of waiting tasks (yours or other users'). Do not wait for the queue to empty first – that may not happen soon if other users are actively filling it at the same time. The resource-sharing system works by reordering the queued tasks of all users according to their past use of the cluster, rather than by blocking the tasks of those who have exceeded their assigned share. Instead of watching the queue, use e-mail notification when your task starts and / or completes (see the example after this list).
  • Setting a realistic run time for your task with the resource parameter h_rt allows the scheduler to distribute your tasks more efficiently. The Physon computing cluster groups tasks into the following duration classes (requested with -l h_rt, as in the example after this list):
  1. short (h_rt < 4 h)
  2. medium (4 h < h_rt < 48 h)
  3. long (48 h < h_rt < 168 h)
  4. special-long (168 h < h_rt < 500 h) – disabled by default; for more information, contact the system administrator at hpc <AT> phys.uni-sofia.bg
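
A minimal submission sketch combining the two points above, assuming a Grid Engine-style qsub front end; the e-mail address, script name and the 12-hour limit are placeholders, not values prescribed by the cluster:

    # -m bea: mail at job begin (b), end (e) and abort (a); -M: delivery address
    # -l h_rt=12:00:00: hard run-time limit of 12 hours (falls in the medium class)
    qsub -m bea -M your.name@example.org -l h_rt=12:00:00 my_job.sh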

Parallel programs (Open MPI)

  • When estimating how much memory your program will use during execution, in order to set the parameter h_vmem correctly, keep in mind that Open MPI uses some extra memory to buffer messages. Depending on the type and size of the task, this overhead can reach up to 800 MB per slot (see the sizing sketch at the end of this section).
  • When running long (over 48 hours) parallel jobs, specify the parameter -q p_long.q (see the submission example below). This works around a bug in the batch execution system: without this parameter the job will remain in the waiting state qw until you delete it or modify it with the command:

qalter -q p_long.q <job_number>
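
To avoid the qalter step entirely, such a job can be sent to p_long.q at submission time. A minimal sketch – the parallel environment name ompi, the slot count and the 96-hour limit are assumptions (the available environments can be listed with qconf -spl):

    # submit a long parallel job directly to the long-jobs queue
    qsub -q p_long.q -l h_rt=96:00:00 -pe ompi 16 my_mpi_job.sh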

  • Programs that make MPI_Alltoall calls, such as distributed 2D/3D FFTs, can improve performance by passing the following parameters to mpirun:
    --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 2
    These options replace the default algorithm for the MPI_Alltoall call (see the example invocation below).
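
An illustrative invocation inside a job script using those options; the program name fft_solver is hypothetical, and $NSLOTS is the slot count that the scheduler grants the job:

    # tuned all-to-all algorithm, started on all slots allocated to the job
    mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoallv_algorithm 2 \
           -np $NSLOTS ./fft_solver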
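
As a sizing sketch for the h_vmem advice above, assuming h_vmem is accounted per slot (as the 800 MB-per-slot figure suggests); the 1.2 GB application footprint is purely illustrative:

    # per-slot estimate: ~1.2 GB application data + up to 0.8 GB Open MPI buffers -> request 2 GB
    qsub -pe ompi 8 -l h_vmem=2G my_mpi_job.sh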