QUonG HowTo
Latest revision as of 13:00, 20 November 2013
Available Resources on QUonG:
- For hardware:
- 16 nodes (SuperMicro X8DTG-D)
- each node is dual socket, with one 4-core Intel Xeon E5620 2.40GHz per socket
- each node has 2 Tesla M2075 GPUs
- each node has 1 InfiniBand card - Mellanox ConnectX (MT26428) PCIe Gen2 on a x4 PCIe slot
- For software:
- each node's OS is CentOS 6.4 (diskless, booted from the network) with kernel 2.6.32-358.2.1.el6.x86_64 (x86_64 arch)
- GNU C/C++/Fortran compiler is version 4.4.7
- OpenMPI 1.5.4 (standard package within CentOS 6.4) - configurable with module load openmpi.x86_64
- OpenMPI 1.7 - install path is /usr/local/ompi-trunk
- MVAPICH2-1.8 - install path is /usr/local/mvapich2-1.8
- MVAPICH2-1.9a2 - install path is /usr/local/mvapich2-1.9a2 --- WARNING: 'srun' is the only launcher admitted!
- NVIDIA driver version 295.41 with CUDA 4.2 SDK (install path is /usr/local/cuda)
- SLURM batch job manager
To select OpenMPI 1.5.4 as communication middleware: module load openmpi.x86_64
To select any of the others, add the appropriate directories to your PATH and LD_LIBRARY_PATH env variables; e.g. for OpenMPI 1.7, /usr/local/ompi-trunk/bin and /usr/local/ompi-trunk/lib.
To compile CUDA code, add /usr/local/cuda/bin and /usr/local/cuda/lib64 to your PATH and LD_LIBRARY_PATH env variables.
When using OpenMPI, for the reasons highlighted in the Open MPI FAQ (http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages), it is advised to set ulimit -l unlimited (for a Bourne-style shell) or limit memorylocked unlimited (for a C-style shell).
REMEMBER: when using MVAPICH2, set the MV2_USE_CUDA env variable to 1 if you are going to use the CUDA-enabled communication primitives.
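For convenience, the settings above can be collected into a short shell snippet; the following is a minimal sketch for a Bourne-style shell (e.g. to be added to your ~/.bashrc), assuming the install paths listed above and picking OpenMPI 1.7 as an example - adapt the MPI directories to the stack you actually want to use:

 # Hypothetical environment setup for QUonG (Bourne-style shell)
 export PATH=/usr/local/ompi-trunk/bin:/usr/local/cuda/bin:$PATH                          # OpenMPI 1.7 + CUDA compilers
 export LD_LIBRARY_PATH=/usr/local/ompi-trunk/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH  # matching runtime libraries
 ulimit -l unlimited            # locked-memory limit recommended for OpenMPI (see the FAQ link above)
 # When using MVAPICH2 with the CUDA-enabled communication primitives, also set:
 # export MV2_USE_CUDA=1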
Using QUonG
Available SLURM queues on QUonG
- 4 nodes (q000-03) in the debug queue (30 min. run time limit)
- 8 nodes (q004-11) in the run queue (4 hrs. run time limit)
- the remaining 4 nodes (q012-15) are in the run queue under a SLURM reservation; using the '--reservation=apenet_development' option needs permission - ask us if you need access
To run on QUonG:
salloc
salloc allocates a number of nodes from a queue and drops you into an interactive shell, from which you can launch jobs with 'mpirun' or 'srun' (see below); exit the shell to relinquish the resources.
Example: allocate 2 nodes (-N option) from the 'debug' queue (-p option), run 'hostname' on them, then exit
 [user@quong ~]$ salloc -N 2 -p debug
 salloc: Granted job allocation 656482
 [user@quong ~]$ mpirun hostname
 q002.qng
 q002.qng
 q003.qng
 q003.qng
 [user@quong ~]$ exit
 exit
 salloc: Relinquishing job allocation 656482
srun
srun launches an executable or a script on the first available nodes of the requested queue, allocating them first if you are not already within a 'salloc' shell.
THIS IS THE ONLY WORKING LAUNCHER WITH MVAPICH2-1.9a2!
Example: run 'hostname' as 4 processes (-n option) on 2 nodes from the 'run' queue
 [user@quong ~]$ srun -N 2 -n 4 -p run hostname
 q010.qng
 q010.qng
 q011.qng
 q011.qng
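Since 'srun' is the only admitted launcher for MVAPICH2-1.9a2, a CUDA-aware run with that stack might look like the sketch below; the binary name './cuda_mpi_app' is purely illustrative, and MV2_USE_CUDA is set as per the note above:

 [user@quong ~]$ export MV2_USE_CUDA=1
 [user@quong ~]$ srun -N 2 -n 4 -p run ./cuda_mpi_app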
sbatch
sbatch submits a script (and only a script!) to a queue, asking for a number of nodes.
Example: submit a script that runs 'hostname' via 'mpirun' on 3 nodes from the 'run' queue in the 'apenet_development' reservation (--reservation option)
 [user@quong ~]$ cat test.sh
 #!/bin/bash
 mpirun hostname
 [fsimula@quong ~]$ sbatch -p run --reservation=apenet_development -N 3 ./test.sh
 Submitted batch job 656517
If not specified otherwise, stdout is redirected to a file 'slurm-$jobid.out'
 [fsimula@quong ~]$ cat slurm-656517.out
 q012.qng
 q012.qng
 q014.qng
 q014.qng
 q013.qng
 q013.qng
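Options such as the queue, node count and output file can also be embedded in the submitted script itself through standard #SBATCH directives; a minimal sketch (the script and output file names are only illustrative):

 [user@quong ~]$ cat test_directives.sh
 #!/bin/bash
 #SBATCH -p run                # queue (partition)
 #SBATCH -N 3                  # number of nodes
 #SBATCH -o myjob-%j.out       # stdout file, %j expands to the job id
 mpirun hostname
 [user@quong ~]$ sbatch ./test_directives.sh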
scancel
scancel removes a submitted job id from the queue.
Example:
 [user@quong ~]$ sbatch -N 8 -p run test.sh
 Submitted batch job 657280
 [user@quong ~]$ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  657271     debug     bash epastore  R      24:19      2 q[002-003]
  657278     debug  job7.sh    delia  R      18:13      1 q000
  657279     debug  job8.sh    delia  R      18:11      1 q001
  657280       run  test.sh     user PD       0:00      8 (ReqNodeNotAvail)
  657272       run  job1.sh    delia  R      18:29      1 q004
  657273       run  job2.sh    delia  R      18:27      1 q006
  657274       run  job3.sh    delia  R      18:26      1 q007
  657275       run  job4.sh    delia  R      18:24      1 q008
  657276       run  job5.sh    delia  R      18:22      1 q009
  657277       run  job6.sh    delia  R      18:15      1 q005
 [user@quong ~]$ scancel 657280
 [user@quong ~]$ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  657271     debug     bash epastore  R      25:44      2 q[002-003]
  657278     debug  job7.sh    delia  R      19:38      1 q000
  657279     debug  job8.sh    delia  R      19:36      1 q001
  657272       run  job1.sh    delia  R      19:54      1 q004
  657273       run  job2.sh    delia  R      19:52      1 q006
  657274       run  job3.sh    delia  R      19:51      1 q007
  657275       run  job4.sh    delia  R      19:49      1 q008
  657276       run  job5.sh    delia  R      19:47      1 q009
  657277       run  job6.sh    delia  R      19:40      1 q005
sinfo
sinfo lists currently available queues and their status.
Example output:
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 debug*       up      30:00      2  alloc q[000-001]
 debug*       up      30:00      2   idle q[002-003]
 run          up    4:00:00      4  maint q[012-015]
 run          up    4:00:00      6  alloc q[004-009]
 run          up    4:00:00      2   idle q[010-011]
squeue
squeue lists currently queued jobs in the available queues.
Example output:
   JOBID PARTITION     NAME  USER ST       TIME  NODES NODELIST(REASON)
  656479     debug  job7.sh delia  R       4:32      1 q001
  656480     debug  job8.sh delia  R       4:24      1 q000
  656473       run  job1.sh delia  R       5:07      1 q008
  656474       run  job2.sh delia  R       5:04      1 q009
  656475       run  job3.sh delia  R       4:57      1 q004
  656476       run  job4.sh delia  R       4:44      1 q005
  656477       run  job5.sh delia  R       4:42      1 q006
  656478       run  job6.sh delia  R       4:36      1 q007
Notes
As you can see from the examples, once inside a 'salloc' shell there is no need to specify the '-n/-np' option when launching with 'mpirun'; the default is to run on all cores of all allocated nodes.
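If you do want to limit the process count from within a 'salloc' shell, you can still pass it to 'mpirun' explicitly; a minimal sketch (the process count here is arbitrary):

 [user@quong ~]$ mpirun -np 2 hostname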