QUonG HowTo
Latest revision as of 13:00, 20 November 2013
Available Resources on QUonG:
- For hardware:
- 16 nodes (SuperMicro X8DTG-D)
- each node is dual socket, with one 4-core Intel Xeon E5620 2.40GHz per socket
- each node has 2 Tesla M2075 GPUs
- each node has 1 InfiniBand card - Mellanox ConnectX (MT26428) PCIe Gen2 on a x4 PCIe slot
- For software:
- each node's OS is CentOS 6.4 (diskless, booted from the network) with kernel 2.6.32-358.2.1.el6.x86_64 (x86_64 arch)
- GNU C/C++/Fortran compiler is version 4.4.7
- OpenMPI 1.5.4 (standard package within CentOS 6.4) - configurable with module load openmpi.x86_64
- OpenMPI 1.7 - install path is /usr/local/ompi-trunk
- MVAPICH2-1.8 - install path is /usr/local/mvapich2-1.8
- MVAPICH2-1.9a2 - install path is /usr/local/mvapich2-1.9a2 --- WARNING: 'srun' is the only launcher admitted!
- NVIDIA driver version 295.41 with CUDA 4.2 SDK (install path is /usr/local/cuda)
- SLURM batch job manager
To select OpenMPI 1.5.4 as communication middleware: module load openmpi.x86_64
To select any of the others, add the appropriate directories to your PATH and LD_LIBRARY_PATH env variables; e.g. for OpenMPI 1.7, /usr/local/ompi-trunk/bin and /usr/local/ompi-trunk/lib.
To compile CUDA code, add /usr/local/cuda/bin and /usr/local/cuda/lib64 to your PATH and LD_LIBRARY_PATH env variables.
When using OpenMPI, for the reasons highlighted in the Open MPI FAQ (http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages), it is advised to set ulimit -l unlimited (for a Bourne-style shell) or limit memorylocked unlimited (for a C-style shell).
REMEMBER: when using MVAPICH2, set the MV2_USE_CUDA env variable to 1 if you are going to use the CUDA-enabled communication primitives.
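For convenience, the settings above can be collected into a short shell snippet; the following is a minimal sketch for a Bourne-style shell (e.g. to be added to your ~/.bashrc), assuming the install paths listed above and picking OpenMPI 1.7 as an example - adapt the MPI directories to the stack you actually want to use:

 # Hypothetical environment setup for QUonG (Bourne-style shell)
 export PATH=/usr/local/ompi-trunk/bin:/usr/local/cuda/bin:$PATH                          # OpenMPI 1.7 + CUDA compilers
 export LD_LIBRARY_PATH=/usr/local/ompi-trunk/lib:/usr/local/cuda/lib64:$LD_LIBRARY_PATH  # matching runtime libraries
 ulimit -l unlimited            # locked-memory limit recommended for OpenMPI (see the FAQ link above)
 # When using MVAPICH2 with the CUDA-enabled communication primitives, also set:
 # export MV2_USE_CUDA=1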
Using QUonG
Available SLURM queues on QUonG
- 4 nodes (q000-03) in the debug queue (30 min. run time limit)
- 8 nodes (q004-11) in the run queue (4 hrs. run time limit)
- the remaining 4 nodes (q012-15) are in the run queue under a SLURM reservation; using the '--reservation=apenet_development' option needs permission - ask us if you need access
To run on QUonG:
salloc
salloc allocates a number of nodes from a queue and drops you into an interactive shell, from which you can launch jobs with 'mpirun' or 'srun' (see below); exit the shell to relinquish the resources.
Example: allocate 2 nodes (-N option) from the 'debug' queue (-p option), run 'hostname' on them, then exit
 [user@quong ~]$ salloc -N 2 -p debug
 salloc: Granted job allocation 656482
 [user@quong ~]$ mpirun hostname
 q002.qng
 q002.qng
 q003.qng
 q003.qng
 [user@quong ~]$ exit
 exit
 salloc: Relinquishing job allocation 656482
srun
srun launches an executable or a script on the first available nodes of the requested queue, allocating them first if you are not already within a 'salloc' shell.
THIS IS THE ONLY WORKING LAUNCHER WITH MVAPICH2-1.9a2!
Example: run 'hostname' as 4 processes (-n option) on 2 nodes from the 'run' queue
 [user@quong ~]$ srun -N 2 -n 4 -p run hostname
 q010.qng
 q010.qng
 q011.qng
 q011.qng
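Since 'srun' is the only admitted launcher for MVAPICH2-1.9a2, a CUDA-aware run with that stack might look like the sketch below; the binary name './cuda_mpi_app' is purely illustrative, and MV2_USE_CUDA is set as per the note above:

 [user@quong ~]$ export MV2_USE_CUDA=1
 [user@quong ~]$ srun -N 2 -n 4 -p run ./cuda_mpi_app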
sbatch
sbatch submits a script (and only a script!) to a queue, asking for a number of nodes.
Example: submit a script that runs 'hostname' via 'mpirun' on 3 nodes from the 'run' queue in the 'apenet_development' reservation (--reservation option)
 [user@quong ~]$ cat test.sh
 #!/bin/bash
 mpirun hostname
 [fsimula@quong ~]$ sbatch -p run --reservation=apenet_development -N 3 ./test.sh
 Submitted batch job 656517
If not specified otherwise, stdout is redirected to a file 'slurm-$jobid.out'
 [fsimula@quong ~]$ cat slurm-656517.out
 q012.qng
 q012.qng
 q014.qng
 q014.qng
 q013.qng
 q013.qng
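Options such as the queue, node count and output file can also be embedded in the submitted script itself through standard #SBATCH directives; a minimal sketch (the script and output file names are only illustrative):

 [user@quong ~]$ cat test_directives.sh
 #!/bin/bash
 #SBATCH -p run                # queue (partition)
 #SBATCH -N 3                  # number of nodes
 #SBATCH -o myjob-%j.out       # stdout file, %j expands to the job id
 mpirun hostname
 [user@quong ~]$ sbatch ./test_directives.sh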
scancel
scancel removes a submitted job id from the queue.
Example:
 [user@quong ~]$ sbatch -N 8 -p run test.sh
 Submitted batch job 657280
 [user@quong ~]$ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  657271     debug     bash epastore  R      24:19      2 q[002-003]
  657278     debug  job7.sh    delia  R      18:13      1 q000
  657279     debug  job8.sh    delia  R      18:11      1 q001
  657280       run  test.sh     user PD       0:00      8 (ReqNodeNotAvail)
  657272       run  job1.sh    delia  R      18:29      1 q004
  657273       run  job2.sh    delia  R      18:27      1 q006
  657274       run  job3.sh    delia  R      18:26      1 q007
  657275       run  job4.sh    delia  R      18:24      1 q008
  657276       run  job5.sh    delia  R      18:22      1 q009
  657277       run  job6.sh    delia  R      18:15      1 q005
 [user@quong ~]$ scancel 657280
 [user@quong ~]$ squeue
   JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
  657271     debug     bash epastore  R      25:44      2 q[002-003]
  657278     debug  job7.sh    delia  R      19:38      1 q000
  657279     debug  job8.sh    delia  R      19:36      1 q001
  657272       run  job1.sh    delia  R      19:54      1 q004
  657273       run  job2.sh    delia  R      19:52      1 q006
  657274       run  job3.sh    delia  R      19:51      1 q007
  657275       run  job4.sh    delia  R      19:49      1 q008
  657276       run  job5.sh    delia  R      19:47      1 q009
  657277       run  job6.sh    delia  R      19:40      1 q005
sinfo
sinfo lists currently available queues and their status.
Example output:
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 debug*       up      30:00      2  alloc q[000-001]
 debug*       up      30:00      2   idle q[002-003]
 run          up    4:00:00      4  maint q[012-015]
 run          up    4:00:00      6  alloc q[004-009]
 run          up    4:00:00      2   idle q[010-011]
squeue
squeue lists currently queued jobs in the available queues.
Example output:
   JOBID PARTITION     NAME  USER ST       TIME  NODES NODELIST(REASON)
  656479     debug  job7.sh delia  R       4:32      1 q001
  656480     debug  job8.sh delia  R       4:24      1 q000
  656473       run  job1.sh delia  R       5:07      1 q008
  656474       run  job2.sh delia  R       5:04      1 q009
  656475       run  job3.sh delia  R       4:57      1 q004
  656476       run  job4.sh delia  R       4:44      1 q005
  656477       run  job5.sh delia  R       4:42      1 q006
  656478       run  job6.sh delia  R       4:36      1 q007
Notes
As you can see from the examples, once inside a 'salloc' shell there is no need to specify the '-n/-np' option when launching with 'mpirun'; the default is to run on all cores of all allocated nodes.
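If you do want to limit the process count from within a 'salloc' shell, you can still pass it to 'mpirun' explicitly; a minimal sketch (the process count here is arbitrary):

 [user@quong ~]$ mpirun -np 2 hostname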