QUonG HowTo
Available Resources on QUonG:
- For hardware:
  - 16 nodes (SuperMicro X8DTG-D)
  - each node is dual socket, with one 4-core Intel Xeon E5620 2.40GHz per socket
  - each node owns 2 Tesla M2075
  - each node owns 1 InfiniBand board - Mellanox ConnectX (MT26428) PCIe Gen2 on a x4 PCIe slot
- For software (a minimal environment-setup sketch follows this list):
  - each node OS is CentOS 6.4 (diskless, boot from network) with kernel 2.6.32-358.2.1.el6.x86_64 (x86_64 arch)
  - GNU C/C++/Fortran compiler is version 4.4.7
  - OpenMPI 1.5.4 (standard package within CentOS 6.4) - configurable with 'module load openmpi.x86_64'
  - OpenMPI 1.7 - install path is /usr/local/ompi-trunk
  - MVAPICH2-1.8 - install path is /usr/local/mvapich2-1.8
  - MVAPICH2-1.9a2 - install path is /usr/local/mvapich2-1.9a2
  - NVIDIA driver version 295.41 with the CUDA 4.2 SDK (install path is /usr/local/cuda)
  - SLURM batch job manager
- Available SLURM queues on QUonG:
  - 4 nodes (q000-03) in the 'debug' queue (30 min. run time limit)
  - 8 nodes (q004-11) in the 'run' queue (4 hrs. run time limit)
  - the remaining 4 nodes (q012-15) are in the 'run' queue under a SLURM reservation; using them requires permission for the '--reservation=apenet_development' option - ask us if you need access
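As a quick sanity check of the software stack, a minimal environment setup in a login shell could look like the sketch below. The module name and install paths are taken from the list above; the choice of MVAPICH2-1.8 and the 'lib64' library directory are assumptions made for the example, not site requirements.

 # Option 1: use the CentOS-packaged OpenMPI 1.5.4 via environment modules
 module load openmpi.x86_64

 # Option 2 (alternative): put one of the locally installed MPI stacks on the PATH,
 # e.g. MVAPICH2-1.8 (install path from the list above)
 #export PATH=/usr/local/mvapich2-1.8/bin:$PATH
 #export LD_LIBRARY_PATH=/usr/local/mvapich2-1.8/lib:$LD_LIBRARY_PATH

 # CUDA 4.2 toolchain (install path from the list above; lib64 is the usual 64-bit library dir)
 export PATH=/usr/local/cuda/bin:$PATH
 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

 # quick checks
 mpicc --version   # should report gcc 4.4.7
 nvcc --version    # should report CUDA release 4.2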
To view resources on QUonG:
- 'sinfo': list of currently available queues and their status
Example output:
 PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
 debug*       up      30:00      2  alloc q[000-001]
 debug*       up      30:00      2   idle q[002-003]
 run          up    4:00:00      4  maint q[012-015]
 run          up    4:00:00      6  alloc q[004-009]
 run          up    4:00:00      2   idle q[010-011]
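When the listing gets long, sinfo accepts the standard SLURM filtering and formatting options; for example (nothing QUonG-specific is assumed here):

 sinfo -p debug   # show only the 'debug' queue
 sinfo -N -l      # one line per node, long format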
- 'squeue': list of currently queued jobs in the available queues
Example output:
  JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
 656479     debug  job7.sh    delia  R  4:32     1 q001
 656480     debug  job8.sh    delia  R  4:24     1 q000
 656473       run  job1.sh    delia  R  5:07     1 q008
 656474       run  job2.sh    delia  R  5:04     1 q009
 656475       run  job3.sh    delia  R  4:57     1 q004
 656476       run  job4.sh    delia  R  4:44     1 q005
 656477       run  job5.sh    delia  R  4:42     1 q006
 656478       run  job6.sh    delia  R  4:36     1 q007
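Similarly, squeue can be restricted to the jobs you care about, and scontrol gives the full detail of a single job; the job id below is just the one from the listing above:

 squeue -u $USER            # only your own jobs
 squeue -p run              # only jobs in the 'run' queue
 scontrol show job 656478   # full details of a single job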
To run on QUonG:
- allocate a number of nodes from a queue with salloc, get an interactive shell and run from there; exit the shell to relinquish the resources.
Example:
 [user@quong ~]$ salloc -N 2 -p debug     <--- ask for 2 nodes (-N option) from the 'debug' queue (-p option)
 salloc: Granted job allocation 656482
 [user@quong ~]$ mpirun hostname
 q002.qng
 q002.qng
 q003.qng
 q003.qng
 [user@quong ~]$ exit
 exit
 salloc: Relinquishing job allocation 656482
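Inside the salloc shell any launcher sees the allocated nodes, so the same pattern works for a real application; a small sketch, where the './my_app' binary and the optional '-t' time limit are placeholders, not requirements:

 salloc -N 2 -p debug -t 00:20:00   # '-t' optionally sets an explicit time limit within the queue limit
 mpirun ./my_app                    # runs across the 2 allocated nodes ('./my_app' is a placeholder)
 srun hostname                      # plain srun also targets the allocation
 exit                               # relinquishes the nodes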
- submit a script with sbatch, requesting a number of nodes from a queue.
Example:
 [user@quong ~]$ cat test.sh
 #!/bin/bash
 mpirun hostname
 [fsimula@quong ~]$ sbatch -p run --reservation=apenet_development -N 3 ./test.sh
 Submitted batch job 656517
If not otherwise specified, stdout is redirected to a file named 'slurm-$jobid.out'
 [fsimula@quong ~]$ cat slurm-656517.out
 q012.qng
 q012.qng
 q014.qng
 q014.qng
 q013.qng
 q013.qng
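The job parameters can also be embedded in the script itself as '#SBATCH' directives instead of being passed on the sbatch command line; a minimal sketch, where the job name, time limit and output file name are arbitrary examples:

 #!/bin/bash
 #SBATCH -J mpi_test          # job name
 #SBATCH -p run               # queue: 'debug' or 'run'
 #SBATCH -N 3                 # number of nodes
 #SBATCH -t 01:00:00          # run time limit (must fit within the queue limit)
 #SBATCH -o mpi_test_%j.out   # stdout file; %j expands to the job id (replaces the default slurm-$jobid.out)

 mpirun hostname

A script written this way is submitted with just 'sbatch ./test.sh', with no further command line options.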