Difference between revisions of "QUonG HowTo"
From APEWiki
Revision as of 14:27, 11 November 2013
Available Resources on QUonG:
- 16 nodes (SuperMicro X8DTG-D)
- each node is dual socket, with one 4-core Intel Xeon E5620 2.40GHz per socket
- each node runs CentOS 6.4 (diskless, booted from the network) with kernel 2.6.32-358.2.1.el6.x86_64 (x86_64 arch)
- GNU C/C++ compiler is version 4.4.7
Available SLURM queues on QUonG:
- 4 nodes (q000-q003) in the debug queue (30 min. run time limit)
- 8 nodes (q004-q011) in the run queue (4 hrs. run time limit)
- the remaining 4 nodes (q012-q015) are in the run queue under a SLURM reservation; using them requires permission for the '--reservation=apenet_development' option. Ask us if you need access.
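The queue layout above can be captured in a small helper that assembles the matching 'salloc' command line. This is only a sketch: the function name quong_salloc_cmd and the target names debug, run and reserved are hypothetical and not part of QUonG or SLURM; only the '-p', '-N' and '--reservation' options and the node counts come from the list above.

```shell
# Hypothetical helper (not part of QUonG or SLURM): print the salloc
# command line for one of the three node groups described above.
quong_salloc_cmd() {
    local target="$1" nodes="$2" max opts
    case "$target" in
        debug)    max=4; opts="-p debug" ;;   # q000-q003, 30 min. limit
        run)      max=8; opts="-p run" ;;     # q004-q011, 4 hrs. limit
        reserved) max=4; opts="-p run --reservation=apenet_development" ;;  # q012-q015
        *) echo "unknown target: $target" >&2; return 1 ;;
    esac
    # Refuse requests larger than the node group can provide.
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max" ]; then
        echo "$target offers 1 to $max nodes, got $nodes" >&2
        return 1
    fi
    echo "salloc $opts -N $nodes"
}
```

For example, `quong_salloc_cmd reserved 2` prints `salloc -p run --reservation=apenet_development -N 2`, which can then be run by a user who has been granted access to the reservation.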
To view resources on QUonG:
- 'sinfo': lists the currently available queues and their status
Example output:
    PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
    debug*    up    30:00     2     alloc q[000-001]
    debug*    up    30:00     2     idle  q[002-003]
    run       up    4:00:00   4     maint q[012-015]
    run       up    4:00:00   6     alloc q[004-009]
    run       up    4:00:00   2     idle  q[010-011]
- 'squeue': lists the jobs currently queued or running in the available queues
Example output:
    JOBID  PARTITION NAME    USER  ST TIME NODES NODELIST(REASON)
    656479 debug     job7.sh delia R  4:32 1     q001
    656480 debug     job8.sh delia R  4:24 1     q000
    656473 run       job1.sh delia R  5:07 1     q008
    656474 run       job2.sh delia R  5:04 1     q009
    656475 run       job3.sh delia R  4:57 1     q004
    656476 run       job4.sh delia R  4:44 1     q005
    656477 run       job5.sh delia R  4:42 1     q006
    656478 run       job6.sh delia R  4:36 1     q007
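The 'sinfo' and 'squeue' listings above are plain whitespace-separated columns, so they can be summarized with standard text tools. The two filters below are a rough sketch, assuming the default column layouts shown in the example outputs; the function names idle_nodes and nodes_of_user are made up for illustration.

```shell
# Hypothetical filter: read 'sinfo' output on stdin and print the number
# of idle nodes per partition. Assumes the default column order
# PARTITION AVAIL TIMELIMIT NODES STATE NODELIST.
idle_nodes() {
    awk 'NR > 1 && $5 ~ /^idle/ {
             part = $1
             sub(/\*$/, "", part)      # drop the "*" marking the default queue
             idle[part] += $4
         }
         END { for (p in idle) print p, idle[p] }' | sort
}

# Hypothetical filter: read 'squeue' output on stdin and print the nodes
# on which the given user has running (state R) jobs. Assumes the default
# column order JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON).
nodes_of_user() {
    local user="$1"
    awk -v user="$user" 'NR > 1 && $4 == user && $5 == "R" { print $8 }' | sort
}
```

Typical use would be `sinfo | idle_nodes` before asking for an allocation, and `squeue | nodes_of_user "$USER"` to see where your own jobs are running. Both filters would need adjusting if a site changes the default output format.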
To run on QUonG:
- allocate a number of nodes from a queue with 'salloc' to get an interactive shell, run your commands from there, then exit the shell to relinquish the resources.
Example:
    [user@quong ~]$ salloc -p debug -N 2
    salloc: Granted job allocation 656482

    [user@quong ~]$ mpirun hostname
    q002.qng
    q002.qng
    q003.qng
    q003.qng

    [user@quong ~]$ exit
    exit
    salloc: Relinquishing job allocation 656482
- execute a script with 'srun', requesting a number of nodes from a queue.
Example:
    [user@quong ~]$ cat test.sh
    #!/bin/bash
    mpirun hostname

    [user@quong ~]$ srun -p run -N 2 ./test.sh
    q010.qng
    q011.qng
    q011.qng
    q010.qng
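In both examples above, 'mpirun hostname' prints one line per launched process, so counting the repeated host names is a quick way to check how the processes were spread over the allocated nodes. A minimal sketch follows; the function name ranks_per_node is hypothetical.

```shell
# Hypothetical filter: read one host name per line on stdin (as printed
# by 'mpirun hostname') and print each host with its process count.
ranks_per_node() {
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Inside an allocation, this could be used as `mpirun hostname | ranks_per_node`. Fed the host names from the 'srun' example, it reports two processes on each of q010.qng and q011.qng.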