QUonG HowTo

From APEWiki

Revision as of 14:51, 11 November 2013

Available Resources on QUonG:

  • 16 nodes (SuperMicro X8DTG-D)
  • each node is dual-socket, with one 4-core Intel Xeon E5620 (2.40 GHz) per socket, i.e. 8 cores per node
  • each node runs CentOS 6.4 (diskless, booted from the network) with kernel 2.6.32-358.2.1.el6.x86_64 (x86_64 architecture)
  • GNU C/C++ compiler is version 4.4.7
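The hardware list above implies 8 cores per node (2 sockets x 4 cores). As a minimal sketch, assuming you run one MPI rank per physical core, the number of nodes to request with -N for a given rank count is a ceiling division. nodes_for_ranks is a hypothetical helper, not a tool installed on QUonG.

```shell
#!/bin/bash
# Cores per QUonG node: 2 sockets x one 4-core Xeon E5620 (from the list above).
CORES_PER_NODE=$((2 * 4))

# Hypothetical helper: smallest node count giving at least one core per
# MPI rank, i.e. ceil(ranks / CORES_PER_NODE) with integer arithmetic.
# Assumes one rank per physical core.
nodes_for_ranks() {
    local ranks=$1
    echo $(( (ranks + CORES_PER_NODE - 1) / CORES_PER_NODE ))
}

# Example: 20 ranks need ceil(20 / 8) = 3 nodes, i.e. 'salloc -N 3 ...'
echo "20 ranks -> $(nodes_for_ranks 20) nodes"
```

Under the same assumption the whole machine offers 16 x 8 = 128 cores.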

Available SLURM queues on QUonG:

  • 4 nodes (q000-q003) in the debug queue (30 min. run time limit)
  • 8 nodes (q004-q011) in the run queue (4 hrs. run time limit)
  • the remaining 4 nodes (q012-q015) are in the run queue under a SLURM reservation; using them requires permission for the '--reservation=apenet_development' option, so ask us if you need access
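The queue limits above can be turned into a quick rule of thumb for the -p option. This is only a sketch under stated assumptions: pick_partition and its constants are hypothetical, they simply restate the node counts and run time limits listed above, and the reserved nodes q012-q015 (which also need --reservation=apenet_development) are ignored. SLURM itself enforces the real limits.

```shell
#!/bin/bash
# Queue limits as listed above (hypothetical constants, not read from SLURM).
DEBUG_NODES=4          # q000-q003
DEBUG_LIMIT_MIN=30     # debug queue: 30 min run time limit
RUN_NODES=8            # q004-q011, the run nodes outside the reservation
RUN_LIMIT_MIN=240      # run queue: 4 hrs run time limit

# Hypothetical helper: print the queue to pass to '-p' for a job needing
# $2 nodes for $1 minutes; fail if neither unreserved queue can hold it.
pick_partition() {
    local minutes=$1 nodes=$2
    if [ "$minutes" -le "$DEBUG_LIMIT_MIN" ] && [ "$nodes" -le "$DEBUG_NODES" ]; then
        echo debug
    elif [ "$minutes" -le "$RUN_LIMIT_MIN" ] && [ "$nodes" -le "$RUN_NODES" ]; then
        echo run
    else
        echo "no unreserved queue fits $nodes node(s) for $minutes min" >&2
        return 1
    fi
}

echo "2 nodes, 15 min -> -p $(pick_partition 15 2)"
echo "2 nodes, 2 h    -> -p $(pick_partition 120 2)"
```

The sketch prefers debug for short, small jobs only because that limit is checked first; whether that is the better choice depends on how busy each queue is.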

To view resources on QUonG:

  • 'sinfo': list of currently available queues and their status

Example output:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up      30:00      2  alloc q[000-001]
debug*       up      30:00      2   idle q[002-003]
run          up    4:00:00      4  maint q[012-015]
run          up    4:00:00      6  alloc q[004-009]
run          up    4:00:00      2   idle q[010-011]

  • 'squeue': list of currently queued jobs in the available queues

Example output:

 JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
656479     debug  job7.sh    delia   R       4:32      1 q001
656480     debug  job8.sh    delia   R       4:24      1 q000
656473       run  job1.sh    delia   R       5:07      1 q008
656474       run  job2.sh    delia   R       5:04      1 q009
656475       run  job3.sh    delia   R       4:57      1 q004
656476       run  job4.sh    delia   R       4:44      1 q005
656477       run  job5.sh    delia   R       4:42      1 q006
656478       run  job6.sh    delia   R       4:36      1 q007
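The sinfo listing above is easiest to act on once reduced to "idle nodes per queue". A minimal sketch, assuming sinfo's -h (no header) and -o (output format) options with the %P (partition), %t (compact state) and %D (node count) fields; idle_nodes is a hypothetical helper, demonstrated offline on the example output shown above.

```shell
#!/bin/bash
# Hypothetical helper: sum idle nodes per partition from lines of the form
# "PARTITION STATE NODES", as printed by: sinfo -h -o "%P %t %D"
# The '*' marking the default partition (e.g. "debug*") is stripped.
# Partitions with no idle nodes are simply not listed.
idle_nodes() {
    awk '$2 == "idle" { sub(/\*$/, "", $1); idle[$1] += $3 }
         END { for (p in idle) print p, idle[p] }' | sort
}

# Offline demo with the example sinfo output above; on QUonG you would run:
#   sinfo -h -o "%P %t %D" | idle_nodes
idle_nodes <<'EOF'
debug* alloc 2
debug* idle 2
run maint 4
run alloc 6
run idle 2
EOF
```

Similarly, 'squeue -u $USER' restricts the job list to your own jobs.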

To run on QUonG:

  • allocate a number of nodes from a queue with salloc, which gives you an interactive shell; run your programs from there, then exit the shell to relinquish the resources.

Example:

[user@quong ~]$ salloc -N 2 -p debug <--- ask for 2 nodes (-N option) from the 'debug' queue (-p option)
salloc: Granted job allocation 656482
[user@quong ~]$ mpirun hostname
q002.qng
q002.qng
q003.qng
q003.qng
[user@quong ~]$ exit
exit
salloc: Relinquishing job allocation 656482
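Inside the shell opened by salloc, SLURM exports variables describing the allocation, among them SLURM_JOB_ID and SLURM_JOB_NODELIST. A small guard like the hypothetical require_allocation below can stop a script from launching mpirun when no allocation is active (for instance after the salloc shell has already been exited). It is a sketch, not a QUonG-provided tool.

```shell
#!/bin/bash
# Hypothetical guard: return non-zero unless we are inside a SLURM
# allocation (salloc sets SLURM_JOB_ID in the shell it spawns).
require_allocation() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "not inside an allocation: run 'salloc -N <nodes> -p <queue>' first" >&2
        return 1
    fi
    echo "job ${SLURM_JOB_ID} on nodes ${SLURM_JOB_NODELIST:-unknown}"
}

# Typical use inside your own scripts:
#   require_allocation && mpirun hostname
```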
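
  • execute a script with sbatch, asking for a number of nodes from a queue: sbatch queues the script and returns immediately, and the script runs once the requested nodes are available.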

Example:

[user@quong ~]$ cat test.sh
#!/bin/bash
mpirun hostname
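[user@quong ~]$ sbatch -p run --reservation=apenet_development -N 3 ./test.sh
Submitted batch job 656517

If not specified otherwise (e.g. with sbatch's -o/--output option), both stdout and stderr are written to a file named 'slurm-$jobid.out' in the directory from which sbatch was run:

[user@quong ~]$ cat slurm-656517.out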
q012.qng
q012.qng
q014.qng
q014.qng
q013.qng
q013.qng
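The job ID printed by sbatch is also the key to the default output file. A sketch, assuming the "Submitted batch job <jobid>" confirmation format and the default slurm-<jobid>.out name shown above; job_id_from_submit and default_output_file are hypothetical helpers. Some SLURM versions also offer sbatch --parsable, which prints only the job ID.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's confirmation line
# ("Submitted batch job <jobid>") read on stdin.
job_id_from_submit() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Hypothetical helper: default output file name for a given job ID.
default_output_file() {
    echo "slurm-$1.out"
}

# Offline demo with the line from the example above; on QUonG you would use:
#   jobid=$(sbatch -p run -N 2 ./test.sh | job_id_from_submit)
jobid=$(echo "Submitted batch job 656517" | job_id_from_submit)
echo "job ${jobid} writes to $(default_output_file "$jobid")"
```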