Beowulf cluster quick manual

QUICK USER'S GUIDE
(Running jobs with large memory requirements)

This guide explains how to submit jobs that require a large amount of memory to the CoRE/CAIP cluster.

Cluster Node Hardware:

Most Xeon nodes have two quad-core CPUs, i.e. 2 CPUs x 4 cores = 8 cores (machines n01-n26 in the CAIP cluster have two dual-core CPUs, i.e. 2 CPUs x 2 cores = 4 cores).
Each node has 1 GB of memory per core, i.e. 8-core nodes (2 x 4) have 8 GB of memory and 4-core nodes (2 x 2) have 4 GB of memory.
Most of the cluster consists of 8-core nodes with 8 GB of memory.

Schematic presentation & image of real 2- and 4-core nodes.

Cluster Software (queueing system):

Queues wp4(m) consist of nodes with 4 GB of memory, while queues wp6(m) and wp7(m) consist of nodes with 8 cores and 8 GB of memory.
Queues wpN and wpNm (N = 4, 6, 7) share the same physical memory: per node, wp4(m) has 4 GB in total, while wp6(m) and wp7(m) each have 8 GB in total.
Now the important part. If your job requires less than 1 GB of memory per process, there is no problem: just submit it and be happy.
If your job requires more than 1 GB per process, use only one queue of a pair, either the one with "m" or the one without. For example, if you use wp4, do not also submit jobs to wp4m. The same applies to the wp6(m) and wp7(m) pairs.
Next, consider the 8-core machines, for example the wp7(m) or wp6(m) queues.
Sixteen machines have 16 x 8 = 128 cores in total, but we split them into two queues:

16 (machines) x 4 (cores) = 64 cores in queue wpN (green)
16 (machines) x 4 (cores) = 64 cores in queue wpNm (red)
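The split above can be reproduced with a few lines of shell arithmetic. This is only a sketch using the numbers quoted in this guide. The variable names are made up for illustration, and the "fair share" of memory per queue is an assumption: the two queues share the node's physical memory and nothing in this guide says the split is enforced.

```shell
#!/bin/sh
# Queue layout for one wpN / wpNm pair, using the numbers quoted above.
machines=16          # 8-core machines in the pair
cores_per_node=8     # 2 CPUs x 4 cores
mem_per_node_gb=8    # 1 GB of memory per core

total_cores=$((machines * cores_per_node))
# The cores of every node are split evenly between wpN and wpNm.
cores_per_node_per_queue=$((cores_per_node / 2))
slots_per_queue=$((machines * cores_per_node_per_queue))
# Assumption: each queue's "fair" share of a node's memory is half of it.
fair_share_gb=$((mem_per_node_gb / 2))

echo "total cores:                  $total_cores"
echo "cores per queue:              $slots_per_queue"
echo "cores per node per queue:     $cores_per_node_per_queue"
echo "memory per node per queue:    $fair_share_gb GB (nominal)"
```

The 4 GB nominal share per node is the number to keep in mind when sizing a large-memory job.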

Schematic presentation of cluster queues (wpN & wpNm).

Problem #1: a job needs 2 GB per core (practical).
Solution: submit the job with a 32-core request using the mpimem parallel environment (PE), i.e. in your submission script:
#$ -pe mpimem 32
#$ -q wp7
(Note that this leaves no free memory for other users who might want to use the remaining "free" cores of the queue, i.e. 64 - 32 = 32 cores.)
(Mistake: a job submitted with a 64-core request steals memory from the adjacent queue, because the memory used per node will be 4 (processes/cores) x 2 GB = 8 GB, i.e. all the memory of both queues is gone!)
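To see why 32 is the right request for Problem #1 and 64 is a mistake, one can estimate the memory a job uses on each node. The sketch below is not part of the cluster software. The function names are hypothetical, and it assumes the mpimem PE spreads the requested slots evenly over the 16 machines of the queue, which is what the numbers in this guide imply.

```shell
#!/bin/sh
MACHINES=16        # machines in the wp7 / wp7m pair
NODE_MEM_GB=8      # physical memory per node
QUEUE_SHARE_GB=4   # assumed fair share of one queue on each node

# node_mem_used SLOTS MEM_PER_PROCESS_GB
# Memory used on the busiest node, assuming slots are spread evenly.
node_mem_used() {
    per_node=$(( ($1 + MACHINES - 1) / MACHINES ))   # processes per node, rounded up
    echo $(( per_node * $2 ))
}

# check_request SLOTS MEM_PER_PROCESS_GB
# Prints one keyword describing the effect of the request.
check_request() {
    used=$(node_mem_used "$1" "$2")
    if [ "$used" -le "$QUEUE_SHARE_GB" ]; then
        echo "ok"
    elif [ "$used" -le "$NODE_MEM_GB" ]; then
        echo "steals-from-paired-queue"
    else
        echo "exceeds-node-memory"
    fi
}

check_request 32 2   # the recommended request: 2 processes x 2 GB per node
check_request 64 2   # the mistake: 4 processes x 2 GB per node
```

A request passes the check only if it stays within the 4 GB that one queue can fairly use on each node.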
Problem #2: a job needs 4 GB per core (practical only for queues wp6(m) and wp7(m)).
Solution: submit the job with a 16-core request using the mpimem PE, i.e. in your submission script:
#$ -pe mpimem 16
#$ -q wp7
(Note that this leaves no free memory for other users who might want to use the remaining "free" cores of the queue, i.e. 64 - 16 = 48 cores.)
Problem #3: a job needs 8 GB per core (theoretical, i.e. should not be used in practice).
Solution: submit the job with a 16-core request using the mpimem PE, i.e. in your submission script:
#$ -pe mpimem 16
#$ -q wp7
(Note that this leaves no free memory for other users who might want to use the remaining "free" cores, i.e. 64 - 16 = 48 cores, nor for users of queue wp7m, where all 64 cores are free: the job with the 8 GB requirement uses all of the node memory, so none is left for the second queue.)
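The three solutions above follow one rule, which can be sketched as a small helper that derives the core request from the memory needed per process. The helper names are hypothetical. The sketch assumes 16 machines with 8 GB each, a nominal 4 GB share per queue on each node, and an even spread of processes over the machines by the mpimem PE.

```shell
#!/bin/sh
MACHINES=16        # machines in the queue pair
NODE_MEM_GB=8      # physical memory per node
QUEUE_SHARE_GB=4   # assumed fair share of one queue on each node

# suggest_slots MEM_PER_PROCESS_GB
# Prints the core count to request for a whole-GB memory need per process.
suggest_slots() {
    if [ "$1" -lt 1 ] || [ "$1" -gt "$NODE_MEM_GB" ]; then
        echo "memory per process must be between 1 and $NODE_MEM_GB GB" >&2
        return 1
    fi
    per_node=$(( QUEUE_SHARE_GB / $1 ))
    # More than 4 GB per process: one process per node, which also
    # consumes memory belonging to the paired queue (Problem #3).
    if [ "$per_node" -lt 1 ]; then
        per_node=1
    fi
    echo $(( per_node * MACHINES ))
}

# print_directives MEM_PER_PROCESS_GB QUEUE
# Prints the two submission-script lines used in this guide.
print_directives() {
    slots=$(suggest_slots "$1") || return 1
    printf '#$ -pe mpimem %s\n#$ -q %s\n' "$slots" "$2"
}

print_directives 2 wp7
print_directives 4 wp7
print_directives 8 wp7
```

The printed lines can be pasted into a submission script. For the 8 GB case the helper still returns 16, but, as noted above, such a job leaves nothing for the paired queue.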

Miscellaneous:

Please submit questions or suggestions to Viktor Oudovenko

Last update: May 05 2009