QUICK USER'S GUIDE
(Running jobs with large memory requirements)
This guide explains how to submit jobs that require large amounts of
memory to the CoRE/CAIP cluster.
Cluster Node Hardware:
Most Xeon nodes have two quad-core CPUs, i.e. 2 CPUs x 4 cores
(machines n01-n26 in the CAIP cluster have dual-core CPUs,
i.e. 2 CPUs x 2 cores).
Each node has 1 GB of memory per core, i.e. 2 x 4 = 8-core nodes have 8 GB of
memory and 2 x 2 = 4-core nodes have 4 GB of memory.
The major part of the cluster consists of 8-core nodes with
8 GB of memory.
[Figure: schematic representation and images of real 2-CPU x 2-core and 2-CPU x 4-core nodes.]
Cluster Software (queueing system):
The wp4(m) queues consist of nodes with 4 GB of memory, while the wp6(m)
and wp7(m) queues have nodes with 8 cores and 8 GB of memory.
Queues wpN and wpNm share the same physical memory
(N=4,6,7): wp4(m) has 4 GB in total per node, while wp6(m) and wp7(m)
each have 8 GB in total per node.
Now the important part. If your job requires less than 1 GB of
memory per process, there is no problem: just submit it and be happy.
If your job requires more than 1 GB per process, you should use only one
queue of a pair, either the one with "m" or the one without. For example,
if you use wp4, you should not also submit jobs to queue wp4m. The same
applies to the wp6(m) and wp7(m) pairs.
Next, let us take the 8-core machines, for example the
wp7(m) or wp6(m) queues.
16 machines have 16 x 8 = 128 cores in total, but we split them
into two queues:
16 (machines) x 4 (cores) = 64 cores in queue wpN (green)
16 (machines) x 4 (cores) = 64 cores in queue wpNm (red)
[Figure: schematic representation of the cluster queues (wpN & wpNm).]
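The queue split above can be checked with a few lines of shell arithmetic. This is only a sketch: the function names are hypothetical, and the 4 GB "share" per queue per node is a nominal figure derived from the 1 GB per core rule, not a limit that is known to be enforced by the queueing system.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the wpN / wpNm split described in the text.

# total_cores MACHINES CORES_PER_NODE
total_cores() {
    echo $(( $1 * $2 ))
}

# queue_cores MACHINES CORES_PER_NODE
# Each node gives half of its cores to wpN and half to wpNm.
queue_cores() {
    echo $(( $1 * ($2 / 2) ))
}

# queue_share_gb NODE_MEMORY_GB
# Nominal memory share of one queue on one node (assumption: an even split).
queue_share_gb() {
    echo $(( $1 / 2 ))
}

echo "total cores:           $(total_cores 16 8)"
echo "cores in wpN:          $(queue_cores 16 8)"
echo "cores in wpNm:         $(queue_cores 16 8)"
echo "nominal GB per queue:  $(queue_share_gb 8) per node"
```

The point of the last number is that a job in wpN which uses more than its nominal share on a node is taking memory that jobs in wpNm on the same node expect to have.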
Problem #1:
A job needs 2 GB per core (practical).
Solution: Submit the job with a 32-core requirement using the mpimem PE,
i.e. in your submission script:
#$ -pe mpimem 32
#$ -q wp7
(Here you should understand that you do not leave any free memory for
other users who might want to use the remaining "free" cores,
i.e. 64-32=32 cores.)
(Mistake: if you submit a job with a 64-core requirement, it steals memory
from the adjacent queue, because the memory used by the job on each node
will be 4 (processes/cores) x 2 GB = 8 GB, i.e. all the memory of both
queues is gone!)
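For orientation, a complete submission script for the 2 GB per core case might look like the sketch below. Only the `-pe mpimem 32` and `-q wp7` lines come from this guide; the job name, the other options, the `mpirun` invocation and the program name `./my_program` are assumptions that have to be adapted to the local MPI installation.

```shell
#!/bin/bash
# Sketch of an SGE submission script for a job that needs 2 GB per process.
#$ -N bigmem_job        # job name (hypothetical)
#$ -S /bin/bash         # shell used to run the job
#$ -cwd                 # run in the directory the job was submitted from
#$ -pe mpimem 32        # 32 slots in the mpimem PE, as recommended above
#$ -q wp7               # use only one queue of the wp7 / wp7m pair

# NSLOTS is set by the queueing system to the number of granted slots.
# ./my_program is a placeholder for the actual MPI executable.
mpirun -np "$NSLOTS" ./my_program
```

Such a script would typically be submitted with `qsub`, e.g. `qsub job.sh`, where `job.sh` is a hypothetical file name.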
Problem #2:
A job needs 4 GB per core (practical only for queues wp6(m) &
wp7(m)).
Solution: Submit the job with a 16-core requirement using the mpimem PE,
i.e. in your submission script:
#$ -pe mpimem 16
#$ -q wp7
(Here you should understand that you do not leave any free memory for
other users who might want to use the remaining "free" cores,
i.e. 64-16=48 cores.)
Problem #3:
A job needs 8 GB per core (theoretical, i.e. it should not
be used in practice).
Solution: Submit the job with a 16-core requirement using the mpimem PE,
i.e. in your submission script:
#$ -pe mpimem 16
#$ -q wp7
(Here you should understand that you do not leave any free memory for
other users who might want to use the remaining "free" cores,
i.e. 64-16=48 cores, nor for users of queue wp7m, where all 64 cores are
free but no memory is left, because the whole 8 GB of each node is used by
the job.)
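The three cases above follow one pattern, which can be sketched as a small shell function. The function name `slots_for` is hypothetical, and the sketch assumes 16 machines per queue, a nominal 4 GB share per queue per node, and that the mpimem PE spreads the processes evenly over the nodes; none of these is checked against the actual queue configuration.

```shell
#!/bin/bash
# slots_for MEM_GB [MACHINES] [QUEUE_SHARE_GB]
# Prints the slot count to request with "#$ -pe mpimem N" for a job that
# needs MEM_GB (a whole number >= 1) of memory per process.
slots_for() {
    local mem_gb=$1
    local machines=${2:-16}     # assumed number of machines in the queue
    local share_gb=${3:-4}      # assumed nominal memory share per node
    # Processes that fit into the queue's memory share on one node.
    local per_node=$(( share_gb / mem_gb ))
    # A process larger than the share still needs one slot per node; it then
    # also consumes the memory of the paired queue (the 8 GB case above).
    if [ "$per_node" -lt 1 ]; then
        per_node=1
    fi
    echo $(( machines * per_node ))
}

for mem in 2 4 8; do
    echo "${mem} GB per process -> #\$ -pe mpimem $(slots_for "$mem")"
done
```

Under these assumptions the function reproduces the values used in the three problems: 32 slots for 2 GB, and 16 slots for both 4 GB and 8 GB per process.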