|
The nodes of computer clusters such as our Victoria-Falls cluster
are often multi-core and each core may support multiple threads. This
suggests a two-layer approach when programming new codes or adapting
existing ones. To optimally exploit the distributed-memory structure
of the cluster while at the same time keeping each shared-memory node
busy, a combined MPI-OpenMP programming approach may be taken.
Here we present sample code that implements two-layered
master-slave models. These use MPI to communicate between a master node
and multiple slave nodes, while additionally allocating work
dynamically within the nodes using OpenMP. The current version of the
code is written in C. In this example, the workload consists of the
prime factorization of integers.
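
To make the workload concrete, here is a minimal sketch of a single basic job, assuming plain trial division. The function name `factorize` and its interface are hypothetical and are not taken from the sample code, which may use a different method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the sample code): factorize n by
 * trial division. The prime factors are written to out in ascending
 * order, with multiplicity, and the number of factors is returned.
 * Returns 0 for n < 2. A 64-bit integer has at most 63 prime
 * factors, so 64 slots are always enough. */
size_t factorize(uint64_t n, uint64_t out[64])
{
    size_t count = 0;

    assert(out != NULL);
    if (n < 2)
        return 0;

    /* Strip all factors of 2 first so the loop can skip even numbers. */
    while (n % 2 == 0) {
        out[count++] = 2;
        n /= 2;
    }

    /* Try odd candidates; "p <= n / p" is "p * p <= n" without overflow. */
    for (uint64_t p = 3; p <= n / p; p += 2) {
        while (n % p == 0) {
            out[count++] = p;
            n /= p;
        }
    }

    /* Whatever remains above 1 is itself prime. */
    if (n > 1)
        out[count++] = n;

    return count;
}
```

The running time of such a job depends strongly on the input, which is one reason why dynamic work allocation is attractive for this kind of workload.
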
In all of these models, the master MPI process M bundles basic
jobs into job groups and sends these to the slave processes S for
execution. When any slave process runs out of work it sends a message
to M to obtain more work. Each of the slave processes further spawns a
number of OpenMP threads to help with working through the job
groups. This is done in a dynamic manner, i.e. slave-threads work on
one job from the current job group at a time and acquire more when
they are done. When they run out of jobs in the group, one of them
asks the master for another group through MPI.
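
The dynamic allocation inside a slave process can be sketched as follows. This is an illustration under assumptions, not the sample code itself: the `job_group` structure, `process_group`, and the stand-in job `smallest_factor` are hypothetical names, and the MPI request for the next group is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job group: a batch of basic jobs plus a shared cursor. */
typedef struct {
    const unsigned long *jobs; /* integers to work on */
    size_t count;              /* number of jobs in the group */
    size_t next;               /* index of the next unclaimed job */
} job_group;

/* Stand-in for one basic job: the smallest divisor of n greater than 1. */
static unsigned long smallest_factor(unsigned long n)
{
    if (n < 2)
        return n;
    for (unsigned long p = 2; p <= n / p; ++p)
        if (n % p == 0)
            return p;
    return n; /* n is prime */
}

/* Every thread claims one job at a time until the group is used up.
 * results[i] receives the result for jobs[i]. */
void process_group(job_group *g, unsigned long *results)
{
    assert(g != NULL && results != NULL);

    #pragma omp parallel
    {
        for (;;) {
            size_t mine;

            /* Claim the next job index; the atomic capture ensures
             * that no two threads receive the same index. */
            #pragma omp atomic capture
            mine = g->next++;

            if (mine >= g->count)
                break; /* group exhausted */

            results[mine] = smallest_factor(g->jobs[mine]);
        }
    }
    /* In the models described here, this is the point at which a new
     * job group would be requested from the master through MPI. */
}
```

Compiled without OpenMP support, the pragmas are ignored and the function simply works through the group serially.
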
Six different model variations were implemented:
- In the simplest version of the model (Plan 1 in the code), the
slaves ask for more work from outside the local parallel OpenMP region
when all the work is completed. The work itself is of course done
inside of the parallel region. The master runs in serial mode, i.e. it
does not involve any OpenMP directives.
- This (Plan 2 in the code) is a similar approach, but here the
master is also multithreaded. Only the main thread of M communicates
via MPI with the slaves S. The other master-threads do additional
pre-allocated work.
- Here (Plan 5 in the code), both master and slaves are
multithreaded and work from inside the OpenMP parallel region. The main
threads of the slaves S communicate with the main thread of M anytime
they run out of work, not waiting for the others. The OpenMP threads
on the master M do pre-allocated work.
- This (Plan 6 in the code) is the same as the previous one,
but the threads on the master may ask for work dynamically through
OpenMP from the main thread.
- This model (Plan 11 in the code) is similar to the previous two,
but any of the S threads can ask for new work from the main
thread on M when they run out. All others continue to execute jobs
from the previous or the new job group. Work for the other threads on
the master is pre-allocated.
- This (Plan 12 in the code) is the same as the previous one, but
the work for the master-threads is dynamically allocated.
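
The difference between pre-allocated and dynamically allocated work in the variations above can be illustrated with OpenMP loop schedules. This is only an analogy under assumptions: the sample code may distribute work among the master threads by other means, and the names `run_preallocated`, `run_dynamic`, and `count_divisors` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in job whose cost grows with n: count the divisors of n. */
static unsigned long count_divisors(unsigned long n)
{
    unsigned long count = 0;
    for (unsigned long d = 1; d <= n; ++d)
        if (n % d == 0)
            ++count;
    return count;
}

/* Pre-allocated work: the iterations are split among the threads up
 * front, so a thread that receives cheap jobs finishes early and idles. */
unsigned long run_preallocated(const unsigned long *jobs, size_t n)
{
    unsigned long total = 0;

    assert(jobs != NULL);
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (size_t i = 0; i < n; ++i)
        total += count_divisors(jobs[i]);
    return total;
}

/* Dynamically allocated work: each thread fetches the next job only
 * when it has finished its previous one. */
unsigned long run_dynamic(const unsigned long *jobs, size_t n)
{
    unsigned long total = 0;

    assert(jobs != NULL);
    #pragma omp parallel for schedule(dynamic, 1) reduction(+:total)
    for (size_t i = 0; i < n; ++i)
        total += count_divisors(jobs[i]);
    return total;
}
```

Both functions return the same result; they differ only in how evenly the threads are kept busy when the job costs vary.
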
Note that the MPI communication in these models has to be protected by
a critical region to prevent a race condition on shared
communication resources and messages. Therefore any communication
event is between one thread on the master and one thread on the slave,
albeit a potentially different one each time. MPI communication
between different threads on the same process does not take place.
The sample code is available for download.
It includes source code, an input file, and a script for
running it on our systems. Alterations to the script may be
necessary. The code works through all six model variations and reports
success or failure. We make this available to our users (and anyone
else) on the condition that any use of parts of the code is
acknowledged. We hope that this code supplies you with a framework to
adapt for your own programs, particularly if you plan to deploy them
on a cluster with multi-core/multi-threaded nodes.
|