Usage of the Sun Fire 15K machines

Our main production environment consists of 7 Sun Fire 25K machines. By default, jobs you submit run on this set of machines.

The 25K machines contain UltraSPARC IV+ (US IV+) chips, so natively optimized code generated by the Studio compilers is often tuned specifically for those chips to get the best performance.

We also have 3 Sun Fire 15K machines that have been retained from our previous setup. They contain UltraSPARC III (US III) chips, which are a bit slower than the IV+ chips (1.2 GHz vs. 1.5/1.8 GHz).

Code optimized for the US IV+ chips may not run properly on the US III chips. A job submitted to Grid Engine can often run anywhere on the compute grid, so your US IV+ code could run perfectly on a 25K one day, then end up on a 15K the next day and crash for no obvious reason.

For this reason, the 15Ks are not included in the default production queues.

Default Production Queues

All jobs start with a default request for:

    production.q@@us4plus

@us4plus is a hostgroup that currently contains all the machines with US IV+ chips (right now, that means the 25Ks).

Submitting jobs to the 15K machines

Grid Engine provides a number of ways to select potential target machines for jobs. In particular, we have set up a "hostgroup" and a queue.

The hostgroup @us3 is a shorthand container name for the machines with US III chips (currently the 15Ks). The production.q queue is also available on this hostgroup, but it is not part of the default request that is applied to jobs.

How to add the 15Ks to your job request

Only do this if your code can run on these US III machines!
  1. (simplest) the job can run on any machine that is part of production.q:
    #$ ... other directives ...
    #$ -q production.q
  2. the job can also run somewhere in the us3 hostgroup:
    #$ ... other directives ...
    #$ -q *@@us3
  3. the job must run somewhere in the us3 hostgroup (and nowhere else):
    #$ -clear
    #$ ... other directives ...
    #$ -q *@@us3
Notes:

The -clear directive removes any defaults that would otherwise apply to the subsequent Grid Engine directives in this job (and only in this job), in particular the default production queue request.

There really are two "@" symbols in examples #2 and #3. The "-q" line reads as follows:

            *            @             @us3
        any queue    containing    the hostgroup
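
To show how example #3 fits into a whole job script, here is a minimal sketch. The job name (us3_example), the -cwd directive, and the helper function report_placement are illustrative assumptions and not part of the site configuration. JOB_ID and QUEUE are environment variables that Grid Engine sets inside a running job, and the fallbacks let the script also run outside the scheduler for a quick check.

```shell
#!/bin/sh
# Sketch of a job script that must run in the @us3 hostgroup (example #3).
# -clear comes first so that the default production queue request is dropped
# before the other directives are read.
#$ -clear
#$ -cwd
#$ -N us3_example
#$ -q *@@us3

# Print where the job landed. JOB_ID and QUEUE are set by Grid Engine inside
# a running job; the ":-none" fallbacks apply when run outside the scheduler.
report_placement() {
    echo "job ${JOB_ID:-none} in queue ${QUEUE:-none} on host $(uname -n)"
}

report_placement

# Start your US III-compatible program here, for example:
#   ./my_program        (hypothetical name)
```

Checking the reported host in the job output is a simple way to confirm that the job really ran on one of the 15Ks before you rely on this setup for longer runs.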
 
 
   
© HPCVL 2007