simlin[x], Computation Servers

Available

simlin1.ets.kth.se  [8 cores Xeon 2.33 GHz, 16 GB RAM]
simlin2.ets.kth.se  [6 cores Phenom 3.2 GHz, 16 GB RAM]
simlin3.ets.kth.se  [4 cores i7 3.7 GHz, 64 GB RAM] [+4 pretend cores, HT]

Purpose

Down in the local server-room (Teknikringen 35) we keep some shared computers that users can log in to for running programs. Users might want to use a server because their calculations take a long time or need a lot of memory, or simply because they want programs and an environment not found on their desktop/laptop computer. Several users can be logged in to each server at the same time. The servers are only interrupted occasionally, when they are rebooted for kernel updates after checking that there are no running calculations; typically they are up for half a year without a break.

Currently the servers all run Linux (Scientific Linux 6.x, which is binary-compatible with Red Hat Enterprise Linux 6.x). We do have the potential to run other systems, e.g. one of the various forms of m$-windos server that (at absurd cost) also have the amazing (for them) ability to allow more than one user at a time. But I learned long ago that getting involved in those systems is more effort than it's worth ... if needed, this can be sorted out by the other administrators. That is, for example, if users need programs that are available only for that platform, or if users find it terribly hard to use our existing systems.

Benefit or not

Most of us do some calculations by computer. In some cases the calculation is easily done on a modern laptop (or telephone). In others it may involve billions of steps, e.g. field solutions, dynamics, optimisation. It is a pity to let research questions be poorly answered just because a bit more memory is needed to run a more detailed model, or because it is inconvenient to leave a calculation running uninterrupted on a laptop for a week or a month. A better instrument in the lab can open the way to interesting new results in a field that has already been studied a lot with the more limited old instruments: the same is true with calculations, where old questions might be answered much more convincingly by a more detailed study that was not practicable with the hardware and software available in, e.g., the 1990s.

Ten years ago, multi-core processors were still something special and weren't easily found in laptops. Memory was also rather expensive. At that time, a shared server was clearly useful, giving more and faster processors and more memory. Nowadays (2014) it's shocking how good a thin laptop can be! But of course we always manage to find bigger calculations to solve. A high-performance desktop or server type of computer can still make a difference. The ability to disconnect from the server and leave a calculation running for a long time is also a virtue. There are also some types of problem that simply require the same calculation to be done on loads of different input parameters: these are very easy cases to run independently in parallel across lots of computers and processors, which most users don't otherwise have available.

Whether you get a benefit from the shared computers depends, therefore, very much on what you are doing!

Access

For a user not already familiar with command-based activities, the easiest way to access the servers is currently probably Thinlinc. You can download a client for your computer from the Download link, then run this to connect to our servers (simlin3.ets.kth.se etc). The client gives you options of which desktop environment to run, how big to make the window, etc. You can then run a remote desktop-session, and you can disconnect but leave everything running ... and then reconnect later.

An alternative that we still have is "Nomachine" NX; you can download the NXclient program for your own computer, and use it to connect to a choice of sessions on the servers. Some seem to prefer it to Thinlinc. It uses X11 rather than VNC as the underlying way of transferring the desktop view from server to client. A trouble we have noticed is that its greater dependence on the local graphics (on your computer) can in some special cases make programs stop their calculations when you disconnect from the session ... for some users that defeats the main purpose of the simulation servers!

Other methods depend on a command-line login, by SSH. You need an SSH client. On any unixish (linux/bsd/etc) system nowadays, you will already have the command ssh for this, and can even use the options ssh -XY to cause graphical parts of programs to open on your computer's display. On other systems, you might need to install another program, e.g. "putty".
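
For example, assuming your username is USERNAME (substitute your own), a login with graphical forwarding from a unixish machine looks like

ssh -XY USERNAME@simlin3.ets.kth.se

after which graphical programs started in that login open windows on your own display.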

Once you get a command login, you can use it directly, or can start various server programs that you then can connect to from your computer with a suitable client. One is vncserver which is like a less automated thinlinc (thinlinc is in fact an enhanced VNC). Another is comsol server, which allows a graphical session on another or the same computer to get its calculations done by a different process.
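
As a rough sketch of the vncserver route (the exact options vary a little between VNC implementations; see man vncserver), you could start and later stop a remote desktop like this:

vncserver :1 -geometry 1280x1024   # start a desktop session as display :1
vncserver -kill :1                 # stop it again when you are finished

You then connect from your own computer with any VNC viewer to, e.g., simlin3.ets.kth.se:1.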

For file access, see the later section about where the files are kept.

Starting programs

If you are logged in to a graphical desktop environment, there will be a menu showing programs. But I am too lazy to try to make all the extra technical programs appear as childish pictures for clicking. The only thing you need to open is a shell (a "command prompt", in a terminal emulator window); this will be visible on the desktop or in some part of the program menu. Then you type the name of the program you want to start.

In some cases we have lots of old versions still installed, in case someone needs a specific version. The commands then include a version number appended to the name, e.g. comsol-4.4.

The shell's "tab-completion" is a useful way to show all the options. For example, if you want to start Matlab, you can type matlab in the shell, and then press the Tab key a couple of times. This should show a list of all the commands that start with that word. Typing just

matlab
will start whatever is the default version (usually the latest, unless it has a known trouble). Adding some more text allows you to access another version, such as
matlab-7.10

Another way to see the available add-on programs is to browse under the /pkg directory, e.g. /pkg/comsol/. Altogether there are several thousand commands (including all the little utilities and other programs that come as part of the system) so tab-completion isn't very helpful until you've typed part of a name.
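
For example,

ls /pkg
ls /pkg/comsol

will list the add-on packages and then the installed comsol versions.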

Many programs have options that are useful to know. For example, if you start matlab by the command

matlab -nosplash -nodesktop
then the matlab prompt will appear within your shell, without starting the whole matlab interface. You could therefore run a non-graphical session in a shell login without running a desktop at all. Even in a desktop environment, this method has advantages: for example, you get a correctly working terminal (unlike the matlab command window), so that the matlab command
for n=1:10; fprintf('\r%4d',n); pause(1); end; disp(' ');
will do as it ought to (\r means to go to the start of the line again, not to go to a new line).

Other options for starting programs allow you to choose how many processors they will use, or to include options like comsol client/server and comsol-with-matlab. For unix commands you can often read about the options in their manual; e.g. for the grep command, you would type man grep. Many commands will display help text if you run them with the option --help or -h or -help. This is true for comsol, for example, where you will see a lot of options by

comsol -help
, or, more specifically for a certain mode of comsol,
comsol client -help
You can also read the long user-guide manuals for these add-on programs, on the web or from the local installation, for more detail on things like using comsol's client-server method.
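
As a very rough sketch of the client-server method (the details vary between versions; check comsol -help, comsol client -help and the user guide): you start

comsol server

on a simlin machine, note the port number it reports, and then start

comsol client

on the other computer and give it the server's hostname and port when asked. The calculation then runs in the server process, even though the graphical session is elsewhere.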

Important! Where are the files? Security and Speed

The servers (simlinX) and the fileserver (penguin) that provides home directories, usernames and authentication, are all in the same server room, with short-term backup power supplies. They are "self-contained" in that their operation does not depend on the external network or KTH's central services. Therefore, apart from exceptional situations like a long power failure or a failure of one of these servers, operation should be highly reliable.

Your default directory ("home directory") on all the computers is the same; it is stored on the fileserver penguin, and on all the servers it appears under the path /home/USERNAME. This is where programs will store configuration data about your settings. You can also access this directory from ms-windows file manager by the path \\penguin\USERNAME, or by an SCP (or SFTP) client connecting to any of the servers.
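
For example, a file can be copied to or from the home directory with scp from your own computer (USERNAME as above; results.dat is just an example name):

scp results.dat USERNAME@simlin1.ets.kth.se:     # copy into your home directory
scp USERNAME@simlin1.ets.kth.se:results.dat .    # copy it back to the current directory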

The home directory is backed up to another server, and various old versions can be obtained in the event of an accidental deletion or overwrite; the schedule is approximately nightly copies kept for the last week, and weekly copies kept for a long time! However, it is wasteful to have large temporary files being written in the home directory: this can slow down the calculations due to the lower speed of the network (compared to local disks), and it causes backups to be made of unimportant but large files. It is therefore better to write temporary files to /tmp or to somewhere under /local.

Our standard configuration is that /home is on the fileserver, /local is on an array of local disks (some redundancy but no backup), and /tmp is in memory and is therefore very very fast but is lost on reboot and is limited to only a few GB. Please give attention to these points when deciding how to handle temporary files during simulations. Comsol, for example, defaults to dumping many GB of files into its hidden configuration directory .comsol/ under your home directory. In the Comsol options you can change this: making a new subdirectory for yourself under /local is more appropriate.
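
A reasonable pattern (the directory name is only a suggestion) is to make your own directory on the local disks and point programs' temporary files there:

mkdir -p /local/$USER/tmp     # your own scratch space on the local disks

and then set that path in the relevant program option, e.g. the temporary-files directory in the Comsol options mentioned above.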

Tricks

For Matlab you might find some features of the parallel toolbox interesting, Matlab Parallel. Comsol also has a lot of options for computation on multiple CPUs and multiple computers.

There's a lot that can be done by scripts (perl, unix-shell, etc), to automate the running of programs in a loop or in parallel. Indeed, the readily available command parallel is handy for this. For me, compression of a big batch of images is helped by

for f in *.jpg; do echo convert -quality 65 $f _$f ; done | parallel -j+0 
, where the first part generates a list of commands (one per image) and the last part runs these as parallel processes on as many CPU cores as are available.
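
An equivalent form (assuming GNU parallel), which lets parallel do the substitution itself and also copes with spaces in file names, is

parallel -j+0 convert -quality 65 {} _{} ::: *.jpg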

If you use a command-line login, you might like the screen program. It allows you to run, and switch between, different shells (prompts), and to disconnect and reconnect from them.
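
A minimal example (mysim is just a name of your choice):

screen -S mysim    # start a named session, then start your calculation in it
                   # press Ctrl-a d to detach, leaving everything running
screen -ls         # later, perhaps from a new ssh login: list your sessions
screen -r mysim    # and reconnect to the one you want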

Other options (more performance)

KTH PDC

At KTH there is the PDC (parallel computer centre), now calling itself High-Performance Computer centre. This is a further option you could try if very large memory or large numbers of processors are needed.

However: please don't be fooled by the belief that "your problem" with a particular simulation program can be solved almost arbitrarily quickly by accessing "more computing power"!

If you have a single calculation to run (not lots of independent ones), and if you have enough memory to run it on our own computers, then it's unlikely to be any faster at PDC than here -- it might be quicker here. That's because simulation programs (e.g. Comsol) solving a single problem are unlikely to get a benefit from more than a few parallel CPUs. The increased speed for a multithreaded calculation is sublinear with the number of threads, and may even get worse beyond a point. When you try to use more threads to take advantage of more CPUs, you spend more time waiting for one part of the calculation to finish on one CPU, or for communication between the different parts. Large installations like PDC focus on lots of CPUs and memory, and the CPUs are generally quite average ones (since it costs a lot more for top-of-range). On our servers we have only a few CPUs (cores) in each -- which are all that can be efficiently used by our typical simulation programs -- but we make sure that each of these is about the fastest that can be got at the time.
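
A rough way to see this is Amdahl's law: if a fraction p of a calculation's work can be run in parallel, the best possible speed-up on N processors is

speedup = 1 / ( (1 - p) + p/N )

As an illustration only (the 90% is an assumption, not a measurement of any particular program): a solver whose work is 90% parallelisable can at best run about 4.7 times faster on 8 cores, about 7.8 times faster on 32 cores, and never more than 10 times faster however many thousands of cores it is given -- and that is before counting the communication overheads mentioned above.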

Graphics Cards, NUMA

The "GPU"s (graphics processing units) of modern graphics cards are capable of very many parallel calculations. These can be used for calculations that are passed back to the user, instead of going to the screen. Use of GPUs for scientific computation is being strongly driven by manufacturers, e.g. GPU Parallel Computing, with specifically optimised programming languages, CUDA. Matlab has some support, GPUs in Matlab. We do not have this capability now. We could buy a server with a supported type of GPU if anyone considers it worthwhile to try this. I'm not sure about how multi-user access would work.

External services

It is possible to use external services to pay for access "by the hour" to potentially many servers. A major example is Amazon's "Elastic Compute" service, EC2, which includes "instances" optimised for high memory, computation power, and GPUs (see above!). On the rented computer, you run your choice of virtual machine, which can be a standard one or your own installation; you get full access to that operating system. We once found it convenient to use some of the EC2 computers as well as local ones at KTH, all running separate instances of a proprietary simulation program (which needed to contact the KTH licence-server) for dealing with some millions of separate cases that were studied; more detail can be found starting at page 46 of this thesis [pdf].