Intro to Savio Training (Full Version)

>>CHRISTOPHER HANN-SODEN: Hi everyone, I’m
Christopher Hann-Soden, I’m one of the BRC Consultants and I’m going to be taking over
the first half of this training on Savio. My colleague in the back there is Chris Paciorek,
he’s also one of the consultants with BRC here and also a professor in Statistics. This training is based off of our documentation
that is online. Hopefully you’ve all been able to find this
documentation on GitHub and so you have access to it. We’ll go through it a little bit quickly. I encourage you to go do things as we do them
when you can and where it’s a little bit too fast for you just come back and use this training
as a reference and go through it on your own time. I’m going to be talking about how to get access
to Savio, how to login to it, and the general system architecture of Savio. And then we’ll go over logging in, getting
data to and from Savio, and then managing the software on Savio. After that, Chris will take over with submitting
jobs and actually getting the work done, as well as using Python on Savio. Savio is a Linux cluster: roughly 470 computers, maybe a little more now, networked together to operate as if they were one machine. It is run on a Condo model where about
40% of the nodes or the individual machines in Savio, are provided by the institution
and 60% are purchased by researchers and different research groups on campus and sit within it
as part of this Condo system. The primary way that people get access to
Savio is through the FCA or Faculty Computing Allowance. Every faculty member on campus gets 300,000
service units which are roughly equivalent to core hours on Savio. So just for reference that’s enough to have
a 30 core machine running 24-7 for the entire year for every faculty member. Once the faculty members get those FCAs they
can then add any users to their allocation that they want. The second way that people get access is through
the Condo program. In that you would pay for a certain number
of nodes that live within Savio, and then you get priority access to that amount of
resources whenever you need it. And then the third way that people get access
is through an Instructional Computing Allowance (ICA) so if you’re teaching a course that
has a heavy computational element then you can come to us and we can provide access to
Savio for that course. That really works best for semester long courses,
unfortunately not shorter workshops. So I already said that Savio is a Linux cluster. You access Savio by logging into it at one
of the login nodes. On these login nodes you can access your files
and manage your files, but then from there you submit jobs to the compute nodes which
will actually do your heavy computation. There’s also the data transfer nodes which
are used for getting data to and from Savio so they have more types of networking enabled
on them and they’re configured to facilitate the large amounts of data that come and go
from Savio. Here’s a conceptual diagram of Savio and its
networking. You all start right up in here. This is maybe your laptop or workstation and
you use some kind of network protocol through the internet to access Savio, so that could
be SSH, your primary login mode, SCP or FTP for transferring files, or also this service
that I’m going to tell you about called Globus. Through SSH you will be assigned to one of
four login nodes randomly and you can switch between the login nodes but that’s divided
up randomly so that there aren’t too many people sitting on one login node and using
its resources at one time. For file transfer you can ssh to the DTN or
you can also use one of these other protocols in conjunction with the DTN. All of the nodes in Savio have access to its
file systems. I’m going to zoom out here a little bit so
you can see the full diagram. So there are three main file systems on Savio
and every single node no matter where you are or where you are working in Savio has
access to those same file systems. And we’ll talk more about that. From the login nodes you use the SLURM scheduler
to submit or request an allocation of some amount of resources for some period of time. The scheduler tries to find a slot where that’s
available and puts it in according to the rules of the scheduler which we’ll talk about. It may go to one of several different partitions
within Savio. We’ll get into all of that. Any questions on this diagram before I move
on? We’ll come back to it. As I said, there are several different partitions
which are pools of nodes in Savio. Every machine within a partition has identical
architecture, but the different machines between the different partitions have different architectures
or different resources. If you go to the Savio training documentation,
you can see the hardware configuration. So in our user guide, under hardware configuration,
the Savio 1 pool consists of 164 nodes that each have 64 GB of memory and 20 cores, while
the Savio 2 pool has 136 nodes with 24 cores and 64 GB of memory each, and so on. So you can see the partitions that are available
there and what their specific resources are. A little bit lower down under the scheduler
configuration, you can see again the partitions, but this is how you would know which partitions
you might have access to, and how to name them when you’re requesting jobs. For example, on Savio1, accounts starting with “fc_” often have access, and if you have a Condo allocation, you
could look that up there. The file systems on Savio: there are three
main areas of file systems. The primary ones are your home directory which
is going to be in this /global/home/users/{your username} path, and then the scratch directory
which is under /global/scratch and then your username. The home directory is limited to 10GB per
user, so that space is designed to hold your scripts, some of your documents, and other
personal type files. It’s not designed to hold your research data. That should really go into /scratch. Some users will also have access to other
file systems such as the group’s file systems or other storage that they’ve purchased through
the Condo program. That is designed to archive and store data
that is not currently being used but anytime you’re doing actual computation on a set of
data it should be within scratch. And the reason for that is that scratch is
networked to the compute nodes and the DTN through the very fast Infiniband architecture
and that’s designed to handle the kinds of input and output that we get when we have
thousands of users doing heavy computations. The home directory and the other file systems
on Savio are just networked with Ethernet, so if you do computation on data in those
locations it’s going to bog down the entire system and prevent access to everybody. So please be sure to keep your data in /scratch. So now that we’ve gone over the overview,
we’ll get to actually logging into Savio. SSH is the method that you’re going to use
to gain access to Savio and the format … oh actually you’re going to need to have some
kind of SSH client. On mac and Linux that’s just going to be there
by default in your terminal. If you’re using Windows, a good option is PuTTY, and your access may look a bit different using PuTTY. But for Mac and Linux users, your command is going to be ssh, then your username, then @ and the address of the login nodes, which is hpc.brc.berkeley.edu. I’ll go ahead and log in myself.
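For example, with a hypothetical username in place of your own, the login command looks like this:

    ssh myusername@hpc.brc.berkeley.edu

When you do that, you’ll be asked for your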
password. Your password consists of your personal PIN
followed by the onetime password provided by the Google Authenticator app. So I enter my personal pin and then immediately
afterwards I enter the onetime password. And this you won’t see. This is my GitHub. That’s just me. So now I’ve logged into Savio. It drops me into my home directory. At the top of my terminal you can see that
I was assigned to login node 001 and I can look around see that I have a bunch of files. I’m currently in my home directory. My scratch directory is going to be there. Cool.>>AUDIENCE MEMBER: How do you set and reset
your pin if you forgot it?>>CHRISTOPHER: Set and reset your pin? The instructions for setting up the authenticator
and your account are linked here so just go through that on your own time. If you have difficulties with that process
you can contact us for help.>>AUDIENCE MEMBER: So the login node is random,
right?>>CHRISTOPHER: Yeah, the login node is random. Although you can specify a specific login
node I believe. Or, once you’re on I can switch to a different
login node. I do need to enter my password again.>>AUDIENCE MEMBER: Is there a reason you
would switch?>>CHRISTOPHER: Yeah, if there is a problem
with one login node then you might want to switch. If you’re using a session manager such as
TMUX then that will be running on a specific login node, so to get back into that session
you’d want to be on that login node. If another user is behaving badly and running
their computation on the login node and that’s bogging it down then you can switch to one
of the other three and also please let us know and we’ll stop that.>>AUDIENCE MEMBER: How would you know when
things are not running well?>>CHRISTOPHER: You can monitor the status
of the login node using usual UNIX commands such as top. So here I can see the top processes that are
running on the login node as well as some of the resources that are available. So if things aren’t behaving well and you
run top and you see that somebody is running a Python process that is using 700% of the
CPU then that could be a problem. Right now it’s looking fine. One option that you might want to know for
this SSH command is the -Y option. This enables X11 forwarding, which allows me
to use GUI type applications through ssh. One of the joys of using Savio is getting
to enter onetime passwords a lot. So now I can run a program that has a graphical
interface like Emacs and rather than getting the Emacs within the shell I now have a nice
Emacs window forwarded to my machine. If I didn’t have that -Y option it would just be within the shell.
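As a sketch, the forwarding variant of the login command, followed by launching a graphical program (again with a hypothetical username):

    ssh -Y myusername@hpc.brc.berkeley.edu
    emacs &     # opens a graphical Emacs window on your local display

How do you get your data to and from Savio? One of the simplest ways is with SCP or SFTP. We’re just going to go over SCP because it’s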
more straightforward. For SFTP there are many guides you can look up
on how to use it but it works fine with Savio. The syntax for the SCP command and using it
with Savio is just your basic SCP commands but make sure that you’re going to be targeting
that to the DTN rather than one of the login nodes. Let me make sure I have this bayarea.csv file. So I have this bayarea.csv file in my home
directory on my laptop. And I’m going to transfer that file to my
account at the DTN. Remember that the tilde symbolizes my home directory, followed by a slash and a dot, so I’m saying: keep the same name, just put it in my home directory. And again, onetime passwords. So that transfer is happening now. If you want to change its name, rather than
the dot you can just give it a new name like right here. Or you can go ahead and put a full path, such
as putting it in the /scratch directory. That’s actually other Chris’s scratch directory,
I forgot to change that one. And then you can also pull files from Savio
to your local machine just by reversing the order of the arguments: from Savio, this file, to my home directory.
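To summarize those variants as commands you could run from your laptop (the username here is a hypothetical placeholder, and dtn.brc.berkeley.edu is the data transfer node):

    # copy a file to your home directory on Savio, keeping the same name
    scp bayarea.csv myusername@dtn.brc.berkeley.edu:~/.
    # give it a new name, or a full path such as your scratch directory
    scp bayarea.csv myusername@dtn.brc.berkeley.edu:~/newname.csv
    scp bayarea.csv myusername@dtn.brc.berkeley.edu:/global/scratch/myusername/
    # reverse the arguments to pull a file from Savio back to your laptop
    scp myusername@dtn.brc.berkeley.edu:~/bayarea.csv ~/

One of the problems with SCP, or one of its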
inconveniences is that you can only do one file at a time, so if you have a directory
with thousands of files in it that’s obviously impractical. So you can use tar which is tape archive program
to bundle up and compress files for offloading prior to the SCP command. I’m going to go ahead and cancel that because
it’s going very slow in this room. So I’ve got this data directory and I can
move some junk into it. Like bayarea.csv into the data directory. I can also move this random test directory
into data. So now I’ve got some junk in this directory
and I can use the tar command like so to create an archive. -c is for create, -v is for verbose, -z is for gzip, so compress it when you create it, and -f means write to a file rather than to a tape device. I want it to be named myarchive.tgz and it’s
going to be that data directory. So it tells me I’m compressing all these things
and sticking them in this archive. So then to untar it later, it’s almost the same command, except with -x for extract rather than -c for create.
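Concretely, using the directory and archive name from the demo:

    # bundle and compress the data directory into a single archive
    tar -cvzf myarchive.tgz data
    # later, extract it again
    tar -xvzf myarchive.tgz

So that is simple and it works, but it can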
be a slight pain. So if you’re moving large amounts of data
we highly recommend this service called Globus. Globus can rapidly transfer files using a
point and click method and it will also restart transfers if they fail, it will let you know
the status of them, and it will checksum the data on either side to make sure it got across
without any loss. I’m going to direct you to our documentation
on how to set up your Globus account. But to transfer data to or from another device
like your laptop you’re going to need to install this software Globus Connect Personal and
then you’re going to need to set up your computer as what’s called an “endpoint” in Globus. We’ve already set up Savio as an endpoint
so you can locate it through the Globus app. The Globus app is this web app. Maybe I’ll log out first. So you log in with your CalNet which you’ve
all set up with two-factor authentication because we’re very secure here. And you’ll be brought to the transfer screen
at first. For some reason it only gives you a single
panel which I don’t find very useful. So I’ll usually go ahead and put it to two
panels. And you want to look for a collection or endpoint
here. Savio’s is going to be ucb#brc. So that’s it there and that’s Savio. Oh and I’m already logged in here too. Usually when you do this endpoint it’s then
going to ask you to authenticate access to your data on Savio so you’ll click through
there and you’ll enter in your Savio username as well as your pin and onetime password,
as if you were logging in with ssh to Savio. And once you do that, then you’ll be able
to see your data. Now, the other end that I want to connect
to is my laptop. So go ahead and open up a new terminal here
and this is going to be Globus. I have the Globus Connect Personal software
on my laptop and I want to run Globus Connect which I have previously set up. The setup for that is pretty easy. Within the Globus app you will go to your
endpoints, create a new endpoint “Globus Connect Personal”, give it a name right here, you
will generate a key, copy that key, and then the first time you run Globus Connect Personal
it will ask you for the key and you’ll enter it in there. So I’ve done that, I connect, and in a moment
back in my file manager we’ll be able to select my laptop’s endpoint which I’ve named Ascospore. It’s connected and now it’s active. And so I can go and find this bayarea.csv
file and go ahead and transfer that over to Savio. Transfer request and I can view the details
here, seeing how many files, what point is it at, what’s its status, etc. And just a side note, there is also a Globus
command line interface, and if you want to read the documentation of that you can use
that to automate some of your transfers. For example, after you’re done with an analysis
automatically transfer the results to your storage off-site somewhere. The last data transfer method we want to go
over is using rclone with Box but it’s very similar with bDrive. As Cal affiliates you have access to unlimited
storage on Box and bDrive, which can be very handy for moderately large amounts of data. Box does have a 15 GB cap on file size,
and both of them do have limits on how much data you can transfer per day. If you aren’t familiar with those, you can
check out their web apps and desktop apps to sync files between your machine and the
drive. Globus unfortunately doesn’t work for those
because we can’t set up endpoints for your own private bDrive, but there is some software
such as rclone which is specifically developed to complete these kinds of tasks. Setting up rclone with Savio does require
a little bit of setup. We already have it installed on Savio. Here I’m logged into Savio. No rclone, I think you need to load it. So you do need to load the module, but then
rclone is there and available for you. To set that up on Savio you’ll do this rclone
config and then on Savio you’re going to not do auto
config. So new remote “training-box” and then select
from the list, it’s going to be “Box”. I think you leave these blank. Oops, wrong one. Now it asks you if you want to do auto config,
you say no there. Then you actually have to go back to your
own laptop, so this is on Savio, but this is a remote machine so you actually have to
have rclone installed on your machine at home. And you do “rclone authorize box”. This is going to open up a browser, which
is why you have to do it on your own machine because you can’t open up a browser on Savio,
at least not easily. And you will log in here with your CalNet
ID to log in to Box. When you do that, back in your terminal it
will give you a code. Copy and paste that into the result here and
then you’re done. Once you’ve done that there are some commands
like “rclone listremotes”. They’re pretty self-explanatory. And you can see that I have a Box account
already set up for my lab association. And then you can list directories and files and copy files using those rclone commands.
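As a minimal sketch, assuming the remote from the demo is named training-box and using hypothetical local paths:

    module load rclone
    rclone listremotes                        # shows the remotes you have configured
    rclone lsd training-box:                  # list the top-level folders in Box
    rclone copy /global/scratch/myusername/results training-box:results   # copy a folder to Box

Once you get some files on there and you need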
to edit them you’re going to need some kind of text editor. On Savio we have three text editors available:
vim and Emacs are both available by default, and you’ve already seen Emacs. If you want to use nano then you have to load
that module, and now there’s nano. I’ve been loading a few modules and you may
be wondering what’s that all about? Software in Savio is stored in this module
system and that’s not designed to be confusing but it does enable us to have many different
versions of software that don’t conflict with each other so that each user can configure
their environment so that it works for them without getting in the way of other users. The commands for this are all variations on module, such as the module load command which you’ve seen me do. How do you know what software is available? You can do that with “module list”, which lets
me know which modules are currently loaded. You can see that vim and Emacs are already
loaded whenever I log in. And I’ve since loaded rclone and nano. And you can list all available modules with
the “module avail” command. So there you can see all of the software modules
that are available. One thing that trips people up is that some
of the software is nested hierarchically because software has different dependencies so if
you haven’t loaded one of those dependencies then it won’t show up in the “module avail”
command unfortunately. So if you’re looking to see if something is
available on Savio, you may want to start by loading some of the prerequisites, especially
some of the common ones like openmpi or the gcc or the Intel compiler if you’re using
that. This openmpi should fail. So you go “oh no, there’s no openmpi” but
if I load gcc first then it works. That also prevents you from loading a version of the software that was built against a different version of some prerequisite than the one you have loaded, so it enforces that the dependencies are compatible with each other to some extent.
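Putting the module commands from this section together:

    module list                  # what is loaded right now
    module avail                 # everything visible given the modules currently loaded
    module load nano             # load a specific piece of software
    module load gcc openmpi      # load the compiler first so the dependent openmpi module is visible

That’s what I have to say about getting onto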
Savio and getting your data there. Chris Paciorek is going to take over with
actually running some jobs on Savio. Any questions before I go?>>AUDIENCE MEMBER: How often is data cleared
from scratch?>>CHRISTOPHER: The scratch space is theoretically
unbounded, there are no user limitations as you may have noticed. There’s not a current schedule but it is prone
to occasional clearing, so if it’s getting full we may come through and delete a bunch
of files. So you really should not use it for long-term
storage of your data, it’s not stable, it’s not backed up, and your data could be cleared. That is done I believe on data that is 6 months
old or longer.>>CHRIS PACIOREK: We’re still coming up with
a formal policy on when and how exactly to do that, so it’s been a manual process.>>CHRISTOPHER: Yeah, so I can’t give you
an “every 2 weeks data gets cleared” but if your data is less than 6 months old you should
be safe.>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRISTOPHER: I think we’d be happy to take
a look at that, but I’m not familiar with that error. At the end we will go over some ways you can
get more help and some of our resources that are available to you.>>AUDIENCE MEMBER: How easy is it to transfer
data from your scratch space to your group space? Do you have to transfer the data to your laptop
and then to Savio to the group storage?>>CHRISTOPHER: No, you can transfer between
the different file systems on Savio as you would between different locations on your
normal computer.>>CHRIS: You can use the “cp” command for
example.>>CHRISTOPHER: Yeah, “mv” or “cp” work just
fine.>>CHRIS PACIOREK: Let’s go ahead and get
started again. I’m going to launch into talking about submitting
jobs and how you go about doing that and how you know basically what accounts you can submit
under and how you would submit different kinds of jobs, jobs that are running in serial or
in parallel, those sorts of things. The first thing to know is Christopher mentioned
this idea that there are different partitions, which are groups of nodes that all have the same hardware, and you may or may not have access to a particular partition; you won’t know which without running a command that I’m about to show you. This is quite a useful command: “sacctmgr -p show associations”. What you would do is you’d run that and then you’d add “user=” and then your username, so I’ll enter my username.
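For example, with my username (substitute your own):

    sacctmgr -p show associations user=paciorek

This is going to produce rather more voluminous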
output than you would get because I have access to a number of different accounts because
of the particular position I have but the key things that we can see here are if you
have access to a Faculty Computing Allowance you’re going to see some lines that look like
this and basically what this shows in the — it’s a bit hard to read this — in the
fourth column here separated by these pipes what you’ll see is that with the Faculty Computing
Allowance you have access to for example GPU nodes, you have access to the HTC partition,
you have access to the big memory nodes on “savio2” and you have access to “savio2” and
“savio” partitions. So under Faculty Computing Allowance you would
have access to all of those at a normal priority so that means you can submit jobs and you’ll
be in the queue with everybody else to get access to those jobs. The fact that you have normal access is indicated
by this “savio_normal” over here in one of these more right-hand columns. If you have access to a Condo that means that
if you’re a PI or the PI that you’re working with has actually purchased nodes on Savio
and that’s what Christopher was describing before. So in this case I have access to a particular
Condo that’s basically run through the statistics department and so I see that down here in
some of these additional lines that are produced as output from this “sacctmgr” command. So what I see down here is these are all the
partitions that I have access to through my Condo association because I’m associated with
a Condo, but you’ll notice that for a lot of these, many of these have “savio_lowprio”
(low priority) listed here and what that means is that I don’t have regular, normal priority
access to those because we didn’t purchase any of those kinds of nodes. The ones that I have regular access to are
the ones that have a “normal”, so this “stat_gpu2_normal” and this “stat_savio2_normal”. So those are the two partitions, “savio2_gpu”
and “savio2” which I have access to as part of my Condo because we purchased nodes that
are in the “savio2_gpu” and “savio2” partitions. So in general if I were using the Condo I
would want to use one of those two partitions because those are partitions where I have
access that’s guaranteed by the purchase that we’ve made. You can also run jobs in a low priority setting
and those jobs are subject to being canceled if other people that have higher priority
come on and want to use those nodes. I’ll talk about that a little bit more in
a few slides. But that’s the basic distinction that you’d
be looking at: whether you’re in an FCA which means that you have regular access to all
kinds of nodes or most kinds of nodes with a few exceptions, or if you have a Condo which
means you have high priority access to the kinds of nodes that you bought and lower priority
access to the things that you didn’t buy. Any questions on that front? I guess I got ahead of myself because that
was actually the next slide. So let me give a little overview on submitting
jobs, it doesn’t really matter which order I went in. When you submit a job, the basic workflow
for working on Savio is you log in to the login node which Christopher already demonstrated
and then you’ll change directories to wherever you have a job script that you want to
run and we’ll talk about job scripts in a minute. Once you’ve changed directories to where you
have a job script written you can then submit that job script as a job to the scheduler. And what will happen is that — if we go back
to the schematic that Christopher showed — here you are on a login node, you’ve changed the
directory to the directory you want to be in, you then submit the job to the scheduler,
to SLURM and SLURM will then put your job on one or more of these nodes, depending on
how many nodes you’ve requested. So the job will be running on the compute
node rather than running on the login node, but you’ve submitted the job from the login
node. In general, when you submit the job, the current
working directory of where the code in the job will run is the directory from which you
submitted the job. You can change directories in your script
if you want to, but by default that’s the directory that the job will be running from. So that’s the basic workflow. If you find yourself doing something where
you’re trying to run some computations without using either the “sbatch” or the “srun” commands
it means you’re doing something wrong because you’re going to be running your code on the
login node. And we really only allow you to do that if
you’re just running a compilation or you’re moving files around — a shorter compilation,
if you’re running a compilation that takes an hour we’d want you to do that on the compute
nodes as well. Or if you just want to do something really
quick, you want to start up Python and check if you can run a few functions or something
like that, that would be OK to do on the login node. But if you end up running a computation for
more than like 5 minutes and you’re going to use more than one CPU, that’s the sort
of thing you should be doing via SLURM on the compute nodes. So I already discussed this idea of figuring
out what partitions you have access to. The next thing is to give you a sense of what
it looks like to actually submit a job to SLURM. What you would do is you’d be writing a shell
script that basically is just a text file that looks something like this. And this is a bit of a funny script because
it has all of these comments up here with the hash marks. Some of these comments are really instructions
to SLURM on how to submit your job. The things that don’t have comments are the
actual code that you want to run which is the actual computation that you want to do. In this case you’ll always have “#!/bin/bash”
here that says that this is a bash shell script. And then any place where you see a hash mark
and a capital word SBATCH, that’s an instruction to the scheduler on how to run your job. So the sorts of things that you would put
in here is you can optionally name the job so that when you query the different jobs
that are running you see the name of it. You would generally need to say what account
you want to run under. So in this particular case I’m running under
the Condo that I just mentioned. If you have an FCA you’d write something here
like “fc_” and the name of your FCA. And then you’d say what partition you’d want
to run in so for example I could run in the “savio2” partition. And then oftentimes you’d say how many cores
and how many nodes do you want to use for your job and we’ll see that in a little bit. By default, if you don’t say anything you’re
going to get access to a whole node on almost all of the partitions except for one that
I’ll mention a little bit later. The other thing that you’re required to put
down is a time limit. If you’re on a Condo then the time limit will
depend on the Condo, but for most users the maximum time that a job will run is for three
days. So you could put three days on here. If you know it’s going to run rather shorter
than that you should put a more realistic time limit and the advantage of that for you
is that may mean that your job will start running more quickly because the scheduler
is trying to juggle different jobs and if it’s a short job it may be able to fit it
in amongst all the other jobs that are running. And the advantage for us is that the SLURM
scheduler can better manage all of your competing jobs if it has a realistic estimate of how
long the job will take. It’s a bit of a Catch-22 because sometimes
things will happen that you’re not expecting and a job takes longer than you expect, like
“Oh I think this will take an hour” and you find that it takes an hour and 5 minutes,
that means that the job is going to get killed after an hour and you’ll lose all that work
unless you’ve saved output temporarily in some fashion. You can be on the generous side in terms of
putting down the time limit: if you think it’s going to take 3 hours you might put down
a time limit of 6 hours just to be on the safe side.>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRIS: No, you’re charged for the amount of time
the job is running — I’m pretty sure — you’re charged for the amount of time the job is
running, not the amount of time that you requested. So these SBATCH commands are the things that
say how the job should run and then down here is basically the code that you want to run. So this could invoke Python or MATLAB, it
could invoke some bioinformatics tool that you’re going to use, it could invoke “mpirun”
if you’re running a job across multiple nodes and you need to use MPI if you know what that
is. Here all I’m going to do is just run a very
basic Python script. I’ll load the Python 3.6 module and then I’m
going to run Python on the code that’s in this calc.py file, and I’ll redirect the output and any error messages to a file called calc.out.
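As a rough sketch of what such a script might contain (the job name and account here are hypothetical placeholders, and the exact Python module name may differ from what is shown):

    #!/bin/bash
    #SBATCH --job-name=test            # optional name for the job
    #SBATCH --account=fc_myproject     # your FCA or Condo account (hypothetical name)
    #SBATCH --partition=savio2         # which partition to run in
    #SBATCH --time=00:30:00            # wall-clock time limit
    module load python/3.6
    python calc.py > calc.out 2>&1     # run the computation, sending output and errors to calc.out

So let’s do a demo and see what this looks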
like. If I look at this job.sh file that should
be essentially the same as what I just showed you in the slides. If I look at calc.py that is just some Python
code. I’m a statistician so a lot of what we do
is just linear algebra so all I’m doing is just some linear algebra here that happens
to be fairly intensive. It doesn’t really matter what it is for the
sake of this demo. In order to submit this job, all I need to
do is run sbatch and then the name of the job script that I just showed you there. And you could create the job script on your
laptop and copy it over in the way that Christopher was talking about, or you could create and
edit the job script on Savio itself using vim or Emacs or nano, the tools that he just
mentioned. I’m going to do “sbatch job.sh” and you’ll
see that it reports back the ID of the job and this is going to be useful potentially
in the future for querying the system and seeing what the status of your job is. Some of the things we can do in terms of looking
at the status of our jobs is “squeue -j” and that will show us some information, we can
do “wwall -j” and then the job ID. So let’s go ahead and do that. I could for example do “squeue -j 5034688”
and that will show you that that job is pending and the reason that it’s pending is because
there aren’t any nodes available to run the job and I’m now in a queue of jobs that would
be run in a certain order. I’m going to cancel this with “scancel” and then again the number of the job.
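To recap the commands from this part of the demo, using the job ID from the example:

    sbatch job.sh          # submit the job; SLURM reports back a job ID
    squeue -j 5034688      # check the status of that particular job
    wwall -j 5034688       # see CPU and memory usage on the job's node(s)
    scancel 5034688        # cancel the job

For the purpose of this training we set up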
what’s called a reservation that reserved two nodes for my use to do the presentation,
so we’re going to add “--reservation=savio3-training” to the “sbatch” command. So you wouldn’t usually do this reservation
thing, that’s just for the purpose of getting access to the nodes on an expedited basis
here. And then I would still need to add “job.sh”. Now I get a new job ID. You can see that it incremented a little bit
relative to the old one. It incremented by 4 which means that there
were a few jobs in the interim that somebody else submitted that got the IDs in between. So now I can do “squeue -j 5034692” and now
it shows that that’s pending, which is a little bit odd because I have this reservation set
up. I’m actually not sure why it’s doing that.>>AUDIENCE MEMBER: Can you get an estimation
for how long you have to wait for the job to start?>>CHRIS: There’s is a command I think to
do that and I’m forgetting what it is off the top of my head.>>AUDIENCE MEMBER: If the reservation is
called “savio3-training” why is the partition “savio2”?>>CHRIS: Ah, that is the problem! It’s always good to have extra eyes on the
task here. So I’m going to go in and edit this file and
change it to “savio3”. So most people don’t have access to “savio3”
and I generally don’t use “savio3” which is why the demonstration was for “savio2”, and
it turns out that in order to set up the reservation it made sense to do it on “savio3”. So let me cancel that again and let’s submit
this again. Let’s see. So it may be that it doesn’t like this account
for this, yeah it’s probably that. So this is a special account that I have access
to because I work for the program. Hopefully this will now work. So now we can do the “squeue” command and
see what happens with it. So that was now 5034695. So that’s now running and we can see that
because of the capital R here and before we saw PD and that was for “pending”. It’s been running for 8 seconds. I can do things like “wwall -j” and again
with the job ID. Not
sure why that happened. I think that would usually work, I’m not quite
sure… Did I get the job ID correct? Yeah. I’m not quite sure why that’s not working,
that should in general work. I can also do things like “squeue -u” and
then my username and this should show all of the jobs that I have. Oh I see, OK. This should show all of the jobs I have that are
running. At the moment, nothing is running and that
explains why “wwall” didn’t show anything because in the time that I was monkeying around
trying to type in the commands the job started, finished, and there was no job running. So this warning is not very helpful and obviously
it didn’t clue me in to what was going on, but “No nodes selected” is an indication that
the job wasn’t actually using any nodes because it wasn’t actually running. So let’s see if that job actually produced
any output. So this “calc.out” was the output file that
I created and if I look at that we can see that it produced whatever numerical output
I asked it to produce, so it did actually run. For the sake of illustration I’m going to
run the job again and this time I’m going to hopefully do “wwall” quickly enough that
we can see what it shows. So you can see that it’s basically showing
the total CPU utilization here and also the amount of memory being used. The “savio3” nodes it looks like have about
96 GB of memory and that’s what’s being shown here, and my job is only using about 3 GB,
so it’s not using very much memory which is good. If you start to see this getting up to like
80 GB or 85GB or 90 GB that would be an indication that your job might run out of memory and
it might get killed because it ran out of memory, so this is a good way to monitor that. The other thing that’s quite useful here is
monitoring the percentage of usage here. If you are running a Savio job on almost all
of the partitions except on the HTC partition that I’ll mention in a little bit, you have
access to the entire node and you’re charged for the entire node. So what that means is that in general you
don’t want to be running a job where you’re only using one core for the entire job because
you’re basically wasting all of the other cores. If you need to do it, you can do it, but you
would be charged for it. So this is an indication that I’m only using
one of the cores, one of the CPUs on this particular node because 3% of 32 cores is
basically 1 core or 1 out of 32. So ideally what you would see here is 100%
or a high number near 100%. So this is an indication that my code was
really only running basically serially, it wasn’t using any parallelization. And I’ll talk in a little bit about some strategies
for how you could try and make use of all of the cores on a node. Any questions?>>AUDIENCE MEMBER: Is there a version of
“wwall” where you can monitor these numbers and percentages in over time as they change?>>CHRIS: There is a command that we’ll see
in a second that will tell you after a job has run what the result is. You can do things like the “watch” command
if you did “watch -n 1 wwall -j” and the job ID then “watch” would basically just refresh
the output and you could see it doing it in real time. Other questions? So let me go back to the slides partly just
to remind myself of where I am. Yes?>>AUDIENCE MEMBER: Is there any command to
list the running jobs?>>CHRIS: Yes, so the “squeue” command would
show all of your jobs either pending or running. So I would do “-u paciorek” to see mine or
if I wanted to see all of them I can just type “squeue” and that will now list out all
of the various jobs that are running and pending on Savio. So this would be a way to get a sense of what’s
running and how much is running. Other questions? There are a few other things that you can
do here in terms of looking at how much resources were used on a job that has already finished
and you can actually log in to a node that you’re running a job on and run top to be
able to see what’s happening on the node. So there are some commands for that here that
I won’t go through in detail. Let me say a little bit briefly about parallel
job submission. In addition to specifying the time limit and
the account and the partition that you want to run on you can also specify the amount
of what kind of resources you want. What I mean by that is how many nodes and
how many cores you want to use. If you want to use more than one node and
do parallelization across more than one node, you can specify the --nodes flag and you would put this in the same style as you would here. You have an SBATCH line where you put --nodes
for example. You can also say how many processes you want
to run per node. This would often be used if you were using
MPI. And there the number of MPI processes corresponds
to this word of “tasks” here. You can also say how many CPUs you want to
use per task. So depending on what you’re doing the notion
of what a task is may change, but what’s going to happen when you do this is that it’s going
to determine how many nodes SLURM assigns to you and it will also result in setting
some environment variables inside your job that your code or other code that other people
have written can use to understand how many cores are available and how many nodes are
available. So that will all be set based on having passed
these flags in here. So here’s an example of that if you were running
a job that uses MPI. One of the main ways that you would use multiple
nodes on Savio is if you’re either writing your own MPI code or you’re using some software
where someone else has set up that software to use MPI and MPI is a protocol to pass in
information and data from one node to another. If you’re going to parallelize across multiple
nodes you need to get information across multiple nodes, and MPI is the way that you do that. Here’s an expanded job script. This is just like the job.sh file that I showed
you before except now it’s basically set up to run some MPI code in parallel. I’ve added this flag here, --ntasks=40, and
this would naturally be done on “savio1” because “savio1” has 20 cores per node so this would
naturally amount to having 20 cores per node on two separate nodes for a total of 40 processes
or cores. So if you set this up this way, when you then
load in openmpi and you run a job via MPI, MPI will actually communicate with SLURM and
will realize that you’ve asked for 40 tasks and what that means is that it will start
up 40 processes to do the parallelization, and MPI will then be managing the communication
between those 40 processes.
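A sketch of what that kind of script might look like (the account name and the MPI executable are hypothetical placeholders):

    #!/bin/bash
    #SBATCH --account=fc_myproject     # hypothetical account name
    #SBATCH --partition=savio          # savio1 nodes have 20 cores each
    #SBATCH --ntasks=40                # 40 MPI tasks, i.e. two full savio1 nodes
    #SBATCH --time=01:00:00
    module load gcc openmpi
    mpirun ./my_mpi_program            # hypothetical MPI executable

If you’re not familiar with MPI and that didn’t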
mean that much to you, that’s OK. But for those of you who know something about
MPI hopefully that was helpful. Yes?>>AUDIENCE MEMBER: That’s one core per task?>>CHRIS: Yeah, you don’t really need to set
the CPUs per task in this case, the default is actually one and there are ways that you
can combine MPI with also using threading so you could have tasks that use more than
one core. But in general you don’t want to overload
your cores. You want to have each core running at 100%
of CPU so either you’d have one CPU per task and as many tasks as you have cores or you’d
want to have fewer tasks than cores and have multiple cores per task so there’s some arithmetic
you can work out to think about that. The common sorts of things that you might
end up with is what I just said: an MPI job with one CPU per MPI task, an openMP or threaded
job on one node where you have multiple CPUs per task, or you could do sort of a hybrid
thing. There’s lots of variations on this. We have some guidance here in this link on
submitting jobs in these different flavors and we’re happy to take questions on this
sort of thing like “What’s the best way to set up parallelizing with some particular
software that I’m using?” But one thing to keep in mind is that in general
it doesn’t make sense to ask for more than one node unless you know that the software
that you’re running is capable of being run on multiple nodes. Some software is but lots of software is not. I’ll say a little bit about interactive jobs. In addition to using SBATCH which will submit
a batch or background job, you can also actually run jobs on the compute nodes interactively
where you’re just typing away and you can see what’s happening on the fly. That’s a good way to prototype your code or
if you need to do something where you really need to interact with the system. The same rule applies that you’re going to
get the entire node so if you’re doing something interactive where you’re not doing it parallel
in some way, you’re still getting charged for all of those CPUs even if you’re just
using one of them. Let me go back here and we’ll briefly see
what it looks like to run interactively. You would basically execute a command that looks like this: now what we’re doing is running srun, and we pass those flags that I put in the job script directly on the command line.
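As a minimal sketch (the account is a hypothetical placeholder, and asking srun for a pseudo-terminal running bash is one common pattern, not necessarily the exact demo command):

    srun --partition=savio2 --account=fc_myproject --time=00:30:00 --pty bash -i

In this case, I’m again going to use the reservation,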
so I’ll change this to “savio3” and I’ll change this to “ac_scsguest” and I’ll change this
to “--reservation=savio3-testing”. You wouldn’t have to put the reservation in. You often wouldn’t have to put the account
in if you just have one account or want to use your default account. You would generally need the partition and
if you’re doing more than one node you’d want to put in the number of nodes. In this case, I don’t actually need that although
I could leave it in there. The requested reservation is invalid. Was it called something else? Oh, “savio3-training” not “-testing”. You’ll notice that my prompt here says that
I’m on login node 2. Once I hit this and I’m allocated a resource
the main clue that you have that you’re actually running a job interactively on the compute
node is that your prompt will change and it will show you the name of the node that you’re
running on. So in this case I’m running on node 0069 dot
“savio3”. And if I run “hostname” it should actually
confirm that that’s the node that I’m on. At this point I can do whatever I want. I can do module load MATLAB or Python. I could run that Python calc.py script that
I ran. I can do whatever I want. Questions?>>AUDIENCE MEMBER: For instance if you have
a job and you check the memory and it’s over 80% then that means that you should be using
two nodes or parallelization?>>CHRIS: No, that’s a great question and
I’m glad you asked it. It means that you should either a) try to
see if you can reduce the amount of memory your code is using or b) use one of the big
memory nodes or c) the only time that you can actually take advantage of the memory
on multiple nodes is if the code that you’re running can actually run on multiple noes. There’s no way to share memory across nodes
while you have a job running on one of the nodes.>>AUDIENCE MEMBER: What if your job consists
of multiple small jobs?>>CHRIS: Yeah, I’m going to get to that in
a second. Other questions? Same thing on the interactive nodes. You’re still charged for the entire node.>>AUDIENCE MEMBER: Are nodes shared?>>CHRIS: No. If you’re running a job on a node you get
access to the entire node and nobody else can do anything on that node. Which in some ways is good because that means
you’re not competing for memory on the job. If you want to run stuff with a graphical
interface we have a visualization node that allows you to do that and there are ways to
connect the visualization node to a compute node to run things like Python or RStudio. So there’s some information here and that’s
also something we can help with on a one-on-one basis. I don’t think I’ll spend a lot of time on
the low priority queue. I mentioned it before. If you do have access to a Condo and you want
to run a very large job or you want to run a job that uses more nodes than you purchased
or you want to run on nodes of the type that you didn’t purchase you can run in low priority
mode. This is a good way to run a very big job that
you wouldn’t mind having the job killed. You won’t get charged for that job, but the
cost is that the job could be killed and you might have to resubmit it. If you notice that many nodes are free on
Savio then you could decide that’s a good time to run a low priority job and hope that
it doesn’t get killed. These kinds of jobs are also jobs where you
would hopefully want to be able to save your output on an interim basis and restart your
job from where ever it got killed. If you can set up a job to do that that also
works nicely with the low priority queue. Let me go back to this question of whether
you get access to the entire node or not. The two partitions which are allocated on a per-core basis are the “savio2_htc” and “savio2_gpu” partitions. On those partitions you ask for one or two or three or four cores (CPUs) and you would get access to just those. Here’s an example. This is an interactive job, but you could do this with SBATCH as well. I want to use the “savio2_htc” partition and
then I say I want CPUs per task to be 2, meaning I want to get 2 CPUs that I can run my code
on. So I would ask for 2 CPUs if I had set up
my code such that it could use 2 CPUs. If it was just a serial job that could only
use one core I would change this to be 1 here. So for jobs that you can’t naturally parallelize, that is a good way to use just the cores you need. These are charged per core as well, so you
won’t be charged more than you’re using. Same thing holds for the GPUs. You’d ask for how many GPUs you want, and
we ask that you request twice as many CPUs as the number of GPUs and then that’s what
you’re then charged for. Any questions there?
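As a sketch of both cases (the account name is hypothetical; --gres is the standard SLURM way of requesting GPUs):

    # two CPUs on the per-core HTC partition
    srun --partition=savio2_htc --cpus-per-task=2 --account=fc_myproject --time=00:30:00 --pty bash -i
    # one GPU plus the two CPUs we ask you to request per GPU
    srun --partition=savio2_gpu --gres=gpu:1 --cpus-per-task=2 --account=fc_myproject --time=00:30:00 --pty bash -i

Let me say a little bit partly in response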
to the gentleman’s question about what if I have a lot of smaller jobs where each job
can only use a single CPU. The tool that we recommend there is something
called ht helper which stands, I think, for high throughput helper. What HT Helper does is you give it a list
of the computational tasks that you want to run. So maybe you have 1000 things that you want
to compute and they can each be done independently. And each one of them often would only use
one core, but each one of them could if you want use two or three cores or something like
that. What you would do is you would write some
code that uses this HT Helper and you’d tell HT Helper how many of these task you need
to run and what the code is that needs to run to actually execute each task. And then you would write your SLURM job script
to call HT Helper with the particular invocation that’s given in this documentation. What HT Helper will do is it will say “Oh,
you gave me 1000 things to do and you started a job where you asked for two nodes and say
48 cores on “savio2″.” What HT Helper will then do is set up a little
scheduler inside of the SLURM scheduler and it will start up 48 of those 1000 tasks at
the start and when one of those 48 finishes it will start the 49th one and when another
one finishes it will start the 50th one and it will walk its way through all of the 1000
tasks. And the nice thing that will accomplish is
it will keep 48 cores busy until you get to the end of the computation, so you’ll effectively
be using the entire two nodes in the example I was giving.
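As a sketch, the task list you hand to HT Helper is just a text file with one command per line; the file name and the arguments here are hypothetical, and the exact ht_helper invocation for your SLURM script is given in the documentation mentioned above:

    # taskfile.txt: one independent task per line
    python calc.py --input chunk001.csv
    python calc.py --input chunk002.csv
    python calc.py --input chunk003.csv
    # ... and so on, one line for each of the 1000 tasks

>>AUDIENCE MEMBER: Would that work if some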
of the tasks are dependent on the others?>>CHRIS: No, it wouldn’t work if they’re
dependent on the others, so you really want these to be independent tasks. I suppose that if the tasks were far enough
apart that you knew one was going to finish before the other then it might work, but there
is no guarantee of the order in which things finish, so you’d generally want things to
be independent.>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRIS: You would want the wall time to
be the total wall time for the 1000 tasks.>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRIS: Suppose you got 500 finished and
you wrote output from the 500 you could just change your code to do the next 500 in the
next job. There are also lots of ways in tools like
Python and R and MATLAB where you can do something a little bit like what I just said. You’d write your R or Python or MATLAB code
to work in parallel and that will often work nicely across cores on one node and there
are tools for Python, R, and MATLAB that allow you to run across multiple nodes as well. So if you’re working with those tools then
you can also make use of multiple cores for independent computations. OK, we’re running a little low on time so
let me see what I want to focus on here. We’ve already covered most of this about checking
the status of jobs. This “sinfo” command is useful because it
will show you how many nodes are free on a given partition. So this would give you a sense of if you submit
something right now, is it likely to start running soon or does it seem like the system
is really busy. So you’ll notice here this is “savio1”, there
are 81 nodes allocated, that are being used right now, and there are 7 that are idle. So this is an indication that there is some
free resources on “savio1” that you could make use of now.
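For example:

    sinfo                # every partition, with counts of allocated, idle, and down nodes
    sinfo -p savio2      # restrict the output to a single partition

The other thing I’ll mention is this check_usage command. If you’re running under an FCA you have this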
limit of 300,000 core hours and so if you want to know how much of the 300,000 core
hours or service units have been used during the year that we’re talking about you can
run check_usage. So if I did that for example, I think a better
example is an FCA called “geissler” that I’ll look at. So this takes a little bit of time to do whatever
arithmetic it needs to do to compute how much usage has been made since June when the FCAs
were reallocated. OK it’s going a little more slowly than I
expected. OK, this says that under this FCA there have
been 87 jobs run since June 3rd of this year. They’ve used 21,000 CPU hours, and that’s
made up 26,000 service units out of the allocation of 300,000. So in this particular FCA they’ve been doing
some computation but they’re not all that close to the limit of what they could use. Any questions there? The last thing I’ll do is I’ll show you a
brief demo of running some parallel code in Python. And then we’ll quit. I’m going to do something similar to the sorts
of things that Christopher was showing. I’m going to copy a csv into my scratch directory
because that’s where I want to do data input and output. So I’m going to get my input data there. This pip install is just an example of when
you want to install your own Python package in your home directory, you can’t just use
pip install because it would try to install in the system directory and you don’t have
access to those directories. But you can use this “–user” flag and that
would then install it in your home directory. So in this case it says “statsmodels” is already
available so it’s not going to install it. But if this were a package that were not already
installed this would be how you would run that. Oh I forgot that I was still in my srun session
so I was still getting charged for that session. So if I want to get out of that and stop being
charged I would type exit. So now I’m back on the login node. Now at this point let’s pretend that I wanted
to run that Python code in an interactive session I could do that same thing. Again I’m going to change this to “savio3”
and change the account. And I’m going to put the reservation in. So
now I’m back on that same node and I now have access to two nodes. If I do “env” and look for SLURM this gives
all of the environmental variables in the shell that are being set by SLURM. You’ll notice, for example, that the number
of nodes is 2, so that’s an indication that I have an allocation of 2 nodes to do my computations
on. At this point if I’m running Python and I
want to run the iPython parallel tool to run across multiple nodes I need to start up Python
worker processes on both of the nodes, one worker process per core that I have available. So that’s basically what this syntax here
is going to do. It’s going to load the modules I need and
then this “ipcontroller” and “ipengine” are going to start up a master controller process
and a whole bunch of worker processes. What srun does in this context which is different
from what srun does in this context up here is if I run srun within another srun or srun
within an sbatch, it will basically run this command here — whatever comes after the srun
— it will run that command once for each of the cores that I have access to. So this will start up 48 different Python
processes. So I can go ahead and do this. There are a bunch of sleep commands in here
because I need to pause and wait for the earlier commands to finish before I start up the later
commands. I probably actually need to make this sleep
for longer to give it enough time. But basically this is how you would start
up all these workers.
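A rough sketch of that startup sequence, assuming the same Python module as before (the sleep durations are arbitrary and the exact ipcontroller/ipengine flags may vary with your ipyparallel version):

    module load python/3.6
    ipcontroller --ip='*' &     # start the controller so engines on other nodes can connect
    sleep 20                    # give the controller time to come up
    srun ipengine &             # srun inside the job starts one engine per allocated core
    sleep 45                    # wait for the engines to register
    ipython                     # now start IPython and use the engines from there

Once you start up the workers you can then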
start iPython and then in iPython there’s a whole bunch of code here that gives a demo
of running Python code in parallel. I think for the sake of time I won’t go through
exactly what this is doing. You might be familiar with iPython parallel
or you’d need to go look up the documentation and get a sense of how this works. But here’s at least a starting example of
some code that would work. If I for example do “ps aux | grep python”
I should see a whole bunch of Python workers running. There’s a list of a whole bunch of Python
processes and that’s an indication those worker processes have all started up and I should
be able to have access to them. In this case I’ll just exit out rather than
fully completing the demo here. That was fairly abbreviated, but are there
any questions at a high level about what I was doing there? I’ll just mention a few things just to wrap
up here. One is that there is a nice package in Python
called Dask that does a really nice job of managing parallelization across the cores
on one node or across multiple nodes. I didn’t put any documentation here for Dask
but we did give a training last Spring on that and there’s a link to the training here. And I’ve also prepared a tutorial on using
Dask and that’s here and there’s tons of resources online over and above that. But if you’re somebody who uses Python or
is interested in getting started in using Python and you’re interested in doing stuff
in parallel I would recommend making use of Dask. The other nice thing about Dask is that if
you have a very large dataset, for example one that wouldn’t fit in memory on one node,
going back to your question, you could use Dask to operate on a single data object that
gets split up across multiple nodes. You’ll use the memory of those multiple nodes
and you can also then use multiple CPUs to compute on that very large dataset. So Dask makes that fairly easy to do. So let’s wrap up with a little bit of more
information on how to get help. If you have a Savio-specific question: “How
do I do something?” “This isn’t working.” “Something seems wrong on the system.” Any of those things, just send an email to
this address and one of the systems administrators or one of the consultants will respond to
that and will try and help you. If you have a more general computing question,
for example, BRC can help you get access not only to Savio but to resources more broadly
on campus or in the cloud or to NSF resources across the country you can email us at [email protected]
and one of the consultants will get back to you. We also hold office hours so you can just
drop in if you have questions. And then we also do consulting around data
management if you have, for example, confidential data, health data, educational data that needs
to be protected. We can help you with that in terms of how
to protect it in terms of storage and also how to do computations with it. You can email [email protected] and
we hold the same office hours both for computing and for data. So you can come for either of those sorts
of questions. And then I’ll just mention a few things that
are coming up. We have an open house and training on working
with secure data including working with secure data on Savio which is now possible. And that’ll be in about a month here at 3pm
on Tuesday. We’re usually looking to hire additional consultants
to help people like you. So if you’re a graduate student and this sounds
interesting to you, we have some fliers in the back or you could just ask us if you have
questions about this. Is Amy back there? Amy I forget the third thing that you mentioned
that I didn’t get to put on here.>>AMY NEESER: The Cloud Meetup. Anyone who is interested in cloud technologies,
we hold a monthly meetup at the Skydeck, always the 4th Tuesday.>>CHRIS: What’s the Skydeck? That sounds very futuristic but people might
not know what it is.>>AMY: It’s in downtown Berkeley and it’s
a campus startup incubator. So you’re welcome to come and we have pizza
too.>>CHRIS: So are there any questions before
we break?>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRIS: Ya, there’s a slide that I skipped
over for the sake of time where you can use JupyterHub to access notebooks or you can
go through the visualization node. The visualization node is a little bit more
robust in terms of if there is an error you see what’s happening more easily but those
are the two ways.>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRIS: If you’re just running a low-computational
burden job you can just run it on the Jupyter node and that won’t get charged anything. If you’re running something in parallel then
you’d run it on a compute node and you’d get charged.>>AUDIENCE MEMBER: [INAUDIBLE]>>CHRIS: You’d use the interactive node if
for example you wanted to bring up an RStudio session or a graphical interface to a Jupyter
notebook or a MATLAB interface and be able to point and click and those sorts of things. So let’s break here.