1. Overview
- HPCC cluster = many connected servers
- Head nodes (BlueJay / Skylark) = login machines
- Compute nodes = where jobs run
- Slurm = job scheduler
Workflow:
Local computer → SSH → Head Node → Slurm → Compute Node
2. Connect to Cluster (Windows)
Open PowerShell:
ssh -XY yourNetID@cluster.hpcc.ucr.edu
Enter your password and approve the Duo Push request. Once you are authenticated, you will be on one of the head nodes (bluejay or skylark); which one you land on does not matter.
To switch between head nodes:
ssh skylark
ssh bluejay
Once you are logged in, you can:
- Check whether a user has an account on the cluster:
id netID
- Check your Slurm limits, i.e. how many CPUs and how much memory you can use in parallel on each partition:
slurm_limits
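Because `id` exits with a non-zero status when a user does not exist, the account check can also be scripted. The sketch below is a hypothetical helper (`has_account` is not a cluster command) that relies only on that exit status; `slurm_limits` is left out because its output format is specific to the cluster.

```shell
#!/bin/bash
# Hypothetical helper: report whether a user account exists on this machine.
# It relies only on the exit status of `id` (0 = user found, non-zero = not found).
has_account() {
  local user="$1"
  if id "$user" >/dev/null 2>&1; then
    echo "$user: account found"
  else
    echo "$user: no account"
    return 1
  fi
}

# Usage on the head node (replace netID with the NetID to check):
#   has_account netID
```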
4. Non-Interactive Job (Recommended)
Create a job script named fileName.sh. It tells Slurm which resources the job needs and which commands to run:
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --time=1-00:15:00 # 1 day and 15 minutes (days-hours:minutes:seconds)
#SBATCH --mail-user=netID@ucr.edu
#SBATCH --mail-type=ALL
#SBATCH --job-name="myJobName" # give the job a short name
#SBATCH -p epyc # partition; you can use any of: epyc, intel, batch, highmem, gpu
# Print the current date (shows when the job started)
date
############################
# OPTION 1: PYTHON JOB
############################
# Load Anaconda and activate the conda environment you created
module load anaconda
conda activate envName
# Run your Python script
python slurm.py
# To execute a Jupyter notebook instead of a .py script, use this line (Python jobs only):
# jupyter nbconvert --execute "QSAR Model Tutorial_Chapter 2_Data Preparation.ipynb" --to notebook --inplace
############################
# OPTION 2: R JOB
############################
# For an R job, comment out the Option 1 commands above and uncomment the lines below:
# module purge # optional but recommended for a clean R environment
# module load R/4.5.0
# Rscript fileName.R # run your R script
To submit the job to Slurm (run this on the head node, in the directory containing fileName.sh):
sbatch fileName.sh
Check status:
squeue --me
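If you want to submit a job and then wait for it from a script, `sbatch --parsable` prints just the job ID, which can be passed to `squeue -j`. The sketch below assumes a single (non-array) job script such as fileName.sh above; the function names are hypothetical. By default, anything the job prints (for example the output of date, python or Rscript) is written to slurm-<jobID>.out in the directory you submitted from.

```shell
#!/bin/bash
# Hypothetical helpers for submitting one job and waiting until it stops running.

# Submit a job script and print only its job ID.
# `sbatch --parsable` prints "jobID" (or "jobID;cluster" on multi-cluster setups),
# so everything from the first ";" onward is stripped.
submit_job() {
  local script="$1" out
  out=$(sbatch --parsable "$script") || return 1
  echo "${out%%;*}"
}

# Print the job's state in long form (e.g. PENDING, RUNNING); empty once Slurm no longer lists it.
job_state() {
  squeue -h -j "$1" -o %T 2>/dev/null
}

# Poll every INTERVAL seconds (default 60) while the job is still queued or running.
wait_for_job() {
  local job_id="$1" interval="${2:-60}" state
  while true; do
    state=$(job_state "$job_id")
    case "$state" in
      PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED) sleep "$interval" ;;
      *) break ;;
    esac
  done
  echo "Job $job_id is no longer pending or running (last state: ${state:-gone}); see slurm-${job_id}.out"
}

# Usage on the head node:
#   job_id=$(submit_job fileName.sh)
#   wait_for_job "$job_id" 60
```

Polling once a minute keeps the load on the scheduler low; `squeue --me` remains the simplest way to check by hand.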
5. Upload & Check Files
Upload a directory (run these scp commands in PowerShell on your local computer, not on the cluster):
scp -r . yourNetID@cluster.hpcc.ucr.edu:~/folderName
# Copies everything in your current local directory into ~/folderName on HPCC, creating folderName if it does not exist yet.
scp -r . yourNetID@cluster.hpcc.ucr.edu:~
# Copies everything in your current local directory directly into your HPCC home directory (~); the files are not placed inside any folder.
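# Example: running the command above from inside this local folder
# ParentFolder
# |____ File1
# |____ File2
# |____ ChildFolder1
#       |____ ChildFile1
#       |____ ChildFile2
# puts its contents straight into your HPCC home directory:
# ~ (home)
# |____ File1
# |____ File2
# |____ ChildFolder1
#       |____ ChildFile1
#       |____ ChildFile2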
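The folder-versus-contents difference can be tried safely on your own computer. The sketch below is a hypothetical local demo that uses `cp -r` in a temporary directory as a stand-in for scp. For these two cases (copying "." to a path that does not exist yet, and copying "." into an existing folder) `cp -r` and `scp -r` behave analogously, but this is only an illustration, not a cluster command.

```shell
#!/bin/bash
# Local demo (hypothetical paths in a temp dir): folder vs. contents when copying ".".
set -eu
demo=$(mktemp -d)

# Build the example ParentFolder tree and an empty stand-in for the HPCC home directory.
mkdir -p "$demo/ParentFolder/ChildFolder1" "$demo/home"
touch "$demo/ParentFolder/File1" "$demo/ParentFolder/File2" \
      "$demo/ParentFolder/ChildFolder1/ChildFile1" \
      "$demo/ParentFolder/ChildFolder1/ChildFile2"

cd "$demo/ParentFolder"

# Analogue of: scp -r . yourNetID@cluster.hpcc.ucr.edu:~/folderName
# folderName does not exist yet, so it is created and filled with the contents of ".".
cp -r . "$demo/home/folderName"

# Analogue of: scp -r . yourNetID@cluster.hpcc.ucr.edu:~
# home already exists, so File1, File2 and ChildFolder1 land directly inside it.
cp -r . "$demo/home"

# Show the resulting layout of the stand-in home directory.
ls -R "$demo/home"
```

In both cases no folder named ParentFolder is created at the destination; to get one, copy the folder itself (e.g. from its parent directory) instead of ".".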
Upload a single file directly into your HPCC home directory (no folder is created):
scp fileName yourNetID@cluster.hpcc.ucr.edu:~
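Delete a folder on HPCC (run this on the cluster; rm -rf removes the folder and everything in it permanently, without asking for confirmation):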
rm -rf folderName
Navigating directories:
ls
# List the files and folders in your current directory; use this to check where your uploads ended up.
cd folderName
# move into folderName
cd ..
# move up to the parent of the folder you are currently in
# Example:
# Suppose you uploaded ParentFolder as a folder (e.g. by running scp -r ParentFolder yourNetID@cluster.hpcc.ucr.edu:~ from the local directory that contains it). On the head node:
ls # ParentFolder should appear in the listing of your home directory
cd ParentFolder # move into ParentFolder
ls # File1, File2 and ChildFolder1 should all be listed
cd ChildFolder1 # move into ChildFolder1
ls # ChildFile1 and ChildFile2 should be listed
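Instead of checking each folder by hand with ls, a short script can confirm that every expected file arrived. `check_upload` below is a hypothetical helper: you pass it the uploaded folder followed by the relative paths you expect inside it, and it prints any that are missing.

```shell
#!/bin/bash
# Hypothetical helper: verify that expected files exist inside an uploaded folder.
# Usage: check_upload FOLDER RELATIVE_PATH...
check_upload() {
  local base="$1" missing=0 path
  shift
  for path in "$@"; do
    if [ ! -e "$base/$path" ]; then
      echo "missing: $base/$path"
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all $# files present in $base"
  fi
  # Return status: 0 if everything is present, 1 otherwise.
  [ "$missing" -eq 0 ]
}

# Example for the ParentFolder layout above (run on the head node from your home directory):
#   check_upload ParentFolder File1 File2 ChildFolder1/ChildFile1 ChildFolder1/ChildFile2
```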
6. Environment Setup and Running Code
Creating an environment makes it easier to run more complex code, because you can install exactly the dependencies it needs into that environment. Without one, you can still run code with the cluster's default modules, but they may be missing packages you need.
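Running your code:
# Non-interactive (batch job, see section 4):
sbatch fileName.sh
# Interactive: open a shell on a compute node in the epyc partition with 8 CPU cores and 8 GB of memory
srun -p epyc -c 8 --mem=8G --pty bash -l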
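Once srun gives you a shell, Slurm sets environment variables inside it, such as SLURM_JOB_ID and, because -c 8 was requested, SLURM_CPUS_PER_TASK. Checking them is a quick way to confirm you are on a compute node rather than a head node. The helper name below is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether this shell is running inside a Slurm allocation.
# Slurm sets SLURM_JOB_ID for shells and scripts started by srun or sbatch;
# a plain login shell on a head node (bluejay/skylark) does not have it.
slurm_where() {
  if [ -n "${SLURM_JOB_ID:-}" ]; then
    # SLURM_CPUS_PER_TASK is only set when -c/--cpus-per-task was requested;
    # 1 is shown as a fallback otherwise.
    echo "Inside Slurm job $SLURM_JOB_ID on $(uname -n) with ${SLURM_CPUS_PER_TASK:-1} CPU(s) per task"
  else
    echo "Not inside a Slurm job (likely a head node): $(uname -n)"
    return 1
  fi
}

# Usage, after srun -p epyc -c 8 --mem=8G --pty bash -l:
#   slurm_where
```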
Find which R modules are available:
module avail R
# Lists module versions matching "R" (e.g. R/4.5.0); the list may also include other modules whose names contain "R".
module load R/4.5.0 # replace 4.5.0 with a version from the list
Create a directory for each R or Python project, so that your files do not all end up in your home directory and each project's packages stay organized:
mkdir ProjectName
cd ProjectName
PYTHON ENVIRONMENT
# Load Anaconda so the conda command is available (same module as in the job script)
module load anaconda
# Create environment
conda create -n envName python=3.10 # or whichever Python version suits your needs
# Activate environment
conda activate envName
# Your prompt should now look like (envName) netID@r##:~/folderName, which tells you that:
# 1. The environment is successfully activated: "(envName)"
# 2. You are on a compute node: "r##"
# 3. You are in the directory that holds your project files: "~/folderName"
# Install packages (scikit-learn is the pip package name for sklearn)
pip install scikit-learn torch pandas matplotlib
# and any other necessary packages; they go into the active environment, so activate it first
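In a batch job there is no prompt to read, so a script can check the active environment instead. `conda activate` normally sets the CONDA_DEFAULT_ENV variable to the environment's name; the sketch below (hypothetical helper name) stops the job early if the expected environment is not active, rather than failing on a missing package partway through a long run.

```shell
#!/bin/bash
# Hypothetical helper: fail fast if the expected conda environment is not active.
# `conda activate NAME` normally sets CONDA_DEFAULT_ENV to NAME.
require_conda_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "Expected conda env '$expected' but found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
  echo "Using conda env '$expected'"
}

# Usage in fileName.sh, right after `conda activate envName`:
#   require_conda_env envName || exit 1
```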
R ENVIRONMENT
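# Start R from inside your project directory (renv treats the current directory as the project)
R
# Inside R, install renv (the recommended environment manager for R):
install.packages("renv")
# Initialize a project-local environment for the current directory
renv::init()
# Restart R: quit first (answer "n" if asked to save the workspace image)...
q()
# ...then start R again from the same directory with the command "R"
# Install packages:
install.packages("tidyverse")
# and any other necessary packages; they are installed into this project's environment,
# which is only used when R is started from inside the project directory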
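renv::init() writes an .Rprofile and an renv.lock file into the project directory, and R (including Rscript) reads the .Rprofile in the directory it starts from. That is why the environment is only used when R is launched from inside the project folder, and why a batch job should change into that folder before calling Rscript. The helper below is a hypothetical sketch of that step; ProjectName and fileName.R are the placeholders used above.

```shell
#!/bin/bash
# Hypothetical helper: run an R script from inside its renv project directory,
# so that the project's .Rprofile (written by renv::init()) activates the environment.
run_in_renv_project() {
  local project_dir="$1" script="$2"
  # renv.lock is created by renv::init(); its absence suggests renv was never set up here.
  if [ ! -f "$project_dir/renv.lock" ]; then
    echo "No renv.lock in $project_dir; run renv::init() there first" >&2
    return 1
  fi
  # Subshell: the cd does not affect the rest of the job script.
  (cd "$project_dir" && Rscript "$script")
}

# Usage in fileName.sh, in place of the plain `Rscript fileName.R` line:
#   module load R/4.5.0
#   run_in_renv_project ~/ProjectName fileName.R
```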
7. Download Results
scp -r yourNetID@cluster.hpcc.ucr.edu:~/folderName .
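Run this on your local computer (e.g. in PowerShell), not on the cluster. The trailing "." is the local destination: folderName and everything in it are copied into the directory you are currently in.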