class: center, middle # HPC
??? Notes for the _first_ slide! --- # What is High Performance Computing ? * More than your laptop or desktop (or high powered workstation) * Hardware architecture that fits your problem and available tools * Allows you to answer questions you couldn't with a typical computer * Faster iteration * Clusters, but not just clusters * High memory, high core count, bigdata, machine learning, etc. --- # Computer Architecture ## Simple, stupid computer architecture diagram [inspired by Mark Richter](https://www.redhat.com/en/about/blog/authors/marc-richter) ``` | CPU | Memory | ------------------- | Network | Disk I/O | ``` * These are the key areas to focus on for bottlenecks --- # Performance Monitoring * htop (or top) * iostat * iftop
There are other tools, but these will get you the basic info you need and are installed on most systems by default --- # Terminology * HPC terminology can be confusing, sometimes the same words have different contextual meaning (e.g. threads) * When in doubt - ask ## Terms * Server related * Nodes / slave node / compute node * Head node * Data transfer node --- ## Terms * CPU related * Processors (a cpu chip) * Sockets (the slot on the motherboard a cpu connects to) * Cores (physical cpus embedded on a single chip) * Threads / hyper-threading (software 'trick' to let two processes share a core) * Cache (keeps data close to the CPU) * L2 (fastest / smallest / closest) * L3 (faster / small / close) * Memory related * RAM / Memory (fast / bigger / further away) * SWAP (not so fast / big / furthest - on disk) * NUMA (non uniform memory access) --- ## Terms * Disk related * Disk (HDD, SDD) * I/O * Network related * TCP/IP * RDMA * Infiniband / IB / Omnipath / Mellanox * Gigabit (building speed) --- # Terminology * Software related * Processes * Threads * Parallel * MPI * OpenMP * SPMD, MIMD * [Flynn's taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy) --- # Compute Resources at ISU ## Clusters * [hpc-class](http://www.hpc.iastate.edu/guides/classroom-hpc-cluster) * For classes, not research * 48 nodes * 2.0GHz, 16 cores, 64GB RAM * [condo2017](http://www.hpc.iastate.edu/guides/condo2017) * Primarily for sponsored research * 180 Nodes * 2.6GHz, 16 cores, 128GB RAM * New free-tier - 48 nodes * 2.0GHz, 12 cores, 64GB RAM -- * cyence * NSF MRI grant * cystorm * Being retired * lightning3 * Being retired --- # Compute Resources at ISU ## [Custom hardware](http://rit.las.iastate.edu/hardware) * BioCrunch (for multithreaded shared memory programs) * 2.4GHz, 80 threads, 768GB of RAM * BigRAM (for large memory needs like de-novo assembly) * 2.6GHz, 48 threads, 1.5TB of RAM * Speedy (for single threaded programs, like R) * 3.4GHz, 24 threads, 256GB of RAM * Speedy2 (for single threaded programs, like R) * 3.2GHz, 32 threads, 256GB of RAM * LASWIN01 (for Windows only software) * 2.6GHz, 24 threads, 64GB of RAM * Legion (for massively parallel applications) * 4 nodes * 1.3GHz, 272 threads, 386GB of RAM (each) --- #Compute Resources ## [Xsede](https://www.xsede.org/) * ISU researchers have access to the national supercompute centers (e.g. TACC, PSC) via Xsede * For problems that need to scale larger than our on-campus resources * Contact campus champoins: Jim Coyle or Andrew Severin ## Cloud (AWS, Azure, etc.) * Tempting introductory rates * Be cautious of putting in a credit card (charges can accumulate quickly) * Consider data transfer times and speed * Consult with IT before purchasing - we have special negotiated rates --- # UNIX for HPC at Iowa State * RedHat Enterprise Linux 7 (current is 7.3) * Powerful and high performance * Supported and stable * Common in industry -- * [Fedora](https://getfedora.org/) is the upstream desktop distribution (good option to become familiar with linux) * [CentOS](https://www.centos.org/) is the downstream server distribution * When you google linux questions, you'll inevitably get results for Ubuntu * results can point you in the right direction, but interpret and understand - don't just copy & paste * sudo apt-get install - will not work for you on our systems --- # Software ## Modules * We use modules to allow multiple people to use the same software with consistent, reproducible results * Modules keep your environment clean and free of conflicts (e.g. python2/3 or Java7/8) * Think of modules as a software library. You need to check out the software before you can use it. * We call our software library [RISA](https://researchit.las.iastate.edu/research-it-software-archive) (Research IT Software Archive), and sync it to hpc-class, condo, and all of our custom machines ``` $ module avail ------------------------------------------------------------------ /opt/rit/modules ------------------------------------------------------------------ abyss/1.5.2 freesurfer/5.3.0 lib/htslib/1.2.1 microbiomeutil/r20110519 rsem/1.2.22 abyss/1.9.0 gapcloser/1.12-r6 lib/htslib/1.3 migrate-n/3.6.11 rsem/1.2.31 afni/17.0.10 gapfiller/1-10 lib/htslib/1.3.2 mira/4.0.2 rtax/0.984 albert/20170105 gatk/3.4-46 lib/ICE/1.0.9 mirdeep/2.0.0.7 ruby/2.3.0 ... ``` * Look for keywords like 'MPI, parallel, OpenMP' * [Search](https://researchit.las.iastate.edu/available-software) for similar programs that may scale up better --- # Software ## Modules * To use a module ``` $ module load abyss ``` * Default behavior is to load the latest version if not specified * Use 'module purge' to clear your environment before loading something different --- # Storage ## Code * Git (github, bitbucket, gitlab, etc.) ## Datasets * Storage on the clusters and servers should be treated as temporary * Varying levels of speed and resiliency (generally faster is smaller, and more volatile) * Become familiar with the [disk architecture on each machine](https://researchit.las.iastate.edu/getting-best-io-performance-your-computational-jobs) * Use network storage such as my.files, box, github, etc. * Keep backups --- # Job Scheduler * Scheduler assigns jobs from the queue to the compute nodes * We use [SLURM](https://researchit.las.iastate.edu/slurm-basics) * Think about the scheduler like a hotel reservation - you're charged for the room whether you use it or not, and if you ask for an especially long or large reservation, the room may not be available when you want it. * Basic info required: how many nodes, how long * [Script writer](http://www.hpc.iastate.edu/guides/condo2017/slurm-job-script-writer) can get you started ## Sample SLURM script (mysbatch.sh) ``` #!/bin/bash #SBATCH --time=1:00:00 # walltime #SBATCH --nodes=2 # number of nodes in this job #SBATCH --ntasks-per-node=16 # total number of processor cores in this job #SBATCH --output=myout_%J.log #SBATCH --error=myerr_%J.err module load R Rscript MyThing.R ``` Then submit: ``` $ sbatch mysbatch.sh ``` --- # Common stumbling blocks * Over or under using resources * Not taking advantage of the right machine for the problem * Moving data through slow links * Trying to scale up programs that weren't designed for large datasets --- # Support * [Research IT](http://rit.las.iastate.edu) * Build software, run custom hardware, consult on performance improvements, etc. * Collaboration between LAS, CALS, IT, etc. http://rit.las.iastate.edu/people * researchit@iastate.edu * IRC (chat): #bitcom on freenode * [ISU IT HPC Team](http://hpc.iastate.edu) * Cluster Support team * hpc-help@iastate.edu