Supercomputing course (Feb. 17th 2015)

Basic Information

  • For further courses, sign up on the (Website)
  • (The PPT slides for today’s class can also be downloaded there.)
  • Karst
    • condo nodes are available to use
  • Nodes: CPU + GPU
  • Queues:
    • Submit jobs
      1. Put together the commands you want to run
      2. Add a few PBS directives before the actual commands
      3. Submit the job
    • Available software
    • Contact information
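The three job-submission steps above can be sketched as follows. This is a minimal Python sketch, not an official IU tool. The helper names (build_pbs_script, submit), the job name, the walltime, the resource values, and the analysis.R file are hypothetical. Only the "#PBS" directive syntax, $PBS_O_WORKDIR, and the qsub command come from standard PBS/Torque usage. Check the Karst or Big Red II documentation for the real queues and limits.

```python
import subprocess
from pathlib import Path


def build_pbs_script(commands, nodes=1, ppn=1, walltime="01:00:00",
                     job_name="myjob", feature=None):
    """Assemble a PBS job script: directives first, then the actual commands.

    Hypothetical helper; all default values are illustrative assumptions,
    not site limits.
    """
    resource = f"nodes={nodes}:ppn={ppn}"
    if feature:  # optional node property, e.g. "dc2" as in the DC2 directive
        resource += f":{feature}"
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",             # job name
        f"#PBS -l {resource}",             # nodes and processors per node
        f"#PBS -l walltime={walltime}",    # maximum run time
        "",
        "cd $PBS_O_WORKDIR",               # start in the directory qsub was run from
    ]
    lines.extend(commands)                 # the commands you want to run
    return "\n".join(lines) + "\n"


def submit(script_text, path="job.pbs", dry_run=True):
    """Write the script to disk and, unless dry_run, hand it to qsub."""
    Path(path).write_text(script_text)
    if dry_run:
        return None
    result = subprocess.run(["qsub", path], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()           # qsub prints the new job ID


if __name__ == "__main__":
    # analysis.R is a hypothetical R script name
    script = build_pbs_script(["module load R", "Rscript analysis.R"],
                              nodes=2, ppn=32, feature="dc2")
    print(script)
```

With dry_run=True the script is only written to disk, so you can inspect it before running qsub yourself.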


  • Simple workflow: input instructions – read, compute, and write data in parallel – archive results
  • Home directory: backed up, always used
    • 100 GB quota: run ‘quota -s’ to see current usage
    • snapshots
      • hourly and nightly snapshots of the home directory are made
      • kept in a hidden .snapshot directory
    • shared across Big Red II, Mason, and Quarry (100 GB total for all three)
    • the place to store source code and command files
    • don’t run computations in your home directory
  • Data Capacitor II (DC2): very fast workspace
    • store input and output application data
    • not backed up
    • Lustre file system (a parallel file system for Linux clusters)
    • scratch directories
      • path: /N/dc2/scratch/username/
      • temporary, files not accessed in 60 days may be purged
      • use it like scratch paper
    • project directories
      • files not accessed in 180 days may be purged
      • available by application to users, groups, or labs with special needs
      • possible to share data among users within a group
    • protect your data (find the commands related to HIPAA and ePHI in the ppt file)
    • “#PBS -l nodes=2:ppn=32:dc2” (don’t forget to add ‘dc2’ at the end)
  • Scholarly Data Archive (SDA)
    • best uses
      • files of at least 1MB (up to 10TB)
      • data stored on the SDA can be kept for a long time
      • create a manifest or annotation of the data and keep it at the top of your storage directory, and keep it up to date
      • put your data into the SDA first
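Scratch files not accessed in 60 days may be purged, so it helps to see which files are at risk. This is a minimal Python sketch. It assumes the purge is keyed on last access time, as the notes describe, although the real purge tool may use other criteria and some file systems update access times lazily. The function name files_at_risk is hypothetical. The /N/dc2/scratch/username/ path comes from the notes. A shell user could get a similar list with find and its -atime test.

```python
import os
import time

# Assumption: purge is based on last access time, 60 days for scratch
# (180 days for project directories), as stated in the notes.
PURGE_DAYS = 60


def files_at_risk(root, days=PURGE_DAYS, now=None):
    """Hypothetical helper: list files under root not accessed in `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    at_risk = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                at_risk.append(path)
    return sorted(at_risk)


if __name__ == "__main__":
    scratch = f"/N/dc2/scratch/{os.environ.get('USER', 'username')}"
    if os.path.isdir(scratch):
        for p in files_at_risk(scratch):
            print(p)
```

Pass days=180 to check a project directory instead.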
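The advice to keep a manifest at the top of your SDA storage directory can be sketched like this. The file name MANIFEST.txt, the helper names, and the columns (relative path, size in bytes, SHA-256 checksum) are assumptions for illustration, not an SDA requirement. Rerunning the function before each upload keeps the manifest up to date.

```python
import hashlib
import os
from datetime import datetime, timezone

MANIFEST_NAME = "MANIFEST.txt"  # hypothetical name; any consistent convention works


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in 1 MiB chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(root, description=""):
    """Hypothetical helper: write a manifest at the top of `root` listing every file."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if rel == MANIFEST_NAME:
                continue  # don't list the manifest itself
            entries.append((rel, os.path.getsize(path), sha256_of(path)))
    entries.sort()
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"# Manifest generated {stamp}"]
    if description:
        lines.append(f"# {description}")
    lines.append("# path\tbytes\tsha256")
    lines += [f"{rel}\t{size}\t{digest}" for rel, size, digest in entries]
    manifest_path = os.path.join(root, MANIFEST_NAME)
    with open(manifest_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return manifest_path
```

For example, run write_manifest(".", "short description of the dataset") from the directory you are about to archive.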

Statistical and Mathematical Software on HPC Systems

  • Three packages on Big Red II: SAS, R, MATLAB
  • load a package with “~> module load <module name>”
  • R
    • basic
      • module load R; R
      • help(sin)
      • system("rm *"): runs a shell command from R
    • multicore versions: load “library(parallel)” and use its functions
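R’s parallel package offers functions such as mclapply() and detectCores() for multicore work. To keep the added examples in one language, the sketch below shows the same pattern in Python: map a function over inputs on a pool of forked worker processes, using the multiprocessing module. It is an analog, not the R API. simulate() is a hypothetical stand-in workload. Under PBS you would normally set workers to match your ppn.

```python
import os
from multiprocessing import get_context


def simulate(seed):
    """Hypothetical stand-in for an expensive, independent per-task computation."""
    total = 0
    for i in range(100_000):
        total = (total * 31 + seed + i) % 1_000_003
    return total


def parallel_map(func, inputs, workers=None):
    """Apply func to every input on a pool of forked workers, keeping input order.

    Like R's mclapply, this relies on fork(), so it is Unix-only.
    """
    if workers is None:
        # CPUs this process may run on (Linux-only call); under PBS,
        # passing your ppn explicitly is usually safer.
        workers = len(os.sched_getaffinity(0))
    with get_context("fork").Pool(processes=workers) as pool:
        return pool.map(func, inputs)


if __name__ == "__main__":
    results = parallel_map(simulate, range(8))
    print(results)
```

Using the fork start method mirrors how mclapply forks the running R session, so workers inherit already-loaded data without re-importing anything.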

Visualization on HPC Resources

  • Applications and Libraries
    • Feh, ImageMagick for higher level
    • VTK, PCL(Eigen, FLANN), OpenCV, Mesa, FLTK, ImLib2 for lower level

Hands on session

  • couldn’t join since I hadn’t created an account