SciCompWorkshop2007One

Scientific Computing Workshop Summer 2007

at Stony Brook

July 16-20, 2007
Nuclear Structure Laboratory
Physics Department, Stony Brook University
8:30AM-4:30PM, daily


MARIACHI is an experiment that uses cutting-edge computer technology to gather and analyze cosmic ray data collected over a wide geographic area. Part of our goal is therefore to give participants the computer skills they need to do their own analysis and research in collaboration with the MARIACHI team. In particular, we want to provide exposure to methods that let researchers use the latest scientific cyberinfrastructure: distributed grid computing.

The Scientific Computing Summer Workshop is designed for high school students who might want to perform grid-based scientific research, for their teachers who may want to teach these skills, and for undergraduate or graduate students who may be Windows experts but want to delve into UNIX and/or Grid computing. The workshop focuses on teaching a very small number of core, concrete skills and on providing a solid understanding of why things are the way they are, rather than simply explaining what to do to achieve a particular aim. At the end of the course, participants should understand an admittedly very small slice of the areas covered, but be well equipped to build on this solid understanding and broaden their abilities. Each day will consist of a lecture in the morning and a hands-on, practical workshop in the afternoon.

Specifics


  • Audience: High school science students and teachers, university undergraduates and graduates. All should be comfortable working with computers, but no additional expertise is necessary.
  • Hours: 8:30AM-4:30PM daily. Total: 40 hours.
  • Location: Nuclear Structure Laboratory, Physics Department, Stony Brook University.
  • Cost: No cost to participants - supported by a grant from NSF.

Topics


  • Unix - Unix filesystems, programs, and methods.
  • Shell Scripting - Data manipulation with classic UNIX tools: cat, grep, awk, sed, etc.
  • Python Programming - More advanced data handling with simple Python scripts.
  • Grid Computing - How to harvest distributed computing resources using a Grid.
  • Statistics with R - Introduction to statistics and data analysis using the R statistics program.

Contact Information


  • For information: contact John Hover (jhover-at-bnl.gov) or Dr. Helio Takai (takai-at-bnl.gov) via email.

Workshop Schedule


Monday: UNIX

  • Welcome - Introductory remarks and goals of the workshop, by John.
  • Command line - interfaces and ASCII text, theory and practice
  • Logins, users, groups, shells, files, folders, paths, symlinks
  • Programs, stdin/out/err, environment, files, permissions
  • UNIX filesystem layout - what and why.
  • Basic systems administration - logging, cron, package management, installing software.
  • Using a live CD with USB-stored data
  • Using a text editor: vi/Emacs
  • Customizing .bashrc


Tuesday: Shell


  • Data - ASCII text, Comma Separated Values (CSV) files
  • Concepts - Regular expressions, simple stats
  • Commands - cat, grep, egrep, awk, sed, tr, sort, uniq, wc
  • Automating common operations.
  • Invent a data file format
  • Answer an arbitrary statistics question on the command line


Wednesday: Python

  • Basics - Comments & code, indentation, variables, assignment, expression, indexing
  • Types - numbers, lists, dictionaries
  • Logic - and, or, not, in
  • Program Flow - if, for, while, try, except
  • Data - command line arguments, standard input, input files, standard output, output files
  • Basic modules - os, sys, string, re
  • Write a Python data analysis script that does something useful and would be beyond the capabilities of the shell.
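As an illustration of the kind of script the Wednesday exercise has in mind, here is a minimal sketch. It is not part of the workshop materials: the record format (station_id,timestamp,count), the sample values, and the function name mean_counts are all assumptions made for this example. It uses the re module to parse each line and a dictionary to compute a mean count per station.

```python
import re
from collections import defaultdict

# Hypothetical record format (an assumption, not taken from the workshop):
#   station_id,timestamp,count
LINE_PATTERN = re.compile(r"^\s*(\w+)\s*,\s*([^,]+?)\s*,\s*(\d+)\s*$")


def mean_counts(lines):
    """Return {station_id: mean count}, skipping comments and malformed lines."""
    totals = defaultdict(int)
    n_records = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue  # comment or header line
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # blank or malformed line
        station, _timestamp, count = match.groups()
        totals[station] += int(count)
        n_records[station] += 1
    return {station: totals[station] / n_records[station] for station in totals}


# Made-up sample data, used only to exercise the function.
sample = [
    "# station_id,timestamp,count",
    "st01,2007-07-16T09:00,12",
    "st02,2007-07-16T09:00,7",
    "st01,2007-07-16T09:01,14",
    "not a record",
    "st02,2007-07-16T09:01,9",
]

means = mean_counts(sample)
for station in sorted(means):
    print(station, means[station])
```

Because mean_counts accepts any iterable of lines, the same function could be given sys.stdin or an open file object instead of the in-memory sample.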


Thursday: Grid

  • Grid Security - authentication, proxies, VOs
  • Job definition and scripting - JDL, RSL
  • Data movement - gridftp, globus-url-copy
  • Job submission - globus-job-run, globus-job-submit
  • Take the data analysis program from day 3 and turn it into a grid job.
  • Submit it to our cluster.


Friday: R

  • Introduction to R.
  • Basic statistics.
  • R Scripting in a grid environment.
  • Write an R script and use it with a grid job.

