R Analysis

From MariachiWiki

R Analysis Scripts

Dima Vavilov wrote a set of R scripts to help bring data analysis to the classroom, and to support Mariachi data analysis in R in general (download them here). The scripts load the useful data into R, where they can be analyzed, and save them in an Excel-readable format. Historical note: Gillian Winters brought Mariachi data analysis to her school classes, but found that massaging the original data into a usable form took a lot of time; these scripts are intended to make the data more accessible for analysis.

Technically, the scripts are a single file, manalysis.R, that must be "sourced" in R (the downloadable archive also contains two sample data files and a README file):

> source("manalysis.R")

After that the following functions are available:

  • mavgcounts(infile,outfile="counts_avg.csv",interval=30) reads infile, which can be either a counts-DATE.txt file (as kept on the Mariachi server and on the school DAQ computers) or a file saved from John's Web Analysis tool (when you request a table rather than a graph). It then averages the data over the specified time interval (default 30 min) and writes the result to outfile in comma-separated value (CSV) format, suitable for importing into Excel. It also defines two variables:
    • ct0 (counts table 0): the unaveraged counts data, one row per infile record. E.g.:
> ct0[1:2,]
                 date   d1   d2   d3   d4   d5   c2 c3.1 c3.2 c3.3 c4.a c5
1 2008-03-03 05:00:17 8790 5330 8750 7720 8050 1866    2    0    1    0  0
2 2008-03-03 05:01:17 8770 5260 8610 7940 7930 1909    3    1    0    1  0
    • ct1 (counts table 1): the averaged counts data, which can be used for further analysis in R:
> ct1[1:2,]
                 date       d1       d2   d3       d4       d5       c2     c3.1 c3.2 c3.3 c4.a        c5
1 2008-03-03 05:15:00 8689.667 5257.333 8694 8036.000 7970.333 1870.067 2.666667  1.8  2.0  1.4 0.5666667
2 2008-03-03 05:45:00 8639.667 5252.000 8678 8059.667 8259.000 1874.467 2.700000  2.0  2.6  1.5 0.6333333
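The averaging that mavgcounts performs can be pictured with the minimal base-R sketch below. It is not the code in manalysis.R: the helper name avg_by_interval is hypothetical, and it assumes a table with a POSIXct date column in UTC plus numeric count columns. Each record is assigned to the interval containing it, the numeric columns are averaged per interval (non-numeric columns are skipped, since they cannot be averaged), and each result row is stamped with its interval midpoint, which matches the 05:15:00 timestamp of the first ct1 row above.

```r
# Hypothetical helper, NOT part of manalysis.R: average the numeric columns
# of 'tab' over fixed intervals of 'interval' minutes. Assumes 'tab$date'
# is a POSIXct column in UTC.
avg_by_interval <- function(tab, interval = 30) {
  step  <- interval * 60                  # interval length in seconds
  secs  <- as.numeric(tab$date)           # seconds since 1970-01-01 UTC
  start <- floor(secs / step) * step      # start of each record's interval
  num   <- tab[sapply(tab, is.numeric)]   # numeric columns only (drops date, text)
  out   <- aggregate(num, by = list(start = start), FUN = mean)
  # Label every averaged row with the middle of its interval.
  out$date <- as.POSIXct(out$start + step / 2, origin = "1970-01-01", tz = "UTC")
  out[c("date", names(num))]
}

# Sample records: the first two are taken from ct0 above, the third is invented.
ct <- data.frame(
  date = as.POSIXct(c("2008-03-03 05:00:17", "2008-03-03 05:01:17",
                      "2008-03-03 05:31:17"), tz = "UTC"),
  d1 = c(8790, 8770, 8600)
)
res <- avg_by_interval(ct)
# res has two rows: 05:15:00 with d1 = 8780, and 05:45:00 with d1 = 8600.
```

Aligning intervals with floor() on seconds since the epoch keeps 30-minute bins on the clock (05:00, 05:30, ...), because 1800 seconds divides an hour evenly.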

Similarly, when it reads a weather file, the script defines two variables:

    • wt0 (weather table 0): the unaveraged weather data, one row per infile record; non-numerical values, such as a wind direction given as text (e.g. SSE), are removed.
    • wt1 (weather table 1): the averaged weather data.
  • mtbcomb2(ct0,wt0,interval=30) averages two tables (counts and weather data) and combines them so that each record of the resulting table contains both kinds of data. E.g., assuming the two previous functions have been called, the command
> tab <- mtbcomb2(ct0,wt0)

can produce records like these:

> tab[1:2,]
                 date       d1       d2   d3       d4       d5       c2
1 2008-03-03 05:15:00 8689.667 5257.333 8694 8036.000 7970.333 1870.067
2 2008-03-03 05:45:00 8639.667 5252.000 8678 8059.667 8259.000 1874.467
      c3.1 c3.2 c3.3 c4.a        c5    TempF DewpointF  PressIn WindDirDeg
1 2.666667  1.8  2.0  1.4 0.5666667 24.13333  17.60000 30.31167        309
2 2.700000  2.0  2.6  1.5 0.6333333 22.68333  16.73333 30.31833        309
  WindSpMPH WindSpGustMPH     Humi HourlyPrecip dailyrain
1         0     0.0000000 75.83333            0         0
2         0     0.1666667 77.50000            0         0
  • mtbwrite(tab,file) writes tab to file in CSV format.
  • Other functions:
    • tab <- mreadcounts(infile) reads a counts infile and returns the corresponding unaveraged table tab.
    • tab <- mreadweather(infile) reads a weather infile and returns the corresponding unaveraged table tab.
    • tb1 <- mtbcomb(tab,interval=30) averages the unaveraged table tab over time intervals (default 30 min) and returns the averaged table tb1.
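The combining step of mtbcomb2 and the CSV output of mtbwrite can be approximated in base R as below. This is a hedged sketch, not the scripts' implementation: it assumes both tables have already been averaged over the same intervals (so their mid-interval timestamps coincide) and uses merge() as an inner join on date, which keeps only intervals present in both tables. How mtbcomb2 itself treats unmatched intervals is not documented here. The tables cts and wts are hypothetical stand-ins for ct1 and wt1, with values taken from the records above.

```r
# Hypothetical stand-ins for averaged counts (ct1) and weather (wt1) tables.
when <- as.POSIXct(c("2008-03-03 05:15:00", "2008-03-03 05:45:00"), tz = "UTC")
cts  <- data.frame(date = when, d1 = c(8689.667, 8639.667))
wts  <- data.frame(date = when[2], TempF = 22.68333)   # weather for 05:45 only

# Inner join on the shared timestamp: each row now holds counts AND weather.
# Use all.x = TRUE instead to keep counts rows lacking weather (filled with NA).
tab <- merge(cts, wts, by = "date")

# Write a CSV file that Excel can open; row.names = FALSE omits R's row index.
out <- file.path(tempdir(), "combined.csv")
write.csv(tab, out, row.names = FALSE)
```

Here only the 05:45:00 interval appears in both tables, so tab has a single row with columns date, d1 and TempF.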


NOTES:

  1. All times are stored internally as UTC (or GMT; for our purposes they are the same thing). R may occasionally convert them to local time.
  2. Each averaged record is timestamped with the middle of its interval (e.g. the 05:00 to 05:30 interval is labelled 05:15:00).
  3. A signature such as mavgcounts(infile,outfile="counts_avg.csv",interval=30) means that the infile argument must be given, while the other two have default values and can be omitted if the defaults suit you. E.g. mavgcounts("counts-20080303.txt") averages over 30-minute intervals and writes the result to counts_avg.csv.
  4. A full year of weather data (06/01/2007 to 06/02/2008) from the Mt. Sinai weather station can be downloaded here. The ZIP archive contains one file per day plus an "all-in-one" file. The files have no headers, only records; the script reads them correctly.
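Regarding note 1, two standard base-R ways to keep timestamps in UTC are sketched below. Neither is taken from manalysis.R, and whether you need them depends on your setup. Setting the TZ environment variable makes UTC the default for the whole session, while passing tz = "UTC" explicitly when parsing and formatting pins individual values.

```r
# Option 1: make UTC the default time zone for this R session, so times
# without an explicit zone are displayed in UTC rather than local time.
Sys.setenv(TZ = "UTC")

# Option 2: state the zone explicitly when parsing and when formatting.
t0    <- as.POSIXct("2008-03-03 05:00:17", tz = "UTC")
stamp <- format(t0, "%Y-%m-%d %H:%M:%S", tz = "UTC")
stamp              # "2008-03-03 05:00:17"

# The zone is kept as an attribute of the POSIXct value.
attr(t0, "tzone")  # "UTC"
```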