AvgCntsJI

From MariachiWiki

Jason Immerman wrote this script (version 2 is given here). It takes validated counts and weather data and averages them over a given interval. Each counts data file is split into 11 columns: d1, d2, d3, d4, d5, c2, c3.1, c3.2, c3.3, c4, c5. The script first finds the ranges for valid data, then iterates through the counts data line by line, validating each line. At the end of every interval the weather data is averaged and the interval data is written to a file.

Data Correction

All negative counts are removed, as are counts of 0 for c2 and the five detectors. Beyond this, the detector counts are left untouched. The remaining columns are subject to three further corrections: value ranges, detector dependence and c2 dependence.

The value ranges (in rate per minute) within which data is considered correct are determined by 2 processes, one for c2 and one for c3.1 through c5. In both cases, an initial sweep through the file builds histograms of the count rate per minute for columns c2 through c5. At this stage the data is untouched except that negative values are removed. For the c2 column, the 99th percentile of the histogram is found, and this value plus 300 is taken as the upper end of the range. The script then sweeps left from the 99th percentile, bin by bin, until it reaches a bin whose height is less than a given fraction, 0.004, of the maximum bin height. This value minus 300 is set as the lower end of the range. This invalidates spikes in the histogram caused by periods when the detectors were being calibrated. A good example is the array at Deerpark, which was uncalibrated for a time, producing several spikes in the histogram below the operational rate, followed by one larger spike that corresponds to the array as it has operated since then:

[Figure: histograms for the Deerpark array]
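The c2 range search described above can be sketched in Python as follows. This is only an illustration, not the script's own code: the function name, the list-of-bin-heights representation, the uniform bin width and the use of each bin's left edge as its rate value are all assumptions.

```python
def c2_range(hist, bin_width=1.0, quantile=0.99, frac=0.004, margin=300.0):
    """Return (lower, upper) rate limits for c2 from a histogram.

    hist      -- list of bin heights; bin i is assumed to start at i * bin_width
    quantile  -- cumulative fraction used to locate the reference bin (0.99)
    frac      -- fraction of the maximum bin height that ends the leftward sweep
    margin    -- rate added above / subtracted below the bins found (300)
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")

    # Locate the first bin at which the running total reaches the quantile.
    running = 0
    q_bin = len(hist) - 1
    for i, height in enumerate(hist):
        running += height
        if running >= quantile * total:
            q_bin = i
            break
    upper = q_bin * bin_width + margin

    # Sweep left until a bin falls below the given fraction of the tallest bin.
    threshold = frac * max(hist)
    low_bin = q_bin
    while low_bin > 0 and hist[low_bin] >= threshold:
        low_bin -= 1
    lower = low_bin * bin_width - margin

    return lower, upper
```

With a histogram that has a small calibration spike well below the main peak, the returned lower limit falls between the two, so the spike is excluded from the valid range.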

For the remaining columns, the lower limit of the value range (in rate per minute) is automatically 0. The upper limit is the first histogram bin whose value, added to the sum of the bins between itself and the bin corresponding to 0, exceeds 99.5% of the total as determined by a Poisson distribution.
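One possible reading of this rule is sketched below: estimate the mean rate from the histogram, then find the smallest rate at which the cumulative Poisson probability exceeds 99.5%. The function names, the unit-width bins and the use of the histogram mean as the Poisson mean are assumptions made for illustration; the script itself may compute the limit differently.

```python
import math


def hist_mean(hist):
    """Mean rate of a histogram whose bin i holds the rate value i (assumed)."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    return sum(i * height for i, height in enumerate(hist)) / total


def poisson_upper_limit(mean, coverage=0.995):
    """Smallest k such that P(X <= k) > coverage for X ~ Poisson(mean)."""
    if mean < 0:
        raise ValueError("mean must be non-negative")
    if mean == 0:
        return 0
    cumulative = 0.0
    k = 0
    while True:
        # Poisson probability of k, computed in log space to avoid overflow.
        cumulative += math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
        if cumulative > coverage:
            return k
        k += 1
```

For instance, `poisson_upper_limit(hist_mean(hist))` would give the upper end of the range for one column, with 0 as the lower end.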

For detector and c2 dependence, when a detector reads 0 or c2 is counted as invalid for any reason, the columns that rely on that detector or on c2 are also counted as invalid. Thus all columns from c2 through c5 rely on detectors 1 and 2, while c3.1 and c5 rely directly on detector 3, etc. Finally, columns c3.1 through c5 all rely on c2.
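The dependence rules can be sketched as a small lookup table plus a propagation step. This is a hedged illustration rather than the script's code: the names are hypothetical, and the table holds only the dependencies stated above (the ones covered by "etc." are deliberately left out because they are not listed here). The `in_range` argument is assumed to carry the result of the value-range check for each coincidence column.

```python
DETECTORS = ["d1", "d2", "d3", "d4", "d5"]
COINCIDENCES = ["c2", "c3.1", "c3.2", "c3.3", "c4", "c5"]

# Only the dependencies spelled out in the text; others are not known here.
DEPENDS_ON = {col: {"d1", "d2"} for col in COINCIDENCES}
DEPENDS_ON["c3.1"].add("d3")
DEPENDS_ON["c5"].add("d3")
for col in COINCIDENCES[1:]:
    DEPENDS_ON[col].add("c2")


def validate_line(counts, in_range):
    """Return {column: bool} for one counts line.

    counts   -- dict mapping each of the 11 columns to its count
    in_range -- dict mapping coincidence columns to the range-check result
    """
    valid = {}
    for col in DETECTORS + COINCIDENCES:
        ok = counts[col] >= 0                    # negative counts are removed
        if col in DETECTORS or col == "c2":
            ok = ok and counts[col] != 0         # zero is invalid for d1-d5 and c2
        if col in COINCIDENCES:
            ok = ok and in_range.get(col, True)  # value-range correction
        valid[col] = ok

    # c2 comes first, so its final validity is known before c3.1 through c5.
    for col in COINCIDENCES:
        if not all(valid[dep] for dep in DEPENDS_ON[col]):
            valid[col] = False
    return valid
```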

Using the Script

The program is operated entirely from the terminal. First, build the executable using the included makefile: open a terminal, navigate to the folder containing the 3 files, type make and press enter. An executable version of the program will then be located in that same folder. To see the help file, run the program with -h as the first parameter:

>./avgCounts -h

To run the program, you must have a counts file and a weather file. The dates of the weather file do not need to correspond to those of the counts file; wherever weather data is not available the program reads in a -1. The program takes a maximum of 6 parameters, and defaults are set for all 6. In order, they are: interval (in seconds), counts file, weather file, output file, histogram output file and silent option, with defaults of 7200, counts.txt, weather.csv, avg.csv, hist.root and not silent. The histogram output file should have a .root extension. Filenames can be replaced by file paths, but the parameters must be entered in the correct order. To enter the first five (leaving the silent option off), call the program with:

>./avgCounts 1800 myCounts.txt myWeather.csv myAvg.csv myHist.root

However, some or all of the parameters can be left to the defaults. To leave the 2 output files as defaults:

>./avgCounts 1800 myCounts.txt myWeather.csv

The silent option allows the user to omit output to the screen (see Outputs). To enable this option run the program with a last parameter of -s:

>./avgCounts 1800 myCounts.txt myWeather.csv myAvg.csv myHist.root -s
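When the program is run repeatedly from another script, the positional order above can be captured in a small helper. The sketch below is a convenience wrapper, not part of the program; the function name and keyword names are hypothetical, and it always passes all five leading parameters explicitly so that -s lands in the last position.

```python
def build_command(interval=7200, counts="counts.txt", weather="weather.csv",
                  output="avg.csv", hist="hist.root", silent=False,
                  exe="./avgCounts"):
    """Return the argument list for one run, in the documented order."""
    cmd = [exe, str(interval), counts, weather, output, hist]
    if silent:
        cmd.append("-s")  # silent option goes last
    return cmd
```

The resulting list can be handed to `subprocess.run(build_command(1800, "myCounts.txt", "myWeather.csv"), check=True)` from the folder that contains the executable.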

Outputs

The script produces 4 outputs: 2 to the terminal screen and 2 as files. Printed to the screen are a report on the ranges determined and a diagnostics report. The ranges report gives the enforced range found for each column, and the diagnostics report gives the total number of counts entries that were counted as invalid (note that this is the number of individual entries, not of counts lines; one line corresponds to 11 entries). In addition, the program reports the number of points invalidated by each correction. The sum of these may exceed the given total, because a c5 point that is invalid due to a detector count of 0, an invalid c2 count and its own value being out of range will be counted as an invalid point by each correction.

The averaging output file is comma delimited. It should be saved with either a .txt or .csv extension and contains 21 columns: date and time of the middle of the recorded interval, d1, d2, d3, d4, d5, c2, c3.1, c3.2, c3.3, c4, c5, Temp (F), Dew Point (F), Pressure (inHg), Wind Direction (degrees), Wind Speed (mi/hr), Wind Gust (mi/hr), Humidity, Hourly Precipitation (inches) and Daily Rain (inches).
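The column layout can be used to load the averaging file for further analysis. The sketch below assumes the file has no header row and keeps every field as a string, since the exact date and number formats are not specified here; the column names are labels chosen for this example, not names written by the program.

```python
import csv

# Illustrative labels for the 21 columns, in the documented order.
AVG_COLUMNS = [
    "datetime",
    "d1", "d2", "d3", "d4", "d5",
    "c2", "c3.1", "c3.2", "c3.3", "c4", "c5",
    "temp_f", "dew_point_f", "pressure_inhg",
    "wind_dir_deg", "wind_speed_mph", "wind_gust_mph",
    "humidity", "hourly_precip_in", "daily_rain_in",
]


def read_avg(stream):
    """Read comma-delimited interval rows into a list of dicts."""
    rows = []
    for record in csv.reader(stream):
        if not record:
            continue  # skip blank lines
        if len(record) != len(AVG_COLUMNS):
            raise ValueError(
                "expected %d columns, got %d" % (len(AVG_COLUMNS), len(record)))
        rows.append(dict(zip(AVG_COLUMNS, (field.strip() for field in record))))
    return rows
```

A typical call would be `with open("avg.csv", newline="") as f: rows = read_avg(f)`, after which `rows[0]["c2"]` holds the c2 value of the first interval as text.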

The histogram output file can be opened by ROOT using:

>root [] TFile f("hist.root")
>root [] .ls

The file contains 2 histograms for each column of the counts file, one without the range imposed and one with the range imposed (hist# and histRange# respectively). The histograms for Deerpark are shown above. For example, to draw the histogram for c2, which is column 6, without the range imposed:

>root [] hist6->Draw()