
I am trying to determine how well 10 different instruments respond to events. The data for one of the instruments is here. The linked folder also contains an Excel file ("peak times.xlsx") listing the events that occurred throughout the month. As can be seen from the link, the data are split across 30 different .csv files, one for each day tested. What I am trying to do is take the known peaks and subset my data so I can eventually determine the time from when an event started to the peak PM2.5 value, in order to compare instrument response times. While I am still working out the best way to do this, my first question is how to efficiently subset the data by the known events. The problem is that the events occur at different time intervals, and the clock on the instrument did not record exactly at the intervals listed (see below for an example):

    AirQualityEgg_HOMEChemKitchen_20180605_R0.csv.csv$Timestamp[c(169:180)]
     [1] 06/05/2018 02:48:03 06/05/2018 02:49:02 06/05/2018 02:50:02 06/05/2018 02:51:02
     [5] 06/05/2018 02:52:02 06/05/2018 02:53:02 06/05/2018 02:54:02 06/05/2018 02:55:02
     [9] 06/05/2018 02:56:02 06/05/2018 02:57:02 06/05/2018 02:58:02 06/05/2018 02:59:03

Because the time intervals within the data are not even and the events are not evenly spaced, the only way I could find to subset my data by event was to do it manually. Note: I need to keep the recorded times as-is, because the goal is to determine how well the instrument tracks when the event actually occurred, and the clock on the instrument is part of the assessment.
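One way to work around the drifting clock is to parse the `Timestamp` column into `POSIXct`, so that windows can be selected by comparing actual times rather than hard-coded row numbers. A minimal sketch (the timestamp format is inferred from the printed output above; the sample values and window start are made up for illustration):

```r
# Parse instrument timestamps; format inferred from the printed output above
ts <- c("06/05/2018 02:48:03", "06/05/2018 02:49:02", "06/05/2018 02:50:02")
ts_parsed <- as.POSIXct(ts, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Once parsed, a window can be selected by comparison instead of row indices,
# so a reading at 02:49:02 is still caught by a window starting at 02:49:00
window_start <- as.POSIXct("06/05/2018 02:49:00",
                           format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
in_window <- ts_parsed[ts_parsed >= window_start]
```

This tolerates the instrument recording a second or two off each minute, since the comparison is on real times rather than exact matches.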

Below is what I have done to subset so far. Because of the number of events and the number of instruments, it ended up being over 600 lines of code. The other instruments record on different time scales (some at 1-second intervals, others at 80-second intervals, etc.). Is there a way to efficiently subset the data by event so that I can use the subsetted data to assess peak response times?

    ## import libraries
    library(readxl)
    library(readr)
    library(data.table)
    # set/check working directory (path omitted here)
    setwd("...")
    getwd()
    # import csvs: one object per daily file, named after the file
    temp <- list.files(pattern = "*.csv")
    for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))
    #AQE Kitchen data
    #stove
    AQE_K_Stove_A <- AirQualityEgg_HOMEChemKitchen_20180606_R0.csv.csv[c(511:527), ]
    AQE_K_Stove_B <- AirQualityEgg_HOMEChemKitchen_20180606_R0.csv.csv[c(720:759), ]
    AQE_K_Stove_C <- AirQualityEgg_HOMEChemKitchen_20180606_R0.csv.csv[c(934:961), ]
    ...
    #window
    AQE_K_Window_A <- AirQualityEgg_HOMEChemKitchen_20180604_R0.csv.csv[c(485:501), ]
    AQE_K_Window_B <- AirQualityEgg_HOMEChemKitchen_20180604_R0.csv.csv[c(605:636), ]
    AQE_K_Window_C <- AirQualityEgg_HOMEChemKitchen_20180604_R0.csv.csv[c(725:755), ]
    ...
    #Lasagna
    AQE_K_Las_A <- AirQualityEgg_HOMEChemKitchen_20180608_R0.csv.csv[c(932:1002), ]
    #Toast
    AQE_K_Toast_A <- AirQualityEgg_HOMEChemKitchen_20180608_R0.csv.csv[c(525:531), ]
    AQE_K_Toast_B <- AirQualityEgg_HOMEChemKitchen_20180608_R0.csv.csv[c(901:905), ]
    AQE_K_Toast_C <- AirQualityEgg_HOMEChemKitchen_20180618_R0.csv.csv[c(570:577), ]
    ...

Eventually I would like to generate a table with the delta t (time from event start to peak) for each event for each instrument, then average them by event type. But first, I just want to see if there is a more efficient way to do what I have already done, in order to prepare for the upcoming analysis of these events.
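As context for the code above: rather than `assign()`-ing ~30 separate objects whose names must then be typed out, the daily files could be read into a single named list, which the same event-subsetting logic can then be `lapply()`'d over. A sketch under the assumption that all daily CSVs sit in one directory (the directory and file names below are made up for the demonstration):

```r
# Demo directory with two illustrative daily files (stand-ins for the real data)
dir <- file.path(tempdir(), "aq_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Timestamp = "06/05/2018 02:48:03", PM2.5 = 10),
          file.path(dir, "day1.csv"), row.names = FALSE)
write.csv(data.frame(Timestamp = "06/06/2018 02:48:03", PM2.5 = 12),
          file.path(dir, "day2.csv"), row.names = FALSE)

# Read every daily CSV into one named list instead of ~30 separate objects
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
daily <- lapply(files, read.csv)
names(daily) <- basename(files)
```

One instrument-day is then `daily[["day1.csv"]]`, and per-event processing becomes a loop over the list rather than 600 hand-written lines.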

Edit: I hard-coded the events by taking the time each event occurred (to the minute) and finding the points in the data.table where that matched the listed time. I extended the end time by a couple of points to ensure that I captured the peak. The event times can be found in the above file "peak times.xlsx"; a few examples are listed below.

Time of stove on to stove off:
A- 6/6 8:35 - 8:51 Stir fry (cast iron pan)
B- 6/6 12:05 - 12:43 Stir fry (hot plate, cast iron pan)
C- 6/6 3:39 - 4:05 Stir fry (cast iron pan)
D- 6/6 9:05 - 9:21 Stir fry (cast iron pan)
E- 6/8 8:35 - 8:50 Breakfast (cast iron pan)
F- 6/8 11:35 - 11:56 Stir fry (cast iron pan)
G- 6/12 8:35 - 9:25 Stir fry (hot plate, wok)
H- 6/12 12:35 - 1:21 Stir fry (cast iron pan)
I- 6/12 4:35 - 5:16 Stir fry (wok)
J- 6/12 8:35 - 9:14 Stir fry (wok)
.
.
.
Time of windows open:
A- 6/4 8:05 - 8:20 am
B- 6/4 10:05 - 10:35 am
C- 6/4 12:05 - 12:35 pm
D- 6/4 2:05 - 2:35 pm
E- 6/4 4:05 - 4:35 pm
.
.
.
Lasagna:
A- 6/08 3:37 - 4:46 pm

Toast:
A- 6/08 8:50 - 8:54 am
B- 6/08 3:06 - 3:08 pm
C- 6/18 9:35 - 9:40 am
.
.
.

In total there are 23 cooking events, 29 window events, 1 lasagna event, and 11 toast events.
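Given event logs like the ones above, the hard-coded row ranges could be replaced by a small events table (one row per event, with parsed start and end times), and each event subset by time comparison, with the peak and delta t computed in the same step. A sketch using `data.table`, which is already loaded; the readings, event times, and the 2-minute end padding below are made up for illustration, and the padding amount is a judgment call:

```r
library(data.table)

# Toy instrument data: readings roughly every minute with a drifting clock
obs <- data.table(
  Timestamp = as.POSIXct("2018-06-06 08:30:00", tz = "UTC") +
              c(0, 62, 121, 183, 242, 301, 363),
  PM2.5     = c(5, 6, 40, 80, 55, 20, 8)
)

# Events table built from the logged start/stop times; pad the end by a
# couple of minutes so the peak is not cut off
pad <- 2 * 60
events <- data.table(
  event = "Stove_A",
  start = as.POSIXct("2018-06-06 08:31:00", tz = "UTC"),
  end   = as.POSIXct("2018-06-06 08:34:00", tz = "UTC") + pad
)

# For each event: subset by time window, find the peak PM2.5 reading,
# and compute delta t (seconds from event start to peak)
result <- events[, {
  sub  <- obs[Timestamp >= start & Timestamp <= end]
  peak <- sub[which.max(PM2.5)]
  .(peak_time = peak$Timestamp,
    delta_t   = as.numeric(peak$Timestamp - start, units = "secs"))
}, by = event]
```

Scaled up, `events` would hold all 23 + 29 + 1 + 11 events with an event-type column, and the same expression run per instrument yields the delta-t table to average by event type.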

johnnyg
    You hard code row numbers. How did you determine those very specific ranges? Please show data in body of question (for future readers should links break) or include a `dput` of a few sample rows: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Parfait Feb 14 '19 at 20:38
  • Thank you, I added a few examples of the events I used to hard code my subsetted data. – johnnyg Feb 20 '19 at 16:04

0 Answers