My data.table consists of observations (half-hourly in the sample below) of the power produced by an engine (output1) and a system-state descriptor (tag1) that indicates which components of the engine are turned on.
DATA
library(data.table)
sample_data <- structure(list(time = structure(c(1517245200, 1517247000, 1517248800,
1517250600, 1517252400, 1517254200, 1517256000, 1517257800, 1517259600,
1517261400, 1517263200, 1517265000, 1517266800, 1517268600, 1517270400,
1517272200, 1517274000, 1517275800, 1517277600, 1517279400, 1517281200,
1517283000, 1517284800, 1517286600), class = c("POSIXct", "POSIXt"
), tzone = ""), output1 = c(160.03310020928, 159.706274495615,
159.803834736236, 159.753928429527, 159.54807802046, 159.21298848298,
158.904290018581, 158.683643772917, 158.670475839199, 158.793901799427,
158.886487460894, 159.167829223303, 159.66751884913, 159.1288534448,
159.141463186901, 160.116892086363, 160.517879769862, 160.615925580417,
160.915687799509, 161.590897854561, 161.568455821241, 161.411642091721,
161.811137570257, 162.193040254917), tag1 = c("evap only", "evap only",
"fog & evap", "fog & evap", "evap only", "evap only", "evap only",
"neither fog nor evap", "neither fog nor evap", "fog & evap", "evap only", "evap only",
"evap only", "fog & evap", "evap only", "fog & evap", "evap only",
"evap only", "evap only", "evap only", "fog & evap", "fog & evap",
"bad data", "neither fog nor evap")), row.names = c(NA, -24L
), class = c("data.table", "data.frame"))
You can also generate some sample data using:
library(data.table)
sample_data <- data.table(time = seq.POSIXt(from = Sys.time(), by = 60*60*3, length.out = 100),
                          output1 = runif(n = 100, min = 130, max = 172),
                          tag1 = sample(x = c('evap only', 'bad data', 'neither fog nor evap', 'fog & evap'),
                                        size = 100, replace = TRUE))
I want to group this by day (the sample data above covers only two days, but the actual data has 3 years' worth) and find the mean power for each tag, with one row per day and one column per tag. I would like the output to look something like:
         time evap only fog & evap neither fog nor evap bad data
1: 2018-01-29  159.8391   160.0825             159.8491 161.8111
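Grouping by calendar day means truncating each time stamp to midnight rather than rounding it. A minimal sketch, assuming lubridate is installed and taking the first time stamp of the posted data interpreted in UTC (an assumption; the posted data uses the session time zone, tzone = ""), shows how three candidate day functions differ:

```r
library(lubridate)

# First time stamp of the posted data, assumed to be in UTC for this sketch.
stamp <- as.POSIXct("2018-01-29 17:00:00", tz = "UTC")

round_date(stamp, "day")  # rounds to the NEAREST midnight: 2018-01-30 UTC
floor_date(stamp, "day")  # truncates to the start of the day: 2018-01-29 UTC
as.Date(stamp)            # calendar date: "2018-01-29"
```

Because round_date() rounds to the nearest midnight, every reading taken after noon lands on the following day. floor_date() or as.Date() (passing the data's own time zone via tz) keep each reading on its own calendar day.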
I've tried the following code, but it returns the result in long format (one row per day/tag combination) rather than in the wide format I want. I'm using .SDcols because the actual dataset has a large number of other columns.
library(data.table)
library(lubridate)  # provides round_date()
sample_data[, lapply(.SD, function(z) mean(z, na.rm = TRUE)), .SDcols = c('output1'), by = .(round_date(time, 'day'), tag1)]
   round_date                 tag1  output1
1: 2018-01-30            evap only 159.8391
2: 2018-01-30           fog & evap 160.0825
3: 2018-01-30 neither fog nor evap 159.8491
4: 2018-01-30             bad data 161.8111
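One way to get from this long result to the wide layout is to keep the .SDcols aggregation and then reshape it with data.table's dcast(). A minimal sketch on a small hypothetical table (dt, long and wide are illustrative names; the values are made up and the time stamps are assumed to be UTC), grouping on as.Date() instead of round_date():

```r
library(data.table)

# Hypothetical readings: made-up values, time stamps assumed to be UTC.
dt <- data.table(
  time = as.POSIXct(c("2018-01-29 17:00:00", "2018-01-29 17:30:00", "2018-01-29 18:00:00",
                      "2018-01-30 00:30:00", "2018-01-30 01:00:00", "2018-01-30 04:00:00"),
                    tz = "UTC"),
  output1 = c(160, 158, 150, 161, 163, 170),
  tag1 = c("evap only", "evap only", "fog & evap",
           "evap only", "fog & evap", "bad data")
)

# Step 1: aggregate in long form, one row per (day, tag1) pair,
# using the calendar date rather than round_date() as the day key.
long <- dt[, lapply(.SD, mean, na.rm = TRUE), .SDcols = "output1",
           by = .(day = as.Date(time, tz = "UTC"), tag1)]

# Step 2: spread tag1 into columns. Each day/tag cell already holds exactly
# one value, so no aggregation function is needed; missing pairs become NA.
wide <- dcast(long, day ~ tag1, value.var = "output1")
wide
```

The tag columns come out sorted by tag value, and day/tag pairs with no readings (here, bad data on 2018-01-29) are filled with NA.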
I've looked at the following questions on Stack Overflow:
- Create new data.table columns based on other columns
- Loop through data.table and create new columns basis some condition
- R data.table create new columns with standard names
- Add new columns to a data.table containing many variables
- Add multiple columns to R data.table in one function call?
- Assign multiple columns using := in data.table, by group
- Dynamically create new columns in data.table
- Creating new columns in data.table
Is there a data.table way of achieving this?
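dcast() can also do the grouping and the averaging in a single call through its fun.aggregate argument; extra arguments such as na.rm are forwarded to the aggregating function. A sketch on a made-up table (dt and the day column are hypothetical names; values are illustrative and time stamps are assumed to be UTC, so pass your own tz to as.Date()):

```r
library(data.table)

# Hypothetical readings: made-up values, time stamps assumed to be UTC.
dt <- data.table(
  time = as.POSIXct(c("2018-01-29 17:00:00", "2018-01-29 17:30:00", "2018-01-29 18:00:00",
                      "2018-01-30 00:30:00", "2018-01-30 01:00:00", "2018-01-30 04:00:00"),
                    tz = "UTC"),
  output1 = c(160, 158, 150, 161, 163, 170),
  tag1 = c("evap only", "evap only", "fog & evap",
           "evap only", "fog & evap", "bad data")
)

# Add the calendar day by reference.
dt[, day := as.Date(time, tz = "UTC")]

# Cast days to rows and tags to columns, averaging output1 within each cell;
# fill = NA marks day/tag combinations that have no readings.
wide <- dcast(dt, day ~ tag1, value.var = "output1",
              fun.aggregate = mean, na.rm = TRUE, fill = NA)
wide
```

Because value.var accepts a character vector, several measurement columns can be cast at once (for example value.var = c("output1", "output2"), where output2 stands in for one of the other columns); the resulting columns are then named by joining the value column and the tag with sep, which defaults to "_".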