In R, search a dataframe for specific values, if found then select entire values and paste into new dataframe

Question

In R, I am trying to search a column in a dataframe for a specific value. Once that value is found, I want to select the all of the data where that value exists, then copy and paste it into a new dataframe. I have a sample of the dataframe below. I am searching for a value "Storms" in the Storms column. If multiple "Storm" periods occur in a row, I want the program to select all of the data for that particular event and paste it into a new dataframe. The end goal is to get individual dataframes for each storm event. I am new in R so I don't have much to start with. I am planning on using the grep to search. Any help is appreciated.

DateTime    Rain (in)   Discharge (m^3/s)   TP (ug/s)   Storm
6/9/2014 0:00   0   0.105895833 9135.989427 No
6/9/2014 12:00  0   0.1055  9089.171776 No
6/10/2014 0:00  0   0.101979167 8726.453115 No
6/10/2014 12:00 0   0.095770833 8058.577397 No
6/11/2014 0:00  0   0.09625 8115.980183 Storm
6/11/2014 12:00 0   0.093625    7838.081703 No
6/12/2014 0:00  0   0.095   7989.704734 Storm
6/12/2014 12:00 0   0.093666667 7842.683567 No
6/13/2014 0:00  0   0.09375 7850.2027   Storm
6/13/2014 12:00 0   0.094125    7891.157636 Storm
6/14/2014 0:00  0   0.099458333 8456.241045 Storm
6/14/2014 12:00 0   0.101104167 8622.904117 Storm
6/15/2014 0:00  0   0.095229167 8007.199433 No
6/15/2014 12:00 0   0.091958333 7665.64586  No
6/16/2014 0:00  0   0.091979167 7665.606679 No
6/16/2014 12:00 0   0.088833333 7355.67491  No
6/17/2014 0:00  0   0.094333333 7918.867602 Storm
6/17/2014 12:00 0   0.120229167 10713.34331 Storm
6/18/2014 0:00  0   0.133854167 12259.29537 Storm
6/18/2014 12:00 0   0.1195625   10640.74235 No
6/19/2014 0:00  0   0.118395833 10504.82994 No
6/19/2014 12:00 0   0.116729167 10316.9778  No

File<-read.csv(file="C:\\R_Files\\P_Weather_12hr.csv",header=TRUE)
for (event in File){
  search<-ifelse(grep('Storm',File$Storm),select all of the data...
}

score 1 · Accepted Answer · edited May 23 '17 at 12:22

You need rle (run-length encoding; see ?rle) to find values repeated in sequential rows. You can make an ID for storm/not storm periods with

df$StormID <- rep(seq_along(rle(df$Storm)$length), rle(df$Storm)$length)

or the convenient data.table::rleid function which does the same thing

df$StormID <- data.table::rleid(df$Storm)

Now you can subset to get only the storms, which are indexed by StormID:

df[df$Storm == 'Storm',]
#    Rain  Discharge        TP Storm            DateTime StormID
# 5     0 0.09625000  8115.980 Storm 2014-06-11 00:00:00       2
# 7     0 0.09500000  7989.705 Storm 2014-06-12 00:00:00       4
# 9     0 0.09375000  7850.203 Storm 2014-06-13 00:00:00       6
# 10    0 0.09412500  7891.158 Storm 2014-06-13 12:00:00       6
# 11    0 0.09945833  8456.241 Storm 2014-06-14 00:00:00       6
# 12    0 0.10110417  8622.904 Storm 2014-06-14 12:00:00       6
# 17    0 0.09433333  7918.868 Storm 2014-06-17 00:00:00       8
# 18    0 0.12022917 10713.343 Storm 2014-06-17 12:00:00       8
# 19    0 0.13385417 12259.295 Storm 2014-06-18 00:00:00       8

which is probably the most useful format for the information. If you really want each in its own data.frame, you can split it into a list of data.frames, which is better than individual ones.

split(df[df$Storm == 'Storm',], df[df$Storm == 'Storm', 'StormID'])
# $`2`
#   Rain Discharge      TP Storm   DateTime StormID
# 5    0   0.09625 8115.98 Storm 2014-06-11       2
# 
# $`4`
#   Rain Discharge       TP Storm   DateTime StormID
# 7    0     0.095 7989.705 Storm 2014-06-12       4
# 
# $`6`
#    Rain  Discharge       TP Storm            DateTime StormID
# 9     0 0.09375000 7850.203 Storm 2014-06-13 00:00:00       6
# 10    0 0.09412500 7891.158 Storm 2014-06-13 12:00:00       6
# 11    0 0.09945833 8456.241 Storm 2014-06-14 00:00:00       6
# 12    0 0.10110417 8622.904 Storm 2014-06-14 12:00:00       6
# ..    .        ...      ...   ...                 ...       .

Data

df <- structure(list(Rain = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Discharge = c(0.105895833, 
    0.1055, 0.101979167, 0.095770833, 0.09625, 0.093625, 0.095, 0.093666667, 
    0.09375, 0.094125, 0.099458333, 0.101104167, 0.095229167, 0.091958333, 
    0.091979167, 0.088833333, 0.094333333, 0.120229167, 0.133854167, 
    0.1195625, 0.118395833, 0.116729167), TP = c(9135.989427, 9089.171776, 
    8726.453115, 8058.577397, 8115.980183, 7838.081703, 7989.704734, 
    7842.683567, 7850.2027, 7891.157636, 8456.241045, 8622.904117, 
    8007.199433, 7665.64586, 7665.606679, 7355.67491, 7918.867602, 
    10713.34331, 12259.29537, 10640.74235, 10504.82994, 10316.9778
    ), Storm = c("No", "No", "No", "No", "Storm", "No", "Storm", 
    "No", "Storm", "Storm", "Storm", "Storm", "No", "No", "No", "No", 
    "Storm", "Storm", "Storm", "No", "No", "No"), DateTime = structure(c(1402272000, 
    1402315200, 1402358400, 1402401600, 1402444800, 1402488000, 1402531200, 
    1402574400, 1402617600, 1402660800, 1402704000, 1402747200, 1402790400, 
    1402833600, 1402876800, 1402920000, 1402963200, 1403006400, 1403049600, 
    1403092800, 1403136000, 1403179200), class = c("POSIXct", "POSIXt"
    ), tzone = "UTC"), StormID = c(1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 
    6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 9L, 9L, 9L)), .Names = c("Rain", 
    "Discharge", "TP", "Storm", "DateTime", "StormID"), row.names = c(NA, 
    -22L), class = "data.frame")

Thank you!! Do you know an easy way to write each of the files to individual csv files? — Koda, May 30 '16 at 16:07
You can loop across the list of data.frames with `lapply` and pass each to `write.csv`, though you'll need to construct unique file names: `lapply(split(df[df$Storm == 'Storm',], df[df$Storm == 'Storm', 'StormID']), function(x){write.csv(x, paste0('storm', unique(x$StormID), '.csv'))})` — alistaire, May 30 '16 at 18:04

In R, search a dataframe for specific values, if found then select entire values and paste into new dataframe

1 Answers1

Data