I have measured temperature at a high time resolution (every 10 minutes) on different urban tree species, whose reactions I want to compare. For this I am looking especially at periods of heat. The task I fail to accomplish on my dataset is selecting complete days based on a maximum value. E.g., days with at least one measurement above 30 °C should be subset from my data frame in full. Below is a reproducible example that illustrates my problem:

In my Measurings data frame I have calculated a column indicating whether each individual measurement is above or below 30 °C. I wanted to use that column to tell other functions whether they should pick a day or not when producing a new data frame. If the value is above 30 °C at any time of the day, I want to include that whole date, from 00:00 to 23:59, in the new data frame for further analyses.

start <- as.POSIXct("2018-05-18 00:00", tz = "CET")
tseq <- seq(from = start, length.out = 1000, by = "hours")

Measurings <- data.frame(
  Time = tseq,
  Temp = sample(20:35,1000, replace = TRUE),
  Variable1 = sample(1:200,1000, replace = TRUE),
  Variable2 = sample(300:800,1000, replace = TRUE)
)

Measurings$heat30 <- ifelse(Measurings$Temp > 30,"heat", "normal")

Measurings$otheroption30 <- ifelse(Measurings$Temp > 30,"1", "0")

The example yields a data frame analogous in structure to my data:

head(Measurings)

                 Time Temp Variable1 Variable2 heat30 otheroption30
1 2018-05-18 00:00:00   28        56       377 normal             0
2 2018-05-18 01:00:00   23        65       408 normal             0
3 2018-05-18 02:00:00   29        78       324 normal             0
4 2018-05-18 03:00:00   24       157       432 normal             0
5 2018-05-18 04:00:00   32       129       794   heat             1
6 2018-05-18 05:00:00   25        27       574 normal             0

So how do I subset to get a new data frame containing all days on which at least one entry is marked as "heat"?

I know that, for example, dplyr::filter could filter the individual entries (row 5 in the head of the example). But how can I tell it to take the whole day 2018-05-18?

I am quite new to analyzing data with R, so I would appreciate any suggestions for a working solution to my problem. dplyr is what I have been using for quite a few tasks, but I am open to whatever works.

Thanks a lot, Konrad

Konrad Bauer

2 Answers

Below is one possible solution using the dataset provided in the question. Note that it is not a great example, since every day will probably include at least one observation above 30 °C (i.e. there will be no days to filter out in this dataset, but the code should do the job with the actual one).

# import packages
library(dplyr)
library(stringr)

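# break the time stamp into Day and Hour
# (explicit format() keeps "00:00:00" at midnight; as_data_frame() is deprecated)
time_stamp <- format(Measurings$Time, "%Y-%m-%d %H:%M:%S")
time_df <- as.data.frame(str_split(time_stamp, " ", simplify = TRUE), stringsAsFactors = FALSE)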

# name the columns
names(time_df) <- c("Day", "Hour")

# create a new measurement data frame with separate Day and Hour columns
new_measurings_df <- bind_cols(time_df, Measurings[-1])

# form the new data frame by filtering the days marked as heat
new_df <- new_measurings_df %>%
  filter(Day %in% new_measurings_df$Day[new_measurings_df$heat30 == "heat"])

To be more precise, you are creating a random sample of 1000 hourly observations with temperatures between 20 and 35, spread across roughly 42 calendar days. As a result, it is very likely that every single day in your example will have at least one observation above 30 °C. Additionally, it is always good practice to set a seed with set.seed() to ensure reproducibility.
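As a rough sketch of that point, the variant below sets a seed and makes heat rare, so that some days really are dropped. It uses base R only; the seed value, the two temperature levels, the 1% heat probability and the names `Day`, `heat_days` and `heat_df` are assumptions chosen for illustration, not part of the original data.

```r
set.seed(42)  # arbitrary seed, only so the sample is repeatable

start <- as.POSIXct("2018-05-18 00:00", tz = "CET")
tseq  <- seq(from = start, length.out = 1000, by = "hours")

# Heat is made rare here (about 1% of hours) purely for illustration
Measurings <- data.frame(
  Time = tseq,
  Temp = sample(c(25, 32), 1000, replace = TRUE, prob = c(0.99, 0.01))
)
Measurings$heat30 <- ifelse(Measurings$Temp > 30, "heat", "normal")

# Calendar day of each measurement, in the data's own time zone
Measurings$Day <- format(Measurings$Time, "%Y-%m-%d")

# Days with at least one "heat" entry, then every row belonging to those days
heat_days <- unique(Measurings$Day[Measurings$heat30 == "heat"])
heat_df   <- Measurings[Measurings$Day %in% heat_days, ]

length(unique(Measurings$Day))  # number of days in the sample
length(heat_days)               # number of days that contain heat
```

Comparing the two counts at the end shows whether the filter removes anything; with the original 20:35 sampling they would most likely be equal.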

OzanStats
  • This one worked for my problem, except that I have difficulty merging the Day and Hour columns again to continue working with my POSIXct datetime, which I need for further analyses. I will set a seed next time and try to match my data better. – Konrad Bauer Jul 23 '18 at 16:05
  • @Konrad Bauer, you can keep the POSIXct timestamp with a small modification: `bind_cols(time_df, Measurings)` – OzanStats Jul 23 '18 at 17:36
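If the POSIXct column has to stay intact, one alternative to splitting the timestamp is a grouped filter. This is a different technique from the answer above, sketched here under assumptions: the small 72-hour dataset and the names `Day` and `heat_df` are made up for illustration. The time zone is passed explicitly because `as.Date()` on a POSIXct value otherwise defaults to UTC, which would push 00:00 CET onto the previous day.

```r
library(dplyr)

start <- as.POSIXct("2018-05-18 00:00", tz = "CET")

# Three days of hourly data; only the second day contains values above 30
Measurings <- data.frame(
  Time = seq(from = start, length.out = 72, by = "hours"),
  Temp = c(rep(25, 24), rep(c(25, 32), 12), rep(25, 24))
)

heat_df <- Measurings %>%
  mutate(Day = as.Date(Time, tz = "CET")) %>%  # calendar day in the data's time zone
  group_by(Day) %>%
  filter(any(Temp > 30)) %>%                   # keep every row of a day with any heat
  ungroup()
```

Here `heat_df` holds the 24 rows of 2018-05-19, and `Time` is still a POSIXct column that can be used directly in later analyses.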

Create a variable that specifies the day (dropping hours, minutes, etc.). Then iterate over the unique dates and keep only those subsets whose heat30 column contains "heat" at least once:

library(dplyr)

# day of each measurement as a string, e.g. "2018-05-18"
Measurings <- Measurings %>% mutate(Time2 = format(Time, "%Y-%m-%d"))

res <- NULL
newdf <- lapply(unique(Measurings$Time2), function(x){

  rr <- Measurings %>% filter(Time2 == x) # select date x
  ss <- rr %>% pull(heat30) # take heat30 vector of date x

  # check if heat30 vector contains heat value at least once, if so bind that subset
  # (res is modified only locally here; the outer res stays NULL)
  if(any(ss == "heat")){
    res <- rbind(res, rr)
  }
  return(res) # rr for a heat day, NULL otherwise; bind_rows() drops the NULLs

}) %>% bind_rows()
Aleksandr
  • This one worked fine for me, but I think I don't understand it completely. What kind of objects are "ss" and "rr", and how do they work? – Konrad Bauer Jul 23 '18 at 16:08
  • ss is the vector of heat30 values for one date, which may include "heat" values. If there is at least one "heat" value in ss, then we append rr (all rows of that date, as you specified) to the res data frame; otherwise, we continue to the next date and repeat the procedure. – Aleksandr Jul 23 '18 at 16:12
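To make the two objects concrete, here is a small sketch of a single pass of the loop for one date. The 48-hour dataset and the chosen date are assumptions for illustration; `rr`, `ss`, `Time2` and `heat30` are used as in the answer.

```r
library(dplyr)

start <- as.POSIXct("2018-05-18 00:00", tz = "CET")

# Two days of hourly data with a single hot hour, on the second day
Measurings <- data.frame(
  Time = seq(from = start, length.out = 48, by = "hours"),
  Temp = c(rep(25, 30), 32, rep(25, 17))
)
Measurings$heat30 <- ifelse(Measurings$Temp > 30, "heat", "normal")
Measurings$Time2  <- format(Measurings$Time, "%Y-%m-%d")

x  <- "2018-05-19"                       # one date, as lapply() would pass it in
rr <- Measurings %>% filter(Time2 == x)  # data frame: all 24 rows of that date
ss <- rr %>% pull(heat30)                # character vector: heat30 values of those rows

class(rr)          # "data.frame"
class(ss)          # "character"
any(ss == "heat")  # TRUE, so the rows in rr are kept for this date
```

For 2018-05-18 the same check gives FALSE, so that date contributes nothing to the final result.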