
I'm not sure how best to phrase this, but I think there has to be an easier solution than the one I came up with.

Below I have df1, which has daily data measuring a variable x. There is also df2, which is annual and has a column giving a day of the year. I'd like to extract x from df1 on the day of year specified in df2. For example, in 1990 the flagged day of the year in df2 is 101, so I want the value of x on the 101st day of 1990 from df1, and so on for every year. I wrote a loop that accomplishes this, but there has to be a better way. Any help is appreciated.

library(tidyverse)
library(lubridate)
set.seed(123)
df1 <- tibble(Date=seq(as.Date("1990/1/1"), as.Date("1999/12/31"), "days")) %>%
  mutate(Year = year(Date)) %>%
  mutate(DOY = yday(Date)) %>%
  group_by(Year) %>%
  mutate(x = cumsum(runif(n())))
  

df2 <- tibble(Year = seq(1990,1999),
              DOY = c(101,93,94,95,88,100,102,200,301,34),
              x=NA)

df1 %>% filter(Year == 1990, DOY == 101) %>% pull(x)

# Look up x in df1 for each Year/DOY pair in df2, one row at a time
for (i in seq_len(nrow(df2))) {
  df2$x[i] <- df1 %>% filter(Year == df2$Year[i], 
                             DOY == df2$DOY[i]) %>% pull(x)
}
df2
user111024

2 Answers


I think left_join is more efficient and easier to understand in this case. Join df2 to df1 on Year and DOY, then drop the Date column; df3 is the final output.

library(tidyverse)
library(lubridate)
set.seed(123)
df1 <- tibble(Date=seq(as.Date("1990/1/1"), as.Date("1999/12/31"), "days")) %>%
  mutate(Year = year(Date)) %>%
  mutate(DOY = yday(Date)) %>%
  group_by(Year) %>%
  mutate(x = cumsum(runif(n())))


df2 <- tibble(Year = seq(1990,1999),
              DOY = c(101,93,94,95,88,100,102,200,301,34))

df3 <- df2 %>%
  left_join(df1, by = c("Year", "DOY")) %>%
  select(-Date)

df3
# # A tibble: 10 x 3
#    Year   DOY     x
#    <dbl> <dbl> <dbl>
#  1  1990   101  50.5
#  2  1991    93  45.4
#  3  1992    94  44.8
#  4  1993    95  47.2
#  5  1994    88  45.7
#  6  1995   100  52.2
#  7  1996   102  49.8
#  8  1997   200  96.1
#  9  1998   301 148. 
# 10  1999    34  14.1
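
One property of left_join worth keeping in mind here: if df2 flags a day that does not exist in df1, the row is kept and x comes back as NA rather than raising an error. The sketch below is only an illustration under assumed data: df2_bad is a hypothetical table that flags day 366 of 1991 (not a leap year), and x is replaced with a simple row counter so the result is easy to read. anti_join lists the flagged rows that found no match.

```r
library(dplyr)
library(lubridate)

# Two years of daily data; x is just a running row index for illustration
df1 <- tibble(Date = seq(as.Date("1990-01-01"), as.Date("1991-12-31"), by = "day")) %>%
  mutate(Year = year(Date), DOY = yday(Date), x = seq_along(Date))

# Hypothetical flags: day 366 does not exist in 1991
df2_bad <- tibble(Year = c(1990, 1991), DOY = c(101, 366))

# left_join keeps every row of df2_bad; the unmatched row gets NA for x
joined <- left_join(df2_bad, df1, by = c("Year", "DOY"))

# anti_join returns the rows of df2_bad with no match in df1
unmatched <- anti_join(df2_bad, df1, by = c("Year", "DOY"))
```

If unmatched has zero rows, every flagged day was found in df1.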
www

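This may not be the best way, but I think it is the easiest, since the two data frames share the Year and DOY columns. Because df2 from the question already has a placeholder column x, the join produces x.x (from df2) and x.y (from df1); keep x.y and rename it to x: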

library(dplyr)

df2 %>%
  inner_join(df1, by = c("Year", "DOY")) %>%
  select(Year, DOY, x.y) %>%
  rename(x = x.y)

# A tibble: 10 x 3
    Year   DOY     x
   <dbl> <dbl> <dbl>
 1  1990   101  47.2
 2  1991    93  43.9
 3  1992    94  46.5
 4  1993    95  52.1
 5  1994    88  46.6
 6  1995   100  51.4
 7  1996   102  47.5
 8  1997   200 103. 
 9  1998   301 160. 
10  1999    34  14.9
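
As a possible variation (a sketch, not necessarily better), the x.x / x.y suffixes can be avoided by dropping the placeholder x from df2 before joining, and by selecting only the needed columns from df1. The data setup below repeats the question's; the ungroup() call and the name result are additions for this illustration.

```r
library(dplyr)
library(lubridate)

set.seed(123)
df1 <- tibble(Date = seq(as.Date("1990-01-01"), as.Date("1999-12-31"), by = "day")) %>%
  mutate(Year = year(Date), DOY = yday(Date)) %>%
  group_by(Year) %>%
  mutate(x = cumsum(runif(n()))) %>%
  ungroup()  # added here so later steps work on an ungrouped tibble

# df2 as in the question, including the placeholder column x
df2 <- tibble(Year = seq(1990, 1999),
              DOY = c(101, 93, 94, 95, 88, 100, 102, 200, 301, 34),
              x = NA)

# Drop the placeholder first so the join yields a single column named x
result <- df2 %>%
  select(-x) %>%
  left_join(select(df1, Year, DOY, x), by = c("Year", "DOY"))
```

With the placeholder removed there is no name clash, so no rename step is needed afterwards.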

Anoushiravan R