
I'm using dplyr to transform a large data frame, and I want to store the data frame's most recent date + 1 as a value. I know there are easier ways to do this by breaking up the statements, but I'm trying to do it all in one pipe statement. I ran into some behavior I don't understand, and I'm not sure why R defaults to it. Example:

library(dplyr)

Day <- seq.Date(as.Date('2017-12-01'), as.Date('2018-02-03'), 'day')
Day <- sample(Day, length(Day))
ID <- sample(1:5, length(Day), replace = TRUE)

df <- data.frame(ID, Day)

foo <- df %>% 
  arrange(desc(Day)) %>% 
  mutate(DayPlus = as.Date(Day) + 1) %>% 
  select(DayPlus) #%>% 
  #slice(1)

foo <- foo[1,1]

When I run this code, foo becomes a value equal to 2018-02-04 as desired. However, when I run the code with slice uncommented:

foo <- df %>% 
  arrange(desc(Day)) %>% 
  mutate(DayPlus = as.Date(Day) + 1) %>% 
  select(DayPlus) %>% 
  slice(1)

foo <- foo[1,1]

foo stays a data frame. My main question is why foo doesn't become a value in the second example, and my second question is whether there's an easy way to get "2018-02-04" stored as a value in foo, all from one dplyr pipe.

Thanks

CoolGuyHasChillDay
  • Perhaps you are looking for `pull`. Or just do `max(df$Day) + 1` – Henrik Feb 04 '18 at 20:11
  • I'll check out `pull` thanks. And yeah that would definitely work in this instance, but I have to do a lot of df manipulation before I get to the current `Day`/`ID` table in my actual code and I'd rather have it all in one statement. – CoolGuyHasChillDay Feb 04 '18 at 20:15
  • `%>% summarise(max(Day) + 1) %>% pull()` – Henrik Feb 04 '18 at 20:24
  • Does anyone know why this was marked as a duplicate? Neither of the referenced questions address extracting a scalar from a 1x1 dataframe/tibble; they're both about extracting a column as a vector. Maybe a better understanding of R's strange typing system would make it more obvious, but for an R novice like myself, there isn't an apparent connection. – lehiester Apr 25 '20 at 17:54

1 Answer


That's because your first snippet returns a data.frame, whereas the second one returns a tibble (here, slice() is the step that changes the class). Tibbles are similar to data.frames, but one major difference is subsetting. With a data.frame, foo[1, 1] returns the first row of the first column as a vector, because a single-column selection is dropped to a vector by default. With a tibble, foo[1, 1] returns the first row of the first column as a 1x1 tibble.
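A minimal sketch of that subsetting difference, assuming the tibble package is installed. The objects `d` and `tb` are illustrative names, not part of the question's code:

```r
library(tibble)

# A one-row, one-column data.frame and the same data as a tibble
d  <- data.frame(x = as.Date("2018-02-04"))
tb <- as_tibble(d)

class(d[1, 1])    # "Date": the data.frame result is dropped to a vector
class(tb[1, 1])   # tibble classes: `[` on a tibble keeps the 1x1 table
class(tb[[1]][1]) # "Date": extract the column first, then index into it
```

Extracting the column with `[[` or `$` before indexing gives a plain vector for both classes, so it is a safer habit than relying on `[1, 1]`.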

df %>% 
  arrange(desc(Day)) %>% 
  mutate(DayPlus = as.Date(Day) + 1) %>% 
  select(DayPlus) %>%
  class()

returns

[1] "data.frame"

whereas the second one

df %>% 
  arrange(desc(Day)) %>% 
  mutate(DayPlus = as.Date(Day) + 1) %>% 
  select(DayPlus) %>% 
  slice(1) %>%
  class()

returns

[1] "tbl_df"     "tbl"        "data.frame"
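For the second part of the question, a sketch along the lines of the `pull` suggestion in the comments. It assumes a dplyr version that provides `pull()`, and it builds `df` without the random shuffling so the result is reproducible. Depending on the installed dplyr version, `slice()` may keep the input's data.frame class, so the `class()` output above can differ, but `pull()` returns a vector either way:

```r
library(dplyr)

# Same date range as in the question, without the random shuffling
Day <- seq.Date(as.Date("2017-12-01"), as.Date("2018-02-03"), "day")
df  <- data.frame(ID = rep_len(1:5, length(Day)), Day = Day)

# Keep the original pipe and end it with pull() instead of foo[1, 1]
foo <- df %>%
  arrange(desc(Day)) %>%
  mutate(DayPlus = as.Date(Day) + 1) %>%
  slice(1) %>%
  pull(DayPlus)

# Shorter variant: summarise to a single value, then pull it out
foo2 <- df %>%
  summarise(DayPlus = max(Day) + 1) %>%
  pull(DayPlus)

foo   # a Date vector of length one, 2018-02-04
```

Both variants leave a length-one Date vector in `foo` rather than a 1x1 table, so no `foo[1, 1]` step is needed afterwards.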
clemens