5

I want to transform a date range stored as two columns (start, end) + value to two new columns of just the date and value.

my data:

    id     end          start        value
1   4421   2014-01-01   2014-01-03   10
2   4421   2014-01-04   2014-01-04   500
3   4421   2014-01-05   2014-01-07   20
4   5560   2014-01-02   2014-01-03   100
5   5560   2014-01-04   2014-01-04   600

What I want:

    Date         id     value
0   2014-01-01   4421   10
1   2014-01-02   4421   10
2   2014-01-03   4421   10
3   2014-01-04   4421   500
4   2014-01-05   4421   20
5   2014-01-06   4421   20
6   2014-01-07   4421   20
7   2014-01-01   5560   NA
8   2014-01-02   5560   100
9   2014-01-03   5560   100
10  2014-01-04   5560   600

I’m using dplyr, so something that I can use with mutate & pipes etc. would be useful.

Sample data:

id <- c(4421, 4421, 4421, 5560, 5560)
start <- c('2014-01-01','2014-01-04','2014-01-05','2014-01-02','2014-01-04')
end = c('2014-01-03','2014-01-04','2014-01-07','2014-01-03','2014-01-04')
value <- c(10,500,20,100,600)
my_data <- data.frame(id,start,end,value)

FYI there's a very similar question in python, but I'm using R.

Edit: formatting Edit 2: this is a duplicate, there's some great stuff in the original post.

Thanks @www I like that there’s pipes all the way. @Wen-Ben thanks for the Pandas tips, I may be using Pandas in future.

i_love_chocolate
  • 414
  • 5
  • 16

2 Answers2

4

A solution using tidyverse. Not sure why in your expected output, id 5560 has NA in 2014-01-01 as it was not there.

library(tidyverse)

my_data2 <- my_data %>%
  mutate(start = as.Date(start), end = as.Date(end)) %>%
  mutate(Date = map2(start, end, ~seq(from = .x, to = .y, by = "day"))) %>%
  unnest() %>%
  select(Date, id, value) 
my_data2
#          Date   id value
# 1  2014-01-01 4421    10
# 2  2014-01-02 4421    10
# 3  2014-01-03 4421    10
# 4  2014-01-04 4421   500
# 5  2014-01-05 4421    20
# 6  2014-01-06 4421    20
# 7  2014-01-07 4421    20
# 8  2014-01-02 5560   100
# 9  2014-01-03 5560   100
# 10 2014-01-04 5560   600
www
  • 38,575
  • 12
  • 48
  • 84
  • BTW I put NA in there incase the solution involved creating a column of dates and then joining this date column to my data (this was my original approach but I got lost in the weeds) – i_love_chocolate Feb 26 '19 at 21:43
  • Brilliant answer--worked perfectly for me, and so elegant compared to the answers to the linked question. Thanks! – JMDR Jun 05 '21 at 04:07
1

Even easier than pandas unnest, also if you want to know more unnest in pandas check the How do I unnest (explode) a column in a pandas DataFrame?

library(tidyr)
library(dplyr)
lst <- Map(function(x, y) seq(as.Date(x),as.Date(y), "days"), df$start, df$end)
df$Date=lst
df%>%unnest(lst)
     id      start        end value        lst
1  4421 2014-01-01 2014-01-03    10 2014-01-01
2  4421 2014-01-01 2014-01-03    10 2014-01-02
3  4421 2014-01-01 2014-01-03    10 2014-01-03
4  4421 2014-01-04 2014-01-04   500 2014-01-04
5  4421 2014-01-05 2014-01-07    20 2014-01-05
6  4421 2014-01-05 2014-01-07    20 2014-01-06
7  4421 2014-01-05 2014-01-07    20 2014-01-07
8  5560 2014-01-02 2014-01-03   100 2014-01-02
9  5560 2014-01-02 2014-01-03   100 2014-01-03
10 5560 2014-01-04 2014-01-04   600 2014-01-04
BENY
  • 317,841
  • 20
  • 164
  • 234