0

I Have below mentioned dataframe in R:

Unique_ID     D_1         ST_1        D_2    ST_2         D_3       ST_3   
JJ-123    2018-04-01   No Range  2018-03-12  50-80    2018-02-01   10-30
JJ-113    2018-04-01   50-80     2018-03-05  50-80    2018-02-01   10-30
JJ-457    2018-04-03   10-30     2018-03-12  1-5      2018-02-01   No Range
JJ-879    2018-04-01   No Range  2018-03-12  50-80    2018-02-01   50-80

Note: For the sake of simplicity i have mentioned only three ST_ value though in original dataframe I have column till ST_38.

Dput:

structure(list(Unique_ID = c("JJ-123", "JJ-113", "JJ-457", "JJ-879"
), D_1 = c("01/04/2018", "01/04/2018", "03/04/2018", "01/04/2018"
), ST_1 = c("No Range", "50-80", "10-30", "No Range"), D_2 = c("12/03/2018", 
"05/03/2018", "12/03/2018", "12/03/2018"), ST_2 = c("50-80", 
"50-80", "1-5", "50-80"), D_3 = c("01/02/2018", "01/02/2018", 
"01/02/2018", "01/02/2018"), ST_3 = c("10-30", "10-30", "No Range", 
"50-80")), class = "data.frame", row.names = c(NA, -4L))

Using above dataframe I want to get oldest date when ST_ value change first time to 10-30 and 50-80.

Output:

Unique_ID     10-30         50-80
JJ-123        2018-02-01    2018-03-12
JJ-113        2018-02-01    2018-03-05
JJ-457        2018-04-03    NA
JJ-879        NA            2018-02-01 
Jupiter
  • 221
  • 1
  • 12

2 Answers2

2
library(tidyr)
library(dplyr)

d %>% gather("variable", "value", -Unique_ID) %>%
  separate(variable, c("variable", "number")) %>%
  spread(variable, value) %>%
  mutate(D = as.Date(D, format="%d/%m/%Y")) %>%
  filter(ST %in% c("10-30", "50-80")) %>%
  group_by(Unique_ID, ST) %>%
  filter(D==min(D)) %>%
  select(-number) %>%
  spread(ST, D)

## # A tibble: 4 x 3
## # Groups:   Unique_ID [4]
##   Unique_ID    `10-30`    `50-80`
## *     <chr>     <date>     <date>
## 1    JJ-113 2018-02-01 2018-03-05
## 2    JJ-123 2018-02-01 2018-03-12
## 3    JJ-457 2018-04-03         NA
## 4    JJ-879         NA 2018-02-01
Aaron left Stack Overflow
  • 36,704
  • 7
  • 77
  • 142
0

It sounds to me you need to use gather to first convert this dataframe from wide to long. 1) you need two new columns St_Number and ST_Value. 2) you need two new columns Date_Number and Date_Value.

Doing these two (well, I guess it's 4) things will allow you to use group_by on the Unique_ID value you want and do all kinds of operations.

Use the principles of tidy data your life will be so much easier.

dportman
  • 1,101
  • 10
  • 20
BBlank
  • 58
  • 5
  • Yes, but how would i get oldest date or first date when Status change to `10-30` or `50-80`. – Jupiter Jun 22 '18 at 14:54
  • filter, piped into group_by, piped into summarize is what I would try. You might have to convert the date value field to an actual datetime using lubridate first to be able to use min inside of summarize. – BBlank Jun 22 '18 at 16:00
  • 1
    @Jupiter, if you include a sample of your data that we can read into R, @BBlank and others will be more able to help. Use `dput`; see [this question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for guidelines. – Aaron left Stack Overflow Jun 22 '18 at 16:37
  • @Aaron Dput samplet: `structure(list(Unique_ID = c("JJ-123", "JJ-113", "JJ-457", "JJ-879" ), D_1 = c("01/04/2018", "01/04/2018", "03/04/2018", "01/04/2018" ), ST_1 = c("No Range", "50-80", "10-30", "No Range"), D_2 = c("12/03/2018", "05/03/2018", "12/03/2018", "12/03/2018"), ST_2 = c("50-80", "50-80", "1-5", "50-80"), D_3 = c("01/02/2018", "01/02/2018", "01/02/2018", "01/02/2018"), ST_3 = c("10-30", "10-30", "No Range", "50-80")), class = "data.frame", row.names = c(NA, -4L))` – Jupiter Jun 23 '18 at 06:05