2

I struggle to tidy some data that has been acquired in a weird way... It has some patient identifier, then a column with a date of a test, and then a column with the corresponding measurement. But it has the same test repeated over time and the data are in subsequent columns.

The dataframe is like this:

df1 <- data.frame(id = c("A","B"),
                 test1 = c("10-12-16", "12-10-17"),
                 test1_result = c("20", "3"),
                 test2 = c("10-01-17", "11-12-17"),
                 test2_result = c("18", "4"),
                 test3 = c("12-03-18", "NA"),
                 test3_result = c("300", "NA"))

And I would like to obtain something like this:

df2 <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                 tests = c("10-12-16", "10-01-17", "12-03-18", "12-10-17", "11-12-17", "NA"),
                 results = c("20", "18", "300", "3", "4", "NA")
                 )

I cannot figure out a way of transforming it, any help would be very much appreciated.

Thanks!

r_mvl
  • 109
  • 5

2 Answers2

3

You can try melt from data.table:

library(data.table)
setDT(df1)

df2 <- melt(df1, id = 'id', measure = patterns('test\\d$', '_result'))[
    , .(id, tests = value1, results = value2)]

#    id    tests results
# 1:  A 10-12-16      20
# 2:  B 12-10-17       3
# 3:  A 10-01-17      18
# 4:  B 11-12-17       4
# 5:  A 12-03-18     300
# 6:  B       NA      NA
mt1022
  • 16,834
  • 5
  • 48
  • 71
2

Here is a possibility using dplyr:

library(tidyverse);
df1 %>% 
    gather(k1, results, contains("_result")) %>% 
    mutate(k1 = gsub("_result", "", k1)) %>% 
    gather(k2, tests, contains("test")) %>% 
    filter(k1 == k2) %>% 
    select(id, tests, results)
#  id    tests results
#1  A 10-12-16      20
#2  B 12-10-17       3
#3  A 10-01-17      18
#4  B 11-12-17       4
#5  A 12-03-18     300
#6  B       NA      NA
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68