1

I have a data set like this:

ID       DATE      VALUE
9101001 11-04-2010  4
9101001 11-10-2010  4
9101002 28-12-2009  104
9101002 31-03-2010  193
9101002 26-08-2010  130
9101002 13-01-2011  128
9101002 12-04-2011  27
9101002 08-12-2011  18
9101002 17-07-2012  85
9101002 10-10-2012  86
9101002 19-12-2012  4
9101002 21-01-2013  31
9101003 16-09-2008  273
9101003 24-03-2009  311
9101003 15-03-2011  166
9101003 21-04-2011  62

and I need to transfer it into like this:

ID       DATE1      VALUE1 DATE2     VALU2 DATE3 VALUE3 etc
9101001  11-04-2010   4   11-10-2010  2

So, that each ID has only one row

Anyone can help, please? Many thanks!

MKR
  • 19,739
  • 4
  • 23
  • 33
  • Use `tidyr::spread`: https://www.rdocumentation.org/packages/tidyr/versions/0.8.0/topics/spread – divibisan Apr 24 '18 at 21:37
  • 1
    A combination of `tidyr::gather` and `tidyr::spread` will do that, but realize Frank_Hanson that your resulting frame will have a lot of `NA`s, since the data is not balanced across `$ID`. – r2evans Apr 24 '18 at 21:40

2 Answers2

0

Using two packages from the tidyverse:

newdat <- dat %>%
  group_by(ID) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  gather(k, v, -ID, -n) %>%
  unite(k, c(k, n), sep="") %>%
  spread(k, v)
newdat  
# # A tibble: 3 × 21
#        ID      DATE1     DATE10      DATE2      DATE3      DATE4      DATE5      DATE6
# *   <int>      <chr>      <chr>      <chr>      <chr>      <chr>      <chr>      <chr>
# 1 9101001 11-04-2010       <NA> 11-10-2010       <NA>       <NA>       <NA>       <NA>
# 2 9101002 28-12-2009 21-01-2013 31-03-2010 26-08-2010 13-01-2011 12-04-2011 08-12-2011
# 3 9101003 16-09-2008       <NA> 24-03-2009 15-03-2011 21-04-2011       <NA>       <NA>
# # ... with 13 more variables: DATE7 <chr>, DATE8 <chr>, DATE9 <chr>, VALUE1 <chr>,
# #   VALUE10 <chr>, VALUE2 <chr>, VALUE3 <chr>, VALUE4 <chr>, VALUE5 <chr>, VALUE6 <chr>,
# #   VALUE7 <chr>, VALUE8 <chr>, VALUE9 <chr>

So this gets you the right columns, but not in the right order. If that's important:

newdat[c(1, 1L + order(
  as.integer(gsub("[^0-9]", "", colnames(newdat[-1]))),
  colnames(newdat[-1])
))]
# # A tibble: 3 × 21
#        ID      DATE1 VALUE1      DATE2 VALUE2      DATE3 VALUE3      DATE4 VALUE4      DATE5
#     <int>      <chr>  <chr>      <chr>  <chr>      <chr>  <chr>      <chr>  <chr>      <chr>
# 1 9101001 11-04-2010      4 11-10-2010      4       <NA>   <NA>       <NA>   <NA>       <NA>
# 2 9101002 28-12-2009    104 31-03-2010    193 26-08-2010    130 13-01-2011    128 12-04-2011
# 3 9101003 16-09-2008    273 24-03-2009    311 15-03-2011    166 21-04-2011     62       <NA>
# # ... with 11 more variables: VALUE5 <chr>, DATE6 <chr>, VALUE6 <chr>, DATE7 <chr>,
# #   VALUE7 <chr>, DATE8 <chr>, VALUE8 <chr>, DATE9 <chr>, VALUE9 <chr>, DATE10 <chr>,
# #   VALUE10 <chr>

The c(1L, 1L + ... stuff is to remove $ID from consideration on the reordering. There are almost certainly other ways to reorder the columns.


Copyable data:

dat <- read.table(text='ID       DATE      VALUE
9101001 11-04-2010  4
9101001 11-10-2010  4
9101002 28-12-2009  104
9101002 31-03-2010  193
9101002 26-08-2010  130
9101002 13-01-2011  128
9101002 12-04-2011  27
9101002 08-12-2011  18
9101002 17-07-2012  85
9101002 10-10-2012  86
9101002 19-12-2012  4
9101002 21-01-2013  31
9101003 16-09-2008  273
9101003 24-03-2009  311
9101003 15-03-2011  166
9101003 21-04-2011  62', header=TRUE, stringsAsFactors=FALSE)
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

In this particular situation where number of columns will vary for each row, splitstackshape::cSplit provides an elegant way to achieve a solution after summarising values based on ID.

The approach is to first concatenate values of DATE and VALUE using a separator (say |). Now, splitstackshape::cSplit can be used to separate those column.

library(splitstackshape)
library(dplyr)

 df_new <- df %>% group_by(ID) %>%
  summarise(DATE = paste0(DATE,collapse="|"), 
                    VALUE=paste0(VALUE,collapse="|")) %>%
  cSplit(c("DATE","VALUE"), sep = "|")

Result:

# ID    DATE_01    DATE_02    DATE_03    DATE_04    DATE_05    DATE_06    DATE_07    DATE_08    DATE_09
# 1: 9101001 11-04-2010 11-10-2010         NA         NA         NA         NA         NA         NA         NA
# 2: 9101002 28-12-2009 31-03-2010 26-08-2010 13-01-2011 12-04-2011 08-12-2011 17-07-2012 10-10-2012 19-12-2012
# 3: 9101003 16-09-2008 24-03-2009 15-03-2011 21-04-2011         NA         NA         NA         NA         NA
# DATE_10 VALUE_01 VALUE_02 VALUE_03 VALUE_04 VALUE_05 VALUE_06 VALUE_07 VALUE_08 VALUE_09 VALUE_10
# 1:         NA        4        4       NA       NA       NA       NA       NA       NA       NA       NA
# 2: 21-01-2013      104      193      130      128       27       18       85       86        4       31
# 3:         NA      273      311      166       62       NA       NA       NA       NA       NA       NA

Data:

df <- read.table(text = 
"ID       DATE      VALUE
9101001 11-04-2010  4
9101001 11-10-2010  4
9101002 28-12-2009  104
9101002 31-03-2010  193
9101002 26-08-2010  130
9101002 13-01-2011  128
9101002 12-04-2011  27
9101002 08-12-2011  18
9101002 17-07-2012  85
9101002 10-10-2012  86
9101002 19-12-2012  4
9101002 21-01-2013  31
9101003 16-09-2008  273
9101003 24-03-2009  311
9101003 15-03-2011  166
9101003 21-04-2011  62",
header = T)
MKR
  • 19,739
  • 4
  • 23
  • 33