2

I have a data set look like:

df = data.frame("ID" = c("AA","AA","BB","BB"),"OriginID" = c("A1","A","B1","B"),"OriginValue" = c(4,3,8,9))

ID OriginID OriginValue
AA       A1           4
AA        A           3
BB       B1           8
BB        B           9

And I want my output look likt

df1 = data.frame("ID" = c("AA","BB"),"OriginID_1" = c("A1","B1"),"OriginValue_1" = c(4,8)
                 ,"OriginID_2" = c("A","B"),"OriginValue_2" = c(3,9))

ID OriginID_1 OriginValue_1 OriginID_2 OriginValue_2
AA         A1             4          A             3
BB         B1             8          B             9

What's the best way to do it?

Thanks!

April
  • 93
  • 8

1 Answers1

1

We can use pivot_wider after a unique column identifier by 'ID'

library(dplyr)
library(tidyr)
df %>%
    group_by(ID) %>% 
    mutate(rn = row_number()) %>% 
    pivot_wider(names_from = rn, values_from = c(OriginID, OriginValue))%>% 
    select(ID, names(.)[-1][order(readr::parse_number(names(.)[-1]))])
# A tibble: 2 x 5
# Groups:   ID [2]
#  ID    OriginID_1 OriginValue_1 OriginID_2 OriginValue_2
#  <fct> <fct>              <dbl> <fct>              <dbl>
#1 AA    A1                     4 A                      3
#2 BB    B1                     8 B                      9
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Using `data.table`: `library(data.table); setDT(df); dcast(melt(df, measure.vars = c("OriginID", "OriginValue")), ID ~ paste(variable, rowid(variable, ID), sep = "_"))` – markus Nov 21 '19 at 21:42
  • thanks @markus, but somehow i got 'NA' for all values... – April Nov 21 '19 at 21:53
  • 1
    Thank you @akrun, it works! – April Nov 21 '19 at 21:54
  • One follow up question @akrun, can you explain how the last calculation sorts columns by row number? select(ID, names(.)[-1][order(readr::parse_number(names(.)[-1]))]) – April Nov 22 '19 at 20:53
  • @jingcc `parse_number` extracts the digit ie. 1, 2 at the end of column name, and we order based on that – akrun Nov 23 '19 at 03:00