0

I have data formatted in the following way

-------------------------
| A | a |            1   |
-------------------------
| A | a |            2   |
-------------------------
| A | a |            3   |
-------------------------
| B | b |            1   |
-------------------------
| B | b |            3   |
-------------------------
| B | b |            5   |
-------------------------

Basically, if the first key is the same, all the columns except the last one are the same. I want to make the data-frame more compact and make it so instead of one observation per instance of the last column, I have a list of numbers

Something like this:

-------------------------
| A | a |        1,2,3   |
-------------------------
| B | b |        1,3,5   |
-------------------------

I am learning the tidyverse in R, but I am very new and I have not been able to find the right function to do this yet. I appreciate any input.

Neoleogeo
  • 313
  • 2
  • 11

3 Answers3

5

Group by your ID column, and summarize the number column using paste (with collapse=","). That will give you a (character) column with the list of numbers for each ID row.

require(dplyr)
df<-data.frame(a=c("a","a","a","b","b"),b=c(1,2,3,1,2))
df %>% group_by(a) %>% summarize(b=paste(b,collapse=","))

# A tibble: 2 x 2
  a     b      
  <fct> <chr>  
1 a     1,2,3
2 b     1,2   
iod
  • 7,412
  • 2
  • 17
  • 36
2

1) list This makes the third column a list such that each element of the list contains a vector.

library(dplyr)
library(tidyr)

DF %>%
  group_by(V1, V2) %>%
  nest %>%
  ungroup

giving:

# A tibble: 2 x 3
  V1    V2    data            
  <fct> <fct> <list>          
1 A     a     <tibble [3 x 1]>
2 B     b     <tibble [3 x 1]>

1a) This can also be written

DF %>% nest(V3, .key = "V3")

2) character Another possibility would be to create strings out of the data in the third column:

library(dplyr)

DF %>%
  group_by(V1, V2) %>%
  summarize(V3 = toString(V3)) %>%
  ungroup

giving:

# A tibble: 2 x 3
  V1    V2    V3     
  <fct> <fct> <chr>  
1 A     a     1, 2, 3
2 B     b     1, 3, 5

3) sql The above used dplyr. This uses SQL to give an answer similar to (2).

library(sqldf)
sqldf("select V1, V2, group_concat(V3) V3 
       from DF
       group by V1, V2", method = "raw")

giving:

  V1 V2    V3
1  A  a 1,2,3
2  B  b 1,3,5

Note

In the future please provide the data in reproducible form like this:

Lines <- "
A | a |            1
A | a |            2
A | a |            3
B | b |            1
B | b |            3
B | b |            5"
DF <- read.table(text = Lines, sep = "|", strip.white = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
1

Using R base:

> aggregate(V3~V1+V2, data=DF, paste0, collapse=",")
  V1 V2    V3
1  A  a 1,2,3
2  B  b 1,3,5
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138