Getting all values of column B for each value of column A in new data frame

Question

I have a data frame with two columns. I would like to create a new data frame that lists all values of the second column for each unique value of the first column. I do not want to use data tables.

After some trial and error, I came up with the following. I would like to know if there is an easier (one-step?) or faster way to achieve this, since the actual data frames I will be running this on are very large.

> library(plyr)  # provides dlply() and .() used below
> df <- data.frame( a=c( 1, 1, 2, 2, 3 ), b=c( 6:10 ) );
> df
  a  b
1 1  6
2 1  7
3 2  8
4 2  9
5 3 10
> df2 <- data.frame( a=unique( df$a ) )
> temp <- dlply( df, .(a), function( x ) data.frame( bs=x$b ) );
> df2$bs <- lapply( temp, function( x ) x$bs )
> df2
  a   bs
1 1 6, 7
2 2 8, 9
3 3   10

Thanks.

score 1 · Answer 1 · answered Aug 20 '18 at 10:42


With tidyverse:

library(tidyverse)
# glue::collapse() was renamed to glue_collapse() in later glue releases
df %>%
  group_by(a) %>%
  summarise(bs = glue::glue_collapse(b, ","))
# A tibble: 3 x 2
      a bs  
  <dbl> <chr>
1    1. 6,7  
2    2. 8,9  
3    3. 10
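
If the values should stay numeric rather than being pasted into a string, one possible variation (not part of the original answer, just a sketch assuming dplyr is installed) is to wrap the column in list() inside summarise(), which gives a list column. The name df_list is only illustrative.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 2, 2, 3), b = 6:10)

# list(b) keeps each group's values as one integer vector per row
df_list <- df %>%
  group_by(a) %>%
  summarise(bs = list(b))

# Number of values collected for each value of a
lengths(df_list$bs)
# 2 2 1
```

Each element of bs is still an integer vector, so per-group arithmetic keeps working without parsing strings.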

jyjek

markus · Answer 2 · 2018-08-20T10:51:39.327

1

A base R way

aggregate(b ~ a, df, FUN = toString)
#     a    b
#1    1 6, 7
#2    2 8, 9
#3    3   10

If you want to keep the entries numeric, try creating a list column instead.

(df_new <- aggregate(b ~ a, df, FUN = list))
#  a    b
#1 1 6, 7
#2 2 8, 9
#3 3   10

str(df_new)
#'data.frame':  3 obs. of  2 variables:
# $ a: num  1 2 3
# $ b:List of 3
#  ..$ 1: int  6 7
#  ..$ 2: int  8 9
#  ..$ 3: int 10
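
For what it's worth, the same list column can probably also be built without aggregate(), using split(). This is only a sketch under the assumption that a plain base R data frame is wanted; groups and df2 are illustrative names.

```r
df <- data.frame(a = c(1, 1, 2, 2, 3), b = 6:10)

# split() returns one vector of b values per distinct value of a,
# ordered by the sorted values of a
groups <- split(df$b, df$a)

# I() stops data.frame() from expanding the list into separate columns
df2 <- data.frame(a = sort(unique(df$a)), bs = I(unname(groups)))

# The elements are still integers, so numeric operations work per group
sapply(df2$bs, sum)
# 13 17 10
```

Note that sort(unique(df$a)) is used so the keys line up with the order split() produces.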

edited Aug 20 '18 at 10:51

answered Aug 20 '18 at 10:46

markus


Thanks, any way to do this without converting to character, i.e. keeping the list numeric? – user22209 Aug 20 '18 at 10:49

Seems like Markus and I were doing almost the same. Anyway, if you want commas in the second column it can never be a numeric variable. – Lennyy Aug 20 '18 at 10:50

score 1 · Accepted Answer · answered Aug 20 '18 at 10:48


aggregate(b ~ a, df, paste)

  a    b
1 1 6, 7
2 2 8, 9
3 3   10
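
One thing that may be worth checking: the printed result looks the same as with toString, but as far as I can tell paste without a collapse argument returns one string per value, so b ends up as a list column of character vectors rather than a single string per row. A small sketch to compare the two (res_paste and res_string are illustrative names):

```r
df <- data.frame(a = c(1, 1, 2, 2, 3), b = 6:10)

res_paste  <- aggregate(b ~ a, df, paste)     # no collapse argument
res_string <- aggregate(b ~ a, df, toString)  # one string per group

is.list(res_paste$b)        # TRUE: list of character vectors
is.character(res_string$b)  # TRUE: plain character column
```

If a single string per row is what you need, toString or paste with collapse = ", " is the safer choice.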

Lennyy

score 0 · Answer 4 · answered Aug 20 '18 at 12:57


We can use data.table

library(data.table)
setDT(df)[, .(b = toString(b)), by = a]
#   a    b
#1: 1 6, 7
#2: 2 8, 9
#3: 3   10

akrun
