How to list the keys that share a common value from a dataframe in R?

Question

I have a large dataframe (3M rows) with two columns, key and value. I want to create a list of vectors (or any similar data structure) with one element per distinct value, such that element k of the list is the vector of keys whose value is k.

# original dataframe:
df
# key   value
#   4       a
#   2       a
#   3       k
#  12       a

# expected output:
list
# $`a`
# [1] 4 2 12
#
# $`k`
# [1] 3

I tried a loop, but it is very slow (it took 6 hours to process 1M rows, and I stopped it there). Is there a more efficient method?

Amazing! This does exactly what I wanted, in 15 seconds! Thanks a lot. I think you can post your comment as an answer. – bixiou, Aug 24 '19 at 12:20
Actually, this has been asked before, so I marked it as a duplicate of the original post, but I am glad it worked for you. – Ronak Shah, Aug 24 '19 at 12:25
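
For reference, here is a minimal base R sketch of the grouping the question asks for, run on the question's sample data. `split()` is a standard base R function; whether it is the approach the comments above refer to is not stated in this thread, so treat it as one possible solution rather than the accepted one.

```r
# Sample data from the question
df <- data.frame(
  key = c(4, 2, 3, 12),
  value = c("a", "a", "k", "a"),
  stringsAsFactors = FALSE
)

# split() groups the first vector by the distinct values of the second
# and returns a list whose names are those distinct values
keys_by_value <- split(df$key, df$value)

keys_by_value$a       # keys whose value is "a"
keys_by_value[["k"]]  # keys whose value is "k"
```

Because the result is a plain named list, it can be indexed by value directly, which is the access pattern described in the question.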

score 1 · Answer 1 · answered Aug 24 '19 at 10:15

You could try tidyr::nest(), but I don't know how it'll perform compared to your loop.

Example:

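library(tidyr)

# Toy data: 26 ids spread over 13 values, two ids per value
df <- tibble(
  id = letters,
  value = rep(1:13, 2)
)

# One row per value; the ids go into the list-column `data`.
# tidyr >= 1.0.0 expects the named form; older versions used nest(df, id).
df <- nest(df, data = id)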

answered Aug 24 '19 at 10:15

shs


Thank you! It did the job in one minute (!), but it seems the list is not named. I managed to access the IDs for value k with `as.data.frame(df %>% filter(value == k) %>% unnest())$id`, but there might be a more practical method. Is there a way to convert the data structure to a named list of vectors (or something similar), so that I can export it as JSON (or CSV)? – bixiou Aug 24 '19 at 11:26
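
One possible way to get a named list out of the nested tibble, sketched on the answer's toy data. It assumes tidyr >= 1.0.0 (for the `data = id` form) and that the list-column is called `data`; the names `nested` and `ids_by_value` are made up for this illustration.

```r
library(tidyr)

# Toy data from the answer: 26 ids, 13 values
df <- tibble(
  id = letters,
  value = rep(1:13, 2)
)

# One row per value, with the ids stored in the list-column `data`
nested <- nest(df, data = id)

# Pull the id vector out of each nested tibble and name it by its value
ids_by_value <- setNames(
  lapply(nested$data, function(d) d$id),
  nested$value
)

ids_by_value[["3"]]  # ids whose value is 3

# A named list like this could then be passed to a JSON writer, for example
# jsonlite::toJSON(ids_by_value), assuming the jsonlite package is installed.
```

The names are stored as character strings, so a numeric value such as 3 has to be looked up as `"3"`.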

score 1 · Answer 2 · answered Aug 24 '19 at 10:18

This is handled by dplyr's group_rows and group_data methods for grouped data:

library(dplyr)

grp_df <- group_by(mtcars, gear)
group_rows(grp_df)

#[[1]]
# [1]  4  5  6  7 12 13 14 15 16 17 21 22 23 24 25
#
#[[2]]
# [1]  1  2  3  8  9 10 11 18 19 20 26 32
#
#[[3]]
# [1] 27 28 29 30 31

group_data(grp_df)

# # A tibble: 3 x 2
#   gear .rows
#  <dbl> <list>
#1     3 <int [15]>
#2     4 <int [12]>
#3     5 <int [5]>

answered Aug 24 '19 at 10:18

Hong Ooi


Ok thank you. And then how can I access the IDs given the value? Is it possible to name the elements of the list? – bixiou Aug 24 '19 at 11:45
Actually, this does not give the intended result: it returns the row IDs rather than the keys (I expressed myself badly by naming the `key` column `id` at first; I have corrected this). – bixiou Aug 24 '19 at 12:12
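
A sketch of how the row indices could be mapped back to keys and given names, using the question's sample data. This is not part of the original answer; it assumes a dplyr version that provides `group_data()` (0.8.0 or later), and the names `gd` and `keys_by_value` are chosen here for illustration.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  key = c(4, 2, 3, 12),
  value = c("a", "a", "k", "a"),
  stringsAsFactors = FALSE
)

# group_data() returns one row per group: the grouping value
# and a list-column `.rows` holding the row indices of that group
gd <- group_data(group_by(df, value))

# Translate each set of row indices into the corresponding keys,
# then name the list elements by their group value
keys_by_value <- setNames(
  lapply(gd$.rows, function(i) df$key[i]),
  gd$value
)

keys_by_value$a  # keys whose value is "a"
```

Taking both the labels and the indices from the same `group_data()` result keeps them in the same order.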
