I have a large data frame (3M rows) with two columns, key and value. I want to build a list of vectors (or a similar data structure) with one element per distinct value, such that element k of the list is the vector of keys whose value is k.

# original dataframe:
df
# key   value
#   4       a
#   2       a
#   3       k
#  12       a

# expected output:
list
# $`a`
# [1] 4 2 12
#
# $`k`
# [1] 3

I tried a loop, but it is very slow (it took 6 hours to process 1M rows, at which point I stopped it). Is there a more efficient method?

bixiou

2 Answers

You could try tidyr::nest(), but I don't know how it'll perform compared to your loop.

Example:

library(tidyr)

df <- tibble(
  id = letters,
  value = rep(1:13, 2)
)

# tidyr >= 1.0.0 expects a named argument; older versions used nest(df, id)
df <- nest(df, data = id)
shs
  • Thank you! It did the job in one minute (!), but the list does not seem to be named. I managed to access the IDs for value k with `as.data.frame(df %>% filter(value == k) %>% unnest())$id`, but there may be a more practical method. Is there a way to convert the data structure to a named list of vectors (or something similar), so that I can export it as JSON (or CSV)? – bixiou Aug 24 '19 at 11:26
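
One possible way to get a named list directly, offered as a minimal sketch rather than as part of the answer above: base R's `split()` groups one vector by another and names each element after its group. The toy data mirrors the question's `key`/`value` columns; the JSON step assumes the `jsonlite` package (which the thread does not use) is installed, and the output path is hypothetical.

```r
# Toy data mirroring the question (columns `key` and `value`)
df <- data.frame(
  key   = c(4, 2, 3, 12),
  value = c("a", "a", "k", "a"),
  stringsAsFactors = FALSE
)

# split() groups the first vector by the second; elements are named by group
keys_by_value <- split(df$key, df$value)

# Access the keys for a given value by name
keys_by_value$a

# Optional JSON export. jsonlite is an assumption here, so guard its use.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  json <- jsonlite::toJSON(keys_by_value)
  # writeLines(json, "keys_by_value.json")  # hypothetical output path
}
```

Because `split()` works on plain vectors, it avoids building a nested tibble, which may matter at 3M rows; that said, timings on the real data are not something this sketch measures.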

This is handled by dplyr's group_rows and group_data methods for grouped data:

library(dplyr)

grp_df <- group_by(mtcars, gear)
group_rows(grp_df)

#[[1]]
# [1]  4  5  6  7 12 13 14 15 16 17 21 22 23 24 25
#
#[[2]]
# [1]  1  2  3  8  9 10 11 18 19 20 26 32
#
#[[3]]
# [1] 27 28 29 30 31

group_data(grp_df)

## A tibble: 3 x 2
#   gear .rows
#  <dbl> <list>
#1     3 <int [15]>
#2     4 <int [12]>
#3     5 <int [5]>
Hong Ooi
  • Ok thank you. And then how can I access the IDs given the value? Is it possible to name the elements of the list? – bixiou Aug 24 '19 at 11:45
  • Actually this does not provide the intended result, as this returns the IDs instead of the keys (I expressed myself badly by naming the `key` column `id` at first, I corrected this). – bixiou Aug 24 '19 at 12:12
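
If the goal is the keys rather than the row indices, the indices from `group_rows()` can be used to subset the `key` column, and the group labels from `group_keys()` can serve as names. The following is a rough sketch under that assumption, using toy data shaped like the question's data frame; it requires a dplyr version that provides `group_rows()` and `group_keys()`.

```r
library(dplyr)

# Toy data mirroring the question (columns `key` and `value`)
df <- data.frame(
  key   = c(4, 2, 3, 12),
  value = c("a", "a", "k", "a"),
  stringsAsFactors = FALSE
)

grp_df <- group_by(df, value)

# group_rows() gives row indices per group; map each set of indices to keys
keys_by_value <- lapply(group_rows(grp_df), function(i) df$key[i])

# group_keys() returns one row per group, in the same order as group_rows()
names(keys_by_value) <- group_keys(grp_df)$value

# Access the keys for a given value by name
keys_by_value$a
```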