-2

I have a data frame with 131 millions rows.

The data.frame has 2 columns. Column 1 is a number. Column 2 is a list of values

Something like this

Col1 | Col2  
1    | a, b, c  
2    | d, e, f  
3    | a, e, f  

And I need to put it like this:

Col1 | Col2  
1    | a  
1    | b  
1    | c  
2    | d  
2    | e  
2    | f  
3    | a  
3    | e  
3    | f  

It must be quick because of the 131 millions rows.

Uwe
  • 41,420
  • 11
  • 90
  • 134
sys0p
  • 3
  • 1
  • I would suggest to read each row, parse and insert, you can also use a storage procedure with a cursor but there is no fast way to achieve this. – Mr. bug Oct 31 '17 at 18:03
  • Can you give real data, and explain what you have tried? – Arthur Oct 31 '17 at 18:08
  • Please, [edit] your question and add the output of `dput(your_data_frame)`. This will exhibit the data structure and will help others to answer your question. Thank you. – Uwe Oct 31 '17 at 18:08

1 Answers1

0

By using unnest if Col2 is string

library(tidyr)
library(dplyr)
dt %>% 
    mutate(Col2 = strsplit(Col2,",")) %>% 
    unnest(Col2)
# A tibble: 9 x 2
   Col1  Col2
  <dbl> <chr>
1     1     a
2     1     b
3     1     c
4     2     d
5     2     e
6     2     f
7     3     a
8     3     e
9     3     f

Data Input :

dt=data_frame(Col1 = c(1,2,3),Col2 = c('a, b, c','d, e, f','a, e, f'))

As you mention it is list of value , so you just need to

dt %>% unnest(Col2)
# A tibble: 9 x 2
   Col1  Col2
  <dbl> <chr>
1     1     a
2     1     b
3     1     c
4     2     d
5     2     e
6     2     f
7     3     a
8     3     e
9     3     f

Data Input

dt
# A tibble: 3 x 2
   Col1      Col2
  <dbl>    <list>
1     1 <chr [3]>
2     2 <chr [3]>
3     3 <chr [3]>
BENY
  • 317,841
  • 20
  • 164
  • 234