Extract information from multiple columns at different positions using R

Question

I have a dataframe with approx. 26,000 entries. The dataframe has one ID column with IDs separated by ";" and multiple value columns also separated by ";". It looks something like this:

df <- data.frame (ID = c("sample1;sample2;sample3", "sample3", "sample3;sample4;sample5;sample6"),
                  value_1 = c("10;20;30", "30", "30;40;50;60"),
                  value_2 = c("130", "130", "130"))

The problem is that the number and the position of the ID entries and the corresponding values are constant within a row, but they are different across the rows. In addition, there are columns (e.g. value_2) in which entries should remain constant. Is there an easy way to reduce the complexity of the data set? Thanks

akrun · Accepted Answer · 2021-02-02T06:44:19.877

0

We can use separate_rows

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(ID, value_1, convert = TRUE)

-output

# A tibble: 8 x 3
# ID      value_1 value_2
#  <chr>     <int> <chr>  
#1 sample1      10 130    
#2 sample2      20 130    
#3 sample3      30 130    
#4 sample3      30 130    
#5 sample3      30 130    
#6 sample4      40 130    
#7 sample5      50 130    
#8 sample6      60 130

Or using cSplit

library(splitstackshape)
cSplit(df, c("ID", "value_1"), ";", "long")

edited Feb 02 '21 at 06:44

answered Feb 02 '21 at 06:38

akrun

874,273
37
540
662

Dear akrun, thanks for the quick answer! Would you have another idea what I could do if the number of entries is not constant? For example, if the dataframe looks like this: `df <- data.frame (ID = c("sample1;sample2;sample3", "sample3", "sample3;sample4;sample5;sample6"), value_1 = c("10;20;30", "30", "30;40;50;60;70;80"), constant = c("130", "130", "130"))` – Heiko Feb 02 '21 at 06:51
@Heiko `cSplit` works on that, but I am not sure about what the expected output is – akrun Feb 02 '21 at 06:58
1

I just checked - it works perfect and is extremely fast! Thank you so much, akrun! – Heiko Feb 02 '21 at 07:07

Extract information from multiple columns at different positions using R

1 Answers1