
I have a data.frame with this structure:

df1 <- data.frame(
  gene = c("Gen1", "Gen2;Gen3", "Gen4"),
  freq = c(7, 21, 51))

I would like to split "Gen2;Gen3" into separate rows for Gen2 and Gen3 while keeping their frequency value, so that the final result looks like df2:

df2 <- data.frame(
  gene = c("Gen1", "Gen2", "Gen3", "Gen4"),
  freq = c(7, 21, 21, 51))
Uwe Keim
  • With `tidyr` you can do `separate_rows(df1, gene, sep = ';')`, or see the alternatives [here](https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows). – Ben Jun 08 '20 at 12:29
  • This indeed solved the problem, thank you very much. – Jose Gracia Rodriguez Jun 08 '20 at 15:02

2 Answers


You can use `strsplit` to split `df1$gene` on ";". Then `unlist` the result and repeat each element of `freq` as many times as its row produced genes, using `lengths(x)`.

# split each gene string into its parts
x <- strsplit(df1$gene, ";")
# repeat each row's freq once per part that row produced
df2 <- data.frame(gene = unlist(x),
                  freq = df1$freq[rep(seq_len(nrow(df1)), lengths(x))])
df2
#  gene freq
#1 Gen1    7
#2 Gen2   21
#3 Gen3   21
#4 Gen4   51
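The index vector `rep(seq_len(nrow(df1)), lengths(x))` repeats each row number once per gene in that row. The same idea extends to data frames with more columns: repeat whole rows, then overwrite the split column. The following is a minimal sketch. It assumes a single delimited column and a literal (non-regex) separator. The helper name `split_column_rows` is hypothetical and not part of base R.

```r
# Hypothetical helper: split one delimited column into separate rows,
# repeating the values of every other column for each piece.
split_column_rows <- function(df, col, sep = ";") {
  # as.character() guards against factor columns (the default in R < 4.0)
  parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  # repeat each row index once per piece found in that row
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out[[col]] <- unlist(parts)
  rownames(out) <- NULL  # drop the "2.1"-style row names created by repetition
  out
}

df1 <- data.frame(
  gene = c("Gen1", "Gen2;Gen3", "Gen4"),
  freq = c(7, 21, 51))
split_column_rows(df1, "gene")
```

Because whole rows are repeated, any extra columns are carried along automatically without listing them by name.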
GKi

Using data.table:

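library(data.table)
setDT(df1)  # converts df1 to a data.table in place
# split each row's gene string separately, recycling that row's freq,
# then drop the temporary row-index grouping column ("nrow")
df1[, .(gene = unlist(strsplit(gene, ";")), freq), by = 1:nrow(df1)
    ][, !"nrow"]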
#    gene freq
# 1: Gen1    7
# 2: Gen2   21
# 3: Gen3   21
# 4: Gen4   51
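Grouping by `1:nrow(df1)` makes `strsplit` run on one row at a time. Within each group the single `freq` value is recycled to the length of that row's gene vector. The following is a base-R sketch of the same per-row split-and-recycle idea, for comparison only; `per_row` is just an illustrative variable name. It builds one small data frame per row, so on large inputs it will likely be slower than the vectorised `strsplit`/`rep` approach.

```r
df1 <- data.frame(
  gene = c("Gen1", "Gen2;Gen3", "Gen4"),
  freq = c(7, 21, 51))

# Process one row at a time: split its gene string and recycle its freq,
# mirroring the by = 1:nrow(df1) grouping in the data.table answer.
per_row <- lapply(seq_len(nrow(df1)), function(i) {
  data.frame(gene = strsplit(as.character(df1$gene[i]), ";", fixed = TRUE)[[1]],
             freq = df1$freq[i])
})
# stack the per-row pieces back into one data frame
df2 <- do.call(rbind, per_row)
df2
```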
s_baldur