0

I have a data frame with multiple columns as follows:

 Frequency                 Alels
   0.5                      C
   0.6                      C,G
   0.02                     A,T,TTT

And I want to split the value of second column and the new rows have frequency = 0.

I'm trying with separate() from tidyr package but I can't change the frequency column in new rows and I get the above results:

Frequency                 Alels
   0.5                      C
   0.6                      C
   0.6                      G
   0.02                     A
   0.02                     T
   0.02                    TTT

But I want the output as follows:

Frequency                 Alels
   0.5                      C
   0.6                      C
   0                        G
   0.02                     A
   0                        T
   0                        TTT

I'm trying with separate() from tidyr package but I can't change the frequency column in new rows.

  • Possible duplicate of [Split comma-separated column into separate rows](https://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows) – Dan May 02 '18 at 14:03
  • I looked at that post but , he just want to duplicate all the values on columns, i want to change one. – Javier bertol chorro May 02 '18 at 14:06
  • What are the rules for changing the value? Is it always the final value in the `Alels` that gets a new value? Can there be more than two values in `Alels`? Also, can you `dput` your data so it can be used in an answer? – Dan May 02 '18 at 14:12
  • Yes, Alels columns can have multiples values all separated by "," . I want to split that values, for the first value I want the original frecuency, and for the another values I want to set frecuency value to 0. The rule is put the frecuency of the second and further values to 0 – Javier bertol chorro May 02 '18 at 14:16

3 Answers3

0

This should work:

d <- read.table(text = "Frecuency                 Alels
   0.5                      C
                0.6                      C,G",
                header = T, stringsAsFactors = F)

counts <- sapply(strsplit(d$Alels, split = ","), length)

data.frame("Frecuency" = unlist(lapply(seq_along(d$Frecuency),
                                       function(x) c(d$Frecuency[x], 
                                                     rep(0, counts[x] -1)))), 
           "Alels" = unlist(strsplit(d$Alels, split = ",")))
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39
0

Not pretty, but I think it works.

# Create data frame 
df <- data.frame(frequency = c(0.5, 0.6),
                 alels = c("C", "C, G, T"),
                 stringsAsFactors = FALSE)

# Duplicate the alels column, separate rows 
# Requires magrittr, dplyr, tidyr
df %<>% 
  mutate(alels_check = alels) %>% 
  separate_rows(alels, sep = ",", convert = TRUE)  

# Check for dupes and set them to zero
df[duplicated(df$frequency, df$alels_check),]$frequency <- 0

# Remove the duplicated alels column
df %<>% select(-alels_check)

Original:

#    frequency   alels
# 1       0.5       C
# 2       0.6 C, G, T

Result:

#    frequency alels
# 1       0.5     C
# 2       0.6     C
# 3       0.0     G
# 4       0.0     T

Using your data:

#   frequency     alels
# 1      0.50         C
# 2      0.60      C, G
# 3      0.02 A, T, TTT

#   frequency alels
# 1      0.50     C
# 2      0.60     C
# 3      0.00     G
# 4      0.02     A
# 5      0.00     T
# 6      0.00   TTT 
Dan
  • 11,370
  • 4
  • 43
  • 68
0

the data from your example:

df <- read.table(text = " Frequency                 Alels
   0.5                      C
             0.6                      C,G
             0.02                     A,T,TTT",
            header = T, stringsAsFactors = F)

and another solution for you to consider:

library(dplyr)

lapply(1:nrow(df), 
   function(row_num){
     s <- strsplit(df$Alels[row_num], ",") %>% unlist
     data.frame(Frequency = c(df$Frequency[row_num], rep(0,length(s)-1)),
                Alels = s)
     }) %>% do.call(rbind, .)
df

instead of do.call(rbind, .) you can also choose to use rbindlist()from the package data.table

DS_UNI
  • 2,600
  • 2
  • 11
  • 22