Turning rows with redundant information in columns

Question

Please I have some data arranged in rows and I would like to organize these lines in columns using "R" as follows:

The idea is to get "Allele, Effect and Obs" lines, which are arranged in 3 rows turning to columns. tag, Trait, Marker Locus are the same each three lines, what changes is the Allele, Effect and Obs.

Exemple: Initial data

data1 <-"tag Trait Marker Locus Site Allele Effect Obs
ca-S10_17086845 ca S10_17086845 10 17086845 R 0.000001  54
ca-S10_17086845 ca S10_17086845 10 17086845 A 3.489820   1
ca-S10_17086845 ca S10_17086845 10 17086845 G -0.017141 389
cf-S10_9890328  cf S10_9890328 10 9890328 R 0.000001 146
cf-S10_9890328  cf S10_9890328 10 9890328 G 4.367540   1
cf-S10_9890328  cf S10_9890328 10 9890328 A -0.010635 297"
data1 <-read.table(text=data1,header=T)

Expected outcome

data2 <- "Trait Marker Allele Ef1 Ef2 Ef3 Obs1 Obs2 Obs3
ca S10_17086845 R/A/G 0.000001 3.489820 -0.017141 54 1 389
cf S10_9890328 R/G/A 0.000001 4.367540 -0.010635 146 1 297"
data2 <-read.table(text=data2,header=T)

Thank you

I am trying to follow all the stackoverflow rules so, kindly, I would like to know why It was drawn me a point, so I can improve the next question — Fpertille, Aug 25 '16 at 15:57

score 1 · Answer 1 · answered Jun 22 '16 at 22:09

You can use dplyr to paste alleles and also get first, second and third Effects or Obs.

library(dplyr)
data1 %>% group_by(Trait, Marker) %>% summarize( allele = paste(Allele, collapse="/"), ef1=first(Effect), ef2=nth(Effect,2) , ef3=nth(Effect,3) )

  Trait       Marker allele      ef1     ef2       ef3
1    ca S10_17086845  R/A/G 0.000001 3.48982 -0.017141
2    cf  S10_9890328  R/G/A 0.000001 4.36754 -0.010635

score 1 · Answer 2 · answered Jun 22 '16 at 22:20

There might be a simpler way, but this combo of dplyr and tidyr works:

library(dplyr)
library(tidyr)
data3 <- data1 %>% 
          group_by(tag) %>% 
          mutate(obstag = paste0("Obs", seq_along(Obs)),  # Add markers
                 eftag = paste0("Ef", seq_along(Effect)),
                 altag = paste0("A", seq_along(Allele))) %>%
          spread(altag, Allele) %>%  # Switch from rows to columns 
          spread(obstag, Obs) %>% 
          spread(eftag, Effect) %>% 
          summarise_each(funs(unique(na.omit(.))), 1:Ef3) %>%  # Collapse into one row per tag 
          mutate(Allele = paste(A1, A2, A3, sep = "/")) %>% # paste alleles together
          select(-A1, -A2, -A3, -tag) # drop unwanted columns

score 1 · Accepted Answer · answered Jun 22 '16 at 22:49

You can try something like this using dplyr and tidyr, where we first remove two irrelevant columns tag and Site which I think you should have done before posting to make the idea more clear; then summarize by Trait, Marker and Locus groups paste the Allele and Effect and Obs together. Finally a separation on columns Effect and Obs get to the result data frame:

library(dplyr); library(tidyr);
data1 %>% select(-tag, -Site) %>% group_by(Trait, Marker, Locus) %>% 
          summarise_each(funs(paste(., collapse = "/"))) %>% 
          separate(Effect, into = paste("Eff", 1:3, sep = ""), sep = "/") %>% 
          separate(Obs, into = paste("Obs", 1:3, sep = ""), sep = "/")

# Source: local data frame [2 x 10]
# Groups: Trait, Marker [2]
# 
#    Trait       Marker Locus Allele     Eff1    Eff2      Eff3  Obs1  Obs2  Obs3
#   (fctr)       (fctr) (int)  (chr)    (chr)   (chr)     (chr) (chr) (chr) (chr)
# 1     ca S10_17086845    10  R/A/G 0.000001 3.48982 -0.017141    54     1   389
# 2     cf  S10_9890328    10  R/G/A 0.000001 4.36754 -0.010635   146     1   297

akrun · Answer 4 · 2016-06-23T02:58:50.897

We can do this easily with data.table. Convert the 'data.frame' to 'data.table' (setDT(data1)), create the 'Allele1' column by pasteing 'Allele' together after grouping by 'Trait' and 'Marker'. Then, use dcast to reshape it to 'wide' format. The data.table::dcast is different from the reshape2:dcast as it can also take multiple value.var columns.

library(data.table)#v1.9.7+
setDT(data1)[,  Allele1 := paste(Allele, collapse="/") , .(Trait, Marker)]
dcast(data1, Trait + Marker + Allele1 ~ rowid(Trait), 
                              value.var = c("Effect", "Obs"))
#   Trait       Marker Allele1 Effect_1 Effect_2  Effect_3 Obs_1 Obs_2 Obs_3
#1:    ca S10_17086845   R/A/G 0.000001  3.48982 -0.017141    54     1   389
#2:    cf  S10_9890328   R/G/A 0.000001  4.36754 -0.010635   146     1   297

Thank you, but I think this command is a little bit heavy to my data... Error: cannot allocate vector of size 11282.6 Gb — Fpertille, Jun 23 '16 at 12:04

Turning rows with redundant information in columns

4 Answers4