0

Please I have some data arranged in rows and I would like to organize these lines in columns using "R" as follows:

The idea is to get "Allele, Effect and Obs" lines, which are arranged in 3 rows turning to columns. tag, Trait, Marker Locus are the same each three lines, what changes is the Allele, Effect and Obs.

Exemple: Initial data

data1 <-"tag Trait Marker Locus Site Allele Effect Obs
ca-S10_17086845 ca S10_17086845 10 17086845 R 0.000001  54
ca-S10_17086845 ca S10_17086845 10 17086845 A 3.489820   1
ca-S10_17086845 ca S10_17086845 10 17086845 G -0.017141 389
cf-S10_9890328  cf S10_9890328 10 9890328 R 0.000001 146
cf-S10_9890328  cf S10_9890328 10 9890328 G 4.367540   1
cf-S10_9890328  cf S10_9890328 10 9890328 A -0.010635 297"
data1 <-read.table(text=data1,header=T)

Expected outcome

data2 <- "Trait Marker Allele Ef1 Ef2 Ef3 Obs1 Obs2 Obs3
ca S10_17086845 R/A/G 0.000001 3.489820 -0.017141 54 1 389
cf S10_9890328 R/G/A 0.000001 4.367540 -0.010635 146 1 297"
data2 <-read.table(text=data2,header=T)

Thank you

Fpertille
  • 383
  • 1
  • 14
  • 1
    I am trying to follow all the stackoverflow rules so, kindly, I would like to know why It was drawn me a point, so I can improve the next question – Fpertille Aug 25 '16 at 15:57

4 Answers4

1

You can use dplyr to paste alleles and also get first, second and third Effects or Obs.

library(dplyr)
data1 %>% group_by(Trait, Marker) %>% summarize( allele = paste(Allele, collapse="/"), ef1=first(Effect), ef2=nth(Effect,2) , ef3=nth(Effect,3) )

  Trait       Marker allele      ef1     ef2       ef3
1    ca S10_17086845  R/A/G 0.000001 3.48982 -0.017141
2    cf  S10_9890328  R/G/A 0.000001 4.36754 -0.010635
Chris S.
  • 2,185
  • 1
  • 14
  • 14
1

There might be a simpler way, but this combo of dplyr and tidyr works:

library(dplyr)
library(tidyr)
data3 <- data1 %>% 
          group_by(tag) %>% 
          mutate(obstag = paste0("Obs", seq_along(Obs)),  # Add markers
                 eftag = paste0("Ef", seq_along(Effect)),
                 altag = paste0("A", seq_along(Allele))) %>%
          spread(altag, Allele) %>%  # Switch from rows to columns 
          spread(obstag, Obs) %>% 
          spread(eftag, Effect) %>% 
          summarise_each(funs(unique(na.omit(.))), 1:Ef3) %>%  # Collapse into one row per tag 
          mutate(Allele = paste(A1, A2, A3, sep = "/")) %>% # paste alleles together
          select(-A1, -A2, -A3, -tag) # drop unwanted columns
Nick DiQuattro
  • 729
  • 4
  • 7
1

You can try something like this using dplyr and tidyr, where we first remove two irrelevant columns tag and Site which I think you should have done before posting to make the idea more clear; then summarize by Trait, Marker and Locus groups paste the Allele and Effect and Obs together. Finally a separation on columns Effect and Obs get to the result data frame:

library(dplyr); library(tidyr);
data1 %>% select(-tag, -Site) %>% group_by(Trait, Marker, Locus) %>% 
          summarise_each(funs(paste(., collapse = "/"))) %>% 
          separate(Effect, into = paste("Eff", 1:3, sep = ""), sep = "/") %>% 
          separate(Obs, into = paste("Obs", 1:3, sep = ""), sep = "/")

# Source: local data frame [2 x 10]
# Groups: Trait, Marker [2]
# 
#    Trait       Marker Locus Allele     Eff1    Eff2      Eff3  Obs1  Obs2  Obs3
#   (fctr)       (fctr) (int)  (chr)    (chr)   (chr)     (chr) (chr) (chr) (chr)
# 1     ca S10_17086845    10  R/A/G 0.000001 3.48982 -0.017141    54     1   389
# 2     cf  S10_9890328    10  R/G/A 0.000001 4.36754 -0.010635   146     1   297
Psidom
  • 209,562
  • 33
  • 339
  • 356
1

We can do this easily with data.table. Convert the 'data.frame' to 'data.table' (setDT(data1)), create the 'Allele1' column by pasteing 'Allele' together after grouping by 'Trait' and 'Marker'. Then, use dcast to reshape it to 'wide' format. The data.table::dcast is different from the reshape2:dcast as it can also take multiple value.var columns.

library(data.table)#v1.9.7+
setDT(data1)[,  Allele1 := paste(Allele, collapse="/") , .(Trait, Marker)]
dcast(data1, Trait + Marker + Allele1 ~ rowid(Trait), 
                              value.var = c("Effect", "Obs"))
#   Trait       Marker Allele1 Effect_1 Effect_2  Effect_3 Obs_1 Obs_2 Obs_3
#1:    ca S10_17086845   R/A/G 0.000001  3.48982 -0.017141    54     1   389
#2:    cf  S10_9890328   R/G/A 0.000001  4.36754 -0.010635   146     1   297
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Thank you, but I think this command is a little bit heavy to my data... Error: cannot allocate vector of size 11282.6 Gb – Fpertille Jun 23 '16 at 12:04