
I ran into a problem while trying to rearrange my data frame into long format. My table looks like this:

x <- data.frame("Accession"=c("AGI1","AGI2","AGI3","AGI4","AGI5","AGI6"),"wt_rep_1"=c(1,2,3,4,4,5), "wt_rep_2" = c(1,2,3,4,8,9), "mutant1_rep_1"=c(1,1,0,0,5,3), "mutant2_rep_1" = c(1,7,0,0,1,5), "mutant2_rep_2" = c(1,1,4,0,1,8) )

> x
  Accession wt_rep_1 wt_rep_2 mutant1_rep_1 mutant2_rep_1 mutant2_rep_2
1      AGI1        1        1             1             1             1
2      AGI2        2        2             1             7             1
3      AGI3        3        3             0             0             4
4      AGI4        4        4             0             0             0
5      AGI5        4        8             5             1             1
6      AGI6        5        9             3             5             8

I need to create a column named "genotype" that contains the first part of each column name, i.e. the part before the first "_". How can I use strsplit(names(x), "_") for that, preferably in a loop? Any help would be appreciated.

tralala

2 Answers


I'll extract the part of the column names of x before the first _ in two instructions. Note that it can be done in just one line, but I'm posting like this for clarity.

sp <- strsplit(names(x), "_")
sapply(sp[-1], `[`, 1)
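
For reference, the one-line version mentioned above could look like the sketch below. It rebuilds `x` from the question so that it runs on its own. The name `genotype` is only an illustrative choice, and `vapply` is swapped in for `sapply` to make the character return type explicit.

```r
# Example data from the question
x <- data.frame(
  Accession     = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
  wt_rep_1      = c(1, 2, 3, 4, 4, 5),
  wt_rep_2      = c(1, 2, 3, 4, 8, 9),
  mutant1_rep_1 = c(1, 1, 0, 0, 5, 3),
  mutant2_rep_1 = c(1, 7, 0, 0, 1, 5),
  mutant2_rep_2 = c(1, 1, 4, 0, 1, 8)
)

# Drop "Accession", split each remaining name on "_", keep the first piece
genotype <- vapply(strsplit(names(x)[-1], "_"), `[`, character(1), 1)
genotype
# [1] "wt"      "wt"      "mutant1" "mutant2" "mutant2"
```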

Now, how can this be a new column in data.frame x? There are only five elements in the resulting vector and x has six rows.

Rui Barradas

I agree with Rui Barradas: I don't see how this vector could become part of your original data frame. Could you please clarify?

William Doane's response to this question suggests that using regular expressions might do the trick. I like this approach because I find it elegant and fast:

  > gsub("(_.*)$", "", names(x))[-1]
  [1] "wt"      "wt"      "mutant1" "mutant2" "mutant2"
  • Long format is what I finally want to achieve: columns `Accession genotype replicate value`, with rows such as `AGI1 wt rep1 1`, `AGI1 wt rep2 2`, `AGI1 mutant1 rep1 3`, `AGI1 mutant1 rep2 4`. Thank you very much for your tips! I mean that in long format it is still the same table, but transposed for easier navigation and later use. More suggestions are very welcome. – tralala Jul 29 '17 at 19:09
  • Sorry, I had a formatting problem. The example is: `x_long <- data.frame("Accession" = c("AGI1", "AGI1", "AGI1", "AGI1"), "genotype" = c("wt", "wt", "mutant1", "mutant1"), "replicate" = c("rep1", "rep2", "rep1", "rep2"), "value" = c(1, 2, 3, 4))` – tralala Jul 29 '17 at 19:10
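
Given the clarification in the comments, a minimal base-R sketch of the long format might look like the following. It assumes every value column is named `<genotype>_rep_<n>`. The helper names `value_cols` and `col_of_value` are illustrative and not part of the original posts. No loop is needed, because `rep()` and `sub()` are vectorised.

```r
# Example data from the question
x <- data.frame(
  Accession     = c("AGI1", "AGI2", "AGI3", "AGI4", "AGI5", "AGI6"),
  wt_rep_1      = c(1, 2, 3, 4, 4, 5),
  wt_rep_2      = c(1, 2, 3, 4, 8, 9),
  mutant1_rep_1 = c(1, 1, 0, 0, 5, 3),
  mutant2_rep_1 = c(1, 7, 0, 0, 1, 5),
  mutant2_rep_2 = c(1, 1, 4, 0, 1, 8),
  stringsAsFactors = FALSE
)

value_cols <- names(x)[-1]  # every column except Accession

# Source column name for each stacked value (columns are stacked one after another)
col_of_value <- rep(value_cols, each = nrow(x))

x_long <- data.frame(
  Accession = rep(x$Accession, times = length(value_cols)),
  genotype  = sub("_.*$", "", col_of_value),         # part before the first "_"
  replicate = sub("^.*_rep_", "rep", col_of_value),  # "wt_rep_2" becomes "rep2"
  value     = unlist(x[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Group the rows by accession, as in the desired layout
x_long <- x_long[order(x_long$Accession), ]
rownames(x_long) <- NULL
head(x_long)
```

Each of the 6 accessions appears once per value column, so the result has 30 rows. Note that `mutant1` has only one replicate, so it contributes a single row per accession.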