Restructuring dataframe from multiple columns of factors to one column containing factors

Question

Hello, my dataframe currently looks like this:

Clone ID    Sequence    Factor_1    Factor_2    Factor_3
clonename   seq_data    1           5           4
clonename2  seq_data2   2           1           3

How can I restructure the dataframe efficiently so that it looks like this:

Clone ID    Sequence    Factor      Factor_population
clonename   seq_data    Factor_1    1
clonename   seq_data    Factor_2    5
...
clonename2  seq_data2   Factor_3    3

*Edited the table format for clarity. This is my first question on StackOverflow and I tried my best to present it clearly, so apologies to those kind enough to try to help solve this problem.

www · Accepted Answer · 2018-04-14T08:33:22.810

We can use the gather function from the tidyr package to reshape the data from wide to long format. starts_with is a selection helper, available through dplyr, that selects the columns whose names start with a given string.

library(dplyr)
library(tidyr)

dat2 <- dat %>%
  gather(Factor, Factor_population, starts_with("Factor_"))

dat2
#     Clone.ID  Sequence   Factor Factor_population
# 1  clonename  seq_data Factor_1                 1
# 2 clonename2 seq_data2 Factor_1                 2
# 3  clonename  seq_data Factor_2                 5
# 4 clonename2 seq_data2 Factor_2                 1
# 5  clonename  seq_data Factor_3                 4
# 6 clonename2 seq_data2 Factor_3                 3

DATA

dat <- read.table(text = "'Clone ID'    Sequence    Factor_1    Factor_2    Factor_3
clonename   seq_data       1            5          4
clonename2  seq_data2      2            1          3",
                  header = TRUE, stringsAsFactors = FALSE)
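
As a side note, tidyr 1.0.0 and later mark gather as superseded in favor of pivot_longer. The following is a minimal sketch of the same reshaping with pivot_longer, assuming a tidyr version of at least 1.0.0 is installed; the name dat3 is only illustrative and does not appear in the answer above.

```r
library(tidyr)

# Same example data as in the DATA section of the answer
dat <- read.table(text = "'Clone ID'    Sequence    Factor_1    Factor_2    Factor_3
clonename   seq_data       1            5          4
clonename2  seq_data2      2            1          3",
                  header = TRUE, stringsAsFactors = FALSE)

# Collect every column whose name starts with "Factor_" into two columns:
# one holding the former column name, one holding the value.
# dat3 is a hypothetical name chosen for this sketch.
dat3 <- pivot_longer(
  dat,
  cols      = starts_with("Factor_"),
  names_to  = "Factor",
  values_to = "Factor_population"
)

dat3
```

Unlike gather, which stacks the result one original column at a time, pivot_longer keeps the rows for each clone together, which matches the row order requested in the question. The result is a tibble rather than a plain data.frame.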

Thank you for your answer and also link to the other solutions you provided! I didn't know the type of data I was dealing with was in the "wide" format. Learned a lot! — 404, Apr 14 '18 at 09:36
