
I have a very large dataset that I need to reshape from wide to long.

Here is a demo of my dataset, which covers all the situations:

genename    case1   case2   case3   strand
TP53            1       0       1      pos
TNN             0       0       1      pos
CD13            0       0       0      pos
AP35            1       1       1      neg

A case should be kept and reshaped to long format only where a 1 exists, like the following:

genename    case    strand
TP53       case1       pos
TP53       case3       pos
TNN        case3       pos
AP35       case1       neg
AP35       case2       neg
AP35       case3       neg

How can I do this kind of reshape in R?

scopchanov
Sugus
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Sep 13 '18 at 09:37
  • Can we also see what you have tried? – Sotos Sep 13 '18 at 09:42
    Possible duplicate of [Reshaping data.frame from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) – divibisan Sep 14 '18 at 21:06

1 Answer

tidyverse

df <- read.table(text="genename    case1   case2   case3   strand
TP53            1       0       1      pos
TNN             0       0       1      pos
CD13            0       0       0      pos
AP35            1       1       1      neg", header = TRUE)

library(tidyverse)

# Stack the case columns into key/value pairs, then keep only the rows flagged with 1
df %>% 
  gather(case, case_value, c(case1, case2, case3)) %>%
  filter(case_value == 1)

#   genename strand  case case_value
# 1     TP53    pos case1          1
# 2     AP35    neg case1          1
# 3     AP35    neg case2          1
# 4     TP53    pos case3          1
# 5      TNN    pos case3          1
# 6     AP35    neg case3          1
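
In current tidyr releases gather() is marked as superseded in favour of pivot_longer(), so the same idea could also be sketched as below. This is only a minimal sketch on the demo data: the object name `long` and the choice to drop the helper column `case_value` at the end are my own assumptions, made so that the result has just the three columns asked for in the question.

```r
library(dplyr)
library(tidyr)

# Demo data copied from the question
df <- read.table(text = "genename case1 case2 case3 strand
TP53 1 0 1 pos
TNN  0 0 1 pos
CD13 0 0 0 pos
AP35 1 1 1 neg", header = TRUE)

long <- df %>%
  # Stack every column whose name starts with "case" into name/value pairs
  pivot_longer(starts_with("case"), names_to = "case", values_to = "case_value") %>%
  # Keep only the gene/case combinations flagged with 1
  filter(case_value == 1) %>%
  # Drop the helper column and use the column order from the question
  select(genename, case, strand)

long
```

Selecting the columns with starts_with("case") avoids typing every case name, which may matter on a wide dataset; it assumes that no other column name begins with "case".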

data.table

library(data.table)
# setDT() converts df to a data.table by reference; melt, then keep the rows where value is 1
data.table::melt(setDT(df), id.vars = c("genename", "strand"),
                 measure.vars = c("case1", "case2", "case3"))[value == 1]

#    genename strand variable value
# 1:     TP53    pos    case1     1
# 2:     AP35    neg    case1     1
# 3:     AP35    neg    case2     1
# 4:     TP53    pos    case3     1
# 5:      TNN    pos    case3     1
# 6:     AP35    neg    case3     1
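
If the output should have exactly the three columns shown in the question, the melted columns can be named directly and the value column dropped in the same step. Again, this is a sketch rather than the only way to do it: `dt`, `case_cols`, `long_dt` and `case_value` are names I picked for illustration, and the demo table is rebuilt here so the snippet runs on its own.

```r
library(data.table)

# Demo data from the question, built directly as a data.table
dt <- data.table(
  genename = c("TP53", "TNN", "CD13", "AP35"),
  case1    = c(1, 0, 0, 1),
  case2    = c(0, 0, 0, 1),
  case3    = c(1, 1, 0, 1),
  strand   = c("pos", "pos", "pos", "neg")
)

# Pick up the case columns by name so they need not be typed out
case_cols <- grep("^case", names(dt), value = TRUE)

long_dt <- melt(
  dt,
  id.vars       = c("genename", "strand"),
  measure.vars  = case_cols,
  variable.name = "case",
  value.name    = "case_value"
)[case_value == 1, .(genename, case, strand)]  # keep the 1s, drop the helper column

long_dt
```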

benchmarks

microbenchmark::microbenchmark(
tidyverse = { df %>% 
  gather( case, case_value, c(case1, case2, case3) ) %>%
  filter( case_value == 1 )},
data.table = { melt( setDT(df), id.vars = c("genename", "strand"), measure.vars = c("case1", "case2", "case3") )[value == 1, ][] },
times = 1000)

# Unit: milliseconds
#       expr      min       lq     mean   median       uq      max neval
# tidyverse 2.335393 2.569323 3.157647 2.737729 3.089605 29.29513  1000
# data.table 1.374062 1.551656 1.845519 1.676229 1.838309 28.23499  1000      
Wimpel