0

I want to pivot a data frame
My input data is as

1st Input

MSFT    AAPL    GOOG
10      20      40

1st output

Id  Symbol  Value
T1  MSFT    10
T1  AAPL    20
T1  GOOG    40

--------------------------------------
--------------------------------------
2nd Input

AAPL    GOOG
30      60

2nd output

Id  Symbol  Value
T2  AAPL    30
T2  GOOG    60

---------------------------------------------------------------------
Input1 and Input2 are generated in a for loop. Can a single line of R code can solve for both of these so that the final output is as follows.

Id  Symbol  Value
T1  MSFT    10
T1  AAPL    20
T1  GOOG    40
T2  AAPL    30
T2  GOOG    60

Performance is must. I have seen on other questions on Stackoverflow that rbind or list provide less performance than preallocated space.

Arun
  • 116,683
  • 26
  • 284
  • 387
Chaturvedi Dewashish
  • 1,469
  • 2
  • 15
  • 39
  • 1
    Have a look at `gather()` from the package `tidyr`. There is also a base R function `reshape` that can do what you want, but `tidyr` is much more user-friendly. – konvas Dec 22 '14 at 14:18
  • the following link may help: http://stackoverflow.com/questions/6778908/r-transposing-a-data-frame – Ruthger Righart Dec 22 '14 at 14:21
  • Are your inputs lists, named vectors, data frames, what? Define them. As it stands, the easiest way to make the 1st and second outputs is to rotate my head 90 degrees.. – Spacedman Dec 22 '14 at 15:33

4 Answers4

4

Here's a way to do it with dplyr and tidyr (as commented by konvas):

library(dplyr)
library(tidyr)

df1 <- gather(df1, Symbol, Value, MSFT:GOOG) %>% mutate(Id = "T1")
df2 <- gather(df2, Symbol, Value, AAPL:GOOG) %>% mutate(Id = "T2")
dfs <- rbind_list(df1, df2)

The same could be written in "a single line of R code" as asked for in the question:

rbind_list(gather(df1, Symbol, Value, MSFT:GOOG) %>% mutate(Id = "T1"),
           gather(df2, Symbol, Value, AAPL:GOOG) %>% mutate(Id = "T2"))

Or using reshape2 and base R's rbind:

library(reshape2)
rbind(transform(melt(df1), Id = "T1"), transform(melt(df2), Id = "T2"))
talat
  • 68,970
  • 21
  • 126
  • 157
  • rbindlist(list(...)) is faster. – Colonel Beauvel Dec 22 '14 at 14:33
  • @ColonelBeauvel, then congrats for implementing it in your answer :) – talat Dec 22 '14 at 14:35
  • Regarding performance, a quote from `rbind_list`: "This is an efficient version of the common pattern of do.call(rbind, dfs) for row-binding many data frames together. It works in the same way as rbind.fill but is implemented in C++ so avoids many copies and is much much faster." – talat Dec 22 '14 at 15:10
4

Syntax-wise this is succinct:

reshape2::melt(list(T1=Input1, T2=Input2))
#   variable value L1
# 1     MSFT    10 T1
# 2     AAPL    20 T1
# 3     GOOG    40 T1
# 4     AAPL    30 T2
# 5     GOOG    60 T2

However it internally uses rbind.fill which doesn't scale well (albeit an old benchmark). There's another interesting link/post I came across while searching for rind.fill slowness.

Community
  • 1
  • 1
Arun
  • 116,683
  • 26
  • 284
  • 387
3

Assuming your orginal data are:

> df1
  MSFT AAPL GOOG
1   10   20   40

Without dplyr and plyr but reshape package (for rename):

toggle <- function(df, id){
    rename(transform(t(df), id=id, Symbol=rownames(t(df)), c("X_data"="Value"))

}

rbindlist(list(toggle(df1,'T1'), toggle(df2,'T2')))
Colonel Beauvel
  • 30,423
  • 11
  • 47
  • 87
  • 1
    indeed that might be the case! Corrected, still the input data are not mentionned in a very clear format in the question ;) – Colonel Beauvel Dec 22 '14 at 15:04
  • 1
    Don't assume these things. Ask the poster for clarification. And vote down as unclear. Its what comments and votes are for! – Spacedman Dec 22 '14 at 15:34
1

Another option would be to use unnest/gather/filter combo from tidyr/dplyr

library(dplyr)
library(tidyr)
unnest(setNames(list(Input1, Input2), paste0("T",1:2)),'Id') %>%
                                   gather(Symbol, Value, -Id) %>% 
                                   filter(!is.na(Value))
#  Id Symbol Value
#1 T1   MSFT    10
#2 T1   AAPL    20
#3 T2   AAPL    30
#4 T1   GOOG    40
#5 T2   GOOG    60

data

 Input1 <- structure(list(MSFT = 10L, AAPL = 20L, GOOG = 40L), .Names =
  c("MSFT", "AAPL", "GOOG"), class = "data.frame", row.names = c(NA, -1L))

 Input2 <- structure(list(AAPL = 30L, GOOG = 60L), .Names = c("AAPL", "GOOG"
 ), class = "data.frame", row.names = c(NA, -1L))
akrun
  • 874,273
  • 37
  • 540
  • 662