
My goal is to take a factor column of a data frame, create a new column for each of its levels, and fill each new column with the matching values from the original data frame.

Here is a sample. In this case, I want to create a new column for each level of the the.name factor column, like so:

Original dataframe:

symbol        the.name          cn    
SYM1          ABC               1
SYM2          ABC               2
SYM1          DEF               3
SYM2          DEF               4

Resulting dataframe:

symbol       ABC       DEF
SYM1         1         3
SYM2         2         4

How can this be done?


EDIT: I have tried to achieve this by using sapply over the groups produced by split() on that column and then rbind-ing the results. However, I haven't gotten it to work, and I chose not to include it here because it would only add noise. I'm fairly sure that approach is wrong and could be considerably improved.
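For reference, the split-and-combine idea can be made to work if the pieces are joined column-wise with merge() rather than stacked with rbind(). This is a minimal base R sketch, assuming the sample data is rebuilt as df and that each symbol/the.name pair occurs once; the names pieces and wide are illustrative only.

```r
# Sample data from the question (long format)
df <- data.frame(
  symbol   = c("SYM1", "SYM2", "SYM1", "SYM2"),
  the.name = factor(c("ABC", "ABC", "DEF", "DEF")),
  cn       = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One data frame per level of the.name, each keeping symbol and cn
pieces <- split(df[, c("symbol", "cn")], df$the.name)

# Rename each piece's cn column to the level it came from
pieces <- Map(function(piece, level) {
  names(piece)[names(piece) == "cn"] <- level
  piece
}, pieces, names(pieces))

# Join the pieces side by side on symbol (merge, not rbind)
wide <- Reduce(function(a, b) merge(a, b, by = "symbol"), pieces)
print(wide)
```

Because merge() matches rows by symbol, the result does not depend on the row order inside each piece.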

Juan Carlos Coto
  • Curious as to why there's a downvote? Looks like a good question but I could be missing something – Señor O Oct 15 '14 at 20:15
  • The downvoter probably wanted to highlight that this is a very common question. – ilir Oct 15 '14 at 20:16
  • Not the down-voter, but I presume it's because OP didn't show that they tried anything – Rich Scriven Oct 15 '14 at 20:17
  • @ilir That was my only thought. It's hard to downvote a question with expected output though :) – Señor O Oct 15 '14 at 20:22
  • I'd love some clarification on the downvote too. @ilir: Good point about my not showing that I tried anything (I did try to do this using a double `sapply` loop but threw up a little in my mouth), so I'll add some clarification. Thanks! – Juan Carlos Coto Oct 15 '14 at 23:54
  • @ilir Regarding this being a common question, I'd really appreciate it if you could point me to where it has been asked and answered before. Even though the answers here look great, it's always good to have more info :). – Juan Carlos Coto Oct 16 '14 at 00:03
  • You could look at [this question](http://stackoverflow.com/questions/9617348/reshape-three-column-data-frame-to-matrix), or [this one](http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r), or even [this](http://stackoverflow.com/questions/22558677/reshape-panel-data-from-long-to-wide). I think knowing they are called long or wide data is key. Look at `melt()` and `dcast()` help from package `reshape2` for some good examples. – ilir Oct 16 '14 at 08:37
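To make the long/wide distinction concrete, here is a minimal reshape2 round trip on the sample data, assuming reshape2 is installed: dcast() spreads the.name into columns and melt() stacks them back. The names tmp, wide, and long are illustrative.

```r
library(reshape2)

# Sample data from the question (long format)
tmp <- data.frame(
  symbol   = c("SYM1", "SYM2", "SYM1", "SYM2"),
  the.name = c("ABC", "ABC", "DEF", "DEF"),
  cn       = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Long -> wide: one column per value of the.name, filled from cn
wide <- dcast(tmp, symbol ~ the.name, value.var = "cn")

# Wide -> long: stack the ABC/DEF columns back into the.name and cn
long <- melt(wide, id.vars = "symbol",
             variable.name = "the.name", value.name = "cn")

print(wide)
print(long)
```

In the melted result the.name comes back as a factor, since melt() builds it from the wide column names.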

3 Answers


Alternatively, the newish tidyr package does this with its "spread" function. Using @ilir's data:

> tidyr::spread(tmp, key = the.name, value = cn)
  symbol ABC DEF
1   SYM1   1   3
2   SYM2   2   4
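In tidyr 1.0.0 and later, spread() still works but is superseded by pivot_wider(). A minimal sketch on the same data, assuming a recent tidyr is installed; note that the result is a tibble rather than a plain data frame.

```r
library(tidyr)

# Sample data from the question (long format)
tmp <- data.frame(
  symbol   = c("SYM1", "SYM2", "SYM1", "SYM2"),
  the.name = c("ABC", "ABC", "DEF", "DEF"),
  cn       = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# names_from: column whose values become the new column names
# values_from: column whose values fill those new columns
wide <- pivot_wider(tmp, names_from = the.name, values_from = cn)
print(wide)
```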
Gregor Thomas

This is a job for dcast from the package reshape2:

> library(reshape2)
> dcast(df, symbol~the.name, value.var="cn")
  symbol ABC DEF
1   SYM1   1   3
2   SYM2   2   4
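dcast() works directly here because each symbol/the.name pair occurs exactly once. If a pair can repeat, dcast() needs fun.aggregate to collapse the duplicates; without it, reshape2 falls back to counting them with length() and prints a message. A sketch with a hypothetical extra SYM1/ABC row, summing the duplicates:

```r
library(reshape2)

# Hypothetical data: the SYM1/ABC pair appears twice (cn = 1 and 10)
df <- data.frame(
  symbol   = c("SYM1", "SYM2", "SYM1", "SYM2", "SYM1"),
  the.name = c("ABC", "ABC", "DEF", "DEF", "ABC"),
  cn       = c(1, 2, 3, 4, 10),
  stringsAsFactors = FALSE
)

# fun.aggregate decides how duplicate cells are combined (here: summed)
wide <- dcast(df, symbol ~ the.name, value.var = "cn", fun.aggregate = sum)
print(wide)
```

Swapping sum for mean, max, or any other function that returns a single value changes how the duplicates are combined.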
Señor O

This is a reshaping task (from long to wide data). The package reshape2 has some great utilities to do this.

txt="symbol        the.name          cn    
      SYM1          ABC               1
      SYM2          ABC               2
      SYM1          DEF               3
      SYM2          DEF               4"

tmp <- read.table(text=txt, header=TRUE)

library(reshape2)
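## value.var names the column that fills the new cells; if omitted,
## dcast guesses it (here cn) and prints a message saying so
dcast(tmp, symbol ~ the.name, value.var = "cn")   ## as easy as that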
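If adding a package is not an option, base R's stats::reshape() can do the same long-to-wide step. A minimal sketch, rebuilding the sample data as tmp; reshape() prefixes the new columns with the value column's name (cn.ABC, cn.DEF), so the sketch strips that prefix afterwards.

```r
# Sample data from the question (long format)
tmp <- data.frame(
  symbol   = c("SYM1", "SYM2", "SYM1", "SYM2"),
  the.name = c("ABC", "ABC", "DEF", "DEF"),
  cn       = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# idvar identifies each output row; the values of timevar become new columns
wide <- reshape(tmp, idvar = "symbol", timevar = "the.name",
                direction = "wide")

# reshape() names the new columns cn.ABC and cn.DEF; drop the "cn." prefix
names(wide) <- sub("^cn\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```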
ilir