17

I'm not even sure how to title the question properly!

Suppose I have a dataframe d:

Current dataframe:

d <- data.frame(sample = LETTERS[1:2], cat = letters[11:20], count = c(1:10))

   sample cat count
1       A   k     1
2       B   l     2
3       A   m     3
4       B   n     4
5       A   o     5
6       B   p     6
7       A   q     7
8       B   r     8
9       A   s     9
10      B   t    10

and I'm trying to re-arrange things such that each cat value becomes a column of its own, sample remains a column (or becomes the row name), and count will be the values in the new cat columns, with 0 where a sample doesn't have a count for a cat. Like so:

Desired dataframe layout:

   sample   k   l   m   n   o   p   q   r   s   t
1       A   1   0   3   0   5   0   7   0   9   0
2       B   0   2   0   4   0   6   0   8   0  10

What's the best way to go about this?

This is as far as I've gotten:

for (i in unique(d$sample)) {
    s <- d[d$sample==i,]
    st <- as.data.frame(t(s[,3]))
    colnames(st) <- s$cat
    rownames(st) <- i
} 

i.e. looping through the samples in the original data frame, and transposing for each sample subset. So in this case I get

   k m o q s
 A 1 3 5 7 9

and

   l n p r  t
 B 2 4 6 8 10

And this is where I get stuck. I've tried a bunch of things with merge, bind, apply,... but I can't seem to hit on the right thing. Plus, I can't help but wonder if that loop above is a necessary step at all - something with unstack perhaps?

Needless to say, I'm new to R... If someone can help me out, it would be greatly appreciated!

PS Reason I'm trying to re-arrange my dataframe is in the hopes of making plotting of the values easier (i.e. I want to show the actual df in a plot in table format).

Thank you!

Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
crs
  • 323
  • 1
  • 2
  • 8
  • The operation you want to do is go from "long" to "wide". This question has been asked and answered many times before. – nograpes Oct 13 '13 at 14:10
  • 1
    My apologies. I did search endlessly through here and google, I just couldn't find anything quite like what I need (not knowing exactly how to call what I need is not helping). I'll have a look for "long"/"wide"... – crs Oct 13 '13 at 14:15

3 Answers3

17

Use dcast from reshape2 package

> dcast(d, sample~cat, fill=0)
  sample k l m n o p q r s  t
1      A 1 0 3 0 5 0 7 0 9  0
2      B 0 2 0 4 0 6 0 8 0 10

xtabs from base is another alternative

> xtabs(count~sample+cat, d)
      cat
sample  k  l  m  n  o  p  q  r  s  t
     A  1  0  3  0  5  0  7  0  9  0
     B  0  2  0  4  0  6  0  8  0 10

If you prefer the output to be a data.frame, then try:

> as.data.frame.matrix(xtabs(count~sample+cat, d))
  k l m n o p q r s  t
A 1 0 3 0 5 0 7 0 9  0
B 0 2 0 4 0 6 0 8 0 10
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
  • He, this looks neat. However, I should have mentioned, I'm looking for a way to do this in standard (base) R... (as this isn't for me). Any way to do that? (even if it's more involved?) – crs Oct 13 '13 at 14:13
  • 1
    Oh, bloody h, thank you so much!!!! This is it exactly! I can't vote up yet, but bless your heart :) – crs Oct 13 '13 at 14:18
  • 2
    @crs, I just +1d your question so you're at 16 rep now. You can vote up both answers now :) – A5C1D2H2I1M1N2O1R2T1 Oct 13 '13 at 17:58
12

Using reshape from base R:

nn<-reshape(d,timevar="cat",idvar="sample",direction="wide")
names(nn)[-1]<-as.character(d$cat)
nn[is.na(nn)]<-0
> nn
  sample k l m n o p q r s  t
1      A 1 0 3 0 5 0 7 0 9  0
2      B 0 2 0 4 0 6 0 8 0 10
Metrics
  • 15,172
  • 7
  • 54
  • 83
  • Thanks, @Metrics - that works too! (and I now understand what nograpes was referring to with "long" and "wide" in his comment!). Jilber's solution looks more concise - what's the difference otherwise? Thank you for help! Always good to learn sth extra. – crs Oct 13 '13 at 14:36
  • 1
    @crs, the big difference is that both `xtabs` and `dcast` can really only have one "value" variable to convert to a wide format. As an example, try the first line of Metrics's code after making one small change to your original "d" `data.frame`: `d$blah <- 11:20`. Now, try to do the same with `dcast` or `xtabs`. With `dcast`, similar output would be possible after first `melt`ing your data. With `xtabs` you would have to reshape the variables separately and `cbind` them together. And `reshape()` is pretty fast, especially in comparison to the current implementation of `dcast`. – A5C1D2H2I1M1N2O1R2T1 Oct 13 '13 at 17:45
  • 1
    Metrics: +1, but again, I don't know if I would go to the trouble of renaming. In this case, because of the extended example I describe in the comment above... – A5C1D2H2I1M1N2O1R2T1 Oct 13 '13 at 17:56
  • @AnandaMahto - Found out about the first difference the hard way - without any understanding, until now (figured it was just me!) So many thanks for that explanation! And everyone gets an upvote now - thank you all. – crs Oct 14 '13 at 00:17
0

Here is a tidyverse solution that might be of interest too:

library(tidyr)

d %>%
  pivot_wider(names_from = cat, 
              values_from = count, 
              values_fill = 0)

# A tibble: 2 x 11
  sample     k     l     m     n     o     p     q     r     s     t
  <chr>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 A          1     0     3     0     5     0     7     0     9     0
2 B          0     2     0     4     0     6     0     8     0    10
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41