This may seem like a very basic R question, but I'd appreciate an answer. I have a data frame in the form of:

col1    col2
a   g
a   h
a   g
b   i
b   g
b   h
c   i

I want to transform it into counts, so that the outcome looks like the table below. I've tried using the table() function, but I only seem to be able to get the counts for one column.

    a   b   c
g   2   1   0
h   1   1   0
i   0   1   1

How do I do it in R?

Michele
aa762

2 Answers


I'm not really sure what you used, but table works fine for me!

Here's a minimal reproducible example:

df <- structure(list(V1 = c("a", "a", "a", "b", "b", "b", "c"), 
                     V2 = c("g", "h", "g", "i", "g", "h", "i")), 
                .Names = c("V1", "V2"), class = "data.frame", 
                row.names = c(NA, -7L))
table(df)
#    V2
# V1  g h i
#   a 2 1 0
#   b 1 1 1
#   c 0 0 1

Notes:

  • Try table(df[c(2, 1)]) (or table(df$V2, df$V1)) to swap the rows and columns.
  • Use as.data.frame.matrix(table(df)) to get a data.frame as your output. (as.data.frame will create a long data.frame, not one in the same output format you desire).
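
The two notes above can be tried together. This is only a minimal sketch on the same seven rows; the object names `swapped`, `wide`, and `long` are made up for illustration.

```r
# Same data as in the example above
df <- data.frame(V1 = c("a", "a", "a", "b", "b", "b", "c"),
                 V2 = c("g", "h", "g", "i", "g", "h", "i"),
                 stringsAsFactors = FALSE)

# Swap rows and columns by reordering the columns before tabulating
swapped <- table(df[c(2, 1)])

# Wide data.frame: one row per V2 value, one column per V1 value
wide <- as.data.frame.matrix(swapped)

# Long data.frame: one row per (V2, V1) combination, plus a Freq column
long <- as.data.frame(swapped)

print(wide)
print(head(long))
```

Here `wide` has the layout asked for in the question (g, h, i as rows and a, b, c as columns), while `long` holds the same counts in one column.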
A5C1D2H2I1M1N2O1R2T1
  • Or `table(df$V1, df$V2)` – csgillespie Sep 19 '13 at 12:54
  • @csgillespie, but I like that when you use `table` directly, you get additional labels from the variable names. You can get the same with your approach by specifying the desired names (e.g. `table(V1 = df$V1, V2 = df$V2)`), but I like to save the typing when I can :) – A5C1D2H2I1M1N2O1R2T1 Sep 19 '13 at 13:00
  • And using `[` you can choose columns programmatically, with variables storing the column names. – Michele Sep 19 '13 at 13:03
  • I didn't mean to imply it was "better", just it was another way. You can imagine a future SO question where "I have two vectors..." – csgillespie Sep 19 '13 at 13:04
  • @AnandaMahto LOL +7 for an answer consisting of `table(df)` when [**this**](http://stackoverflow.com/a/18797893/1478381) gets +1 (from me). I really don't get SO voting sometimes. – Simon O'Hanlon Sep 19 '13 at 14:41
  • @AnandaMahto now it's 2 :-) pretty good answer! – Michele Sep 19 '13 at 16:13
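
The variants mentioned in the comments can be sketched as follows. The variable names `row_var` and `col_var` are hypothetical, chosen only to show the column names being held in variables.

```r
df <- data.frame(V1 = c("a", "a", "a", "b", "b", "b", "c"),
                 V2 = c("g", "h", "g", "i", "g", "h", "i"),
                 stringsAsFactors = FALSE)

# Passing two vectors works too; naming the arguments keeps the
# dimension labels that table(df) would have produced
t_named <- table(V1 = df$V1, V2 = df$V2)

# Column names stored in variables (hypothetical names), selected with `[`
row_var <- "V2"
col_var <- "V1"
t_prog <- table(df[c(row_var, col_var)])

print(t_named)
print(t_prog)
```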

With the data frame from @Ananda's answer (stored as f here), you can use dcast:

library(reshape2)

> dcast(f, V1~V2)
Using V2 as value column: use value.var to override.
Aggregation function missing: defaulting to length
  V1  g  h  i
1 a   2  1  0
2 b   1  1  1
3 c   0  0  1

For this case, table is the simplest correct answer. I'm only adding this in case you need something more than counts in the future, for example summing a third column:

set.seed(1)
f$var <- rnorm(7)

> f
  V1 V2        var
1 a   g -0.6264538
2 a   h  0.1836433
3 a   g -0.8356286
4 b   i  1.5952808
5 b   g  0.3295078
6 b   h -0.8204684
7 c   i  0.4874291

> dcast(f, V1~V2, value.var="var", fun.aggregate=sum)
  V1          g          h         i
1 a  -1.4620824  0.1836433 0.0000000
2 b   0.3295078 -0.8204684 1.5952808
3 c   0.0000000  0.0000000 0.4874291
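
If reshape2 is not available, a similar cross-tabulated sum can be sketched in base R with `xtabs`. This is a swapped-in alternative, not what the answer above uses, and the names `sums` and `counts` are made up for illustration.

```r
# Rebuild the example data, including the random column
f <- data.frame(V1 = c("a", "a", "a", "b", "b", "b", "c"),
                V2 = c("g", "h", "g", "i", "g", "h", "i"),
                stringsAsFactors = FALSE)
set.seed(1)
f$var <- rnorm(7)

# Sum var within each V1/V2 combination; empty combinations come out as 0
sums <- xtabs(var ~ V1 + V2, data = f)

# With an empty left-hand side, xtabs just counts rows
counts <- xtabs(~ V1 + V2, data = f)

print(sums)
print(counts)
```

Both results are tables, so `as.data.frame.matrix()` can turn them into a wide data.frame if needed.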
Michele