2

I know a lot has been posted on getting frequency counts based on two columns, and it is working for me in a way - except for the fact that it takes one of my variables and makes those values into row numbers. I'd like to just have 'normal' row numbers and have both variables as features in a dataframe.

Data example looks as follows:

sport        cell
football     A1
tennis       A2
tennis       A1
gym          A3

What I'd like to achieve is a count of the number of sports per cell:

    cell    tennis    football    gym
1   A1      1         1           0
2   A2      1         0           0
3   A3      0         0           1

Using the following code, it makes the cell feature into row names:

data.frame(table(data$cell, data$sport)[,]) 

        tennis    football    gym
A1      1         1           0
A2      1         0           0
A3      0         0           1

Also, I've tried to assign numbers to each row and table the three features as follows:

data.frame(table(data$cell, data$sport, data$rownumber)[,]) 

But that just gives me an error.

Any help would be highly appreciated!

zoekdestep
  • 99
  • 1
  • 7
  • `res <- data.frame(table(data$cell, data$sport)[,]); res$cell <- row.names(res)` ? – digEmAll Apr 23 '16 at 11:52
  • does not work for me, unfortunately.. – zoekdestep Apr 23 '16 at 11:56
  • 1
    Why? Do you want `cell` as first column and numeric rownames ? Just do: `x <- data.frame(table(data$cell, data$sport)[,])` then `x <- cbind(cell=rownames(x),x)` and finally `rownames(x) <- 1:nrow(x)` – digEmAll Apr 23 '16 at 12:04
  • Aah that worked perfectly, thanks so much! The first one showed the same as in my example, but the latter indeed worked. What did you exactly do with x <- cbind(cell=rownames(x),x)? – zoekdestep Apr 23 '16 at 12:19
  • 1
    `cbind` stands for column-bind in other words it concatenates by column vectors and data.frames. `rownames` returns a vector with the row names of a data.frame and using `cell=` we give the name to the concatenated column... – digEmAll Apr 23 '16 at 12:24

1 Answers1

4

You could also use the dcast function from the reshape2 package to get the desired result:

library(reshape2)
dat.new <- dcast(dat, cell ~ sport, fun.aggregate = length)

this wil result in the following dataframe:

> dat.new
  cell football gym tennis
1   A1        1   0      1
2   A2        0   0      1
3   A3        0   1      0

An extended and optimized dcast function is also available in the data.table package.


A dplyr/tidyr alternative:

library(dplyr)
library(tidyr)

dat.new <- dat %>%
  group_by(cell, sport) %>%
  tally() %>%
  spread(sport, n, fill = 0)

giving you the same result.

h3rm4n
  • 4,126
  • 15
  • 21