I've found similar questions on here, but none of the answers have worked for me. I have binary data in a data frame which I want to aggregate according to another variable. For example:

Data.frame (A & B are columns)
A   B   
1   23
0   7
0   23
0   7
1   4 

I've tried the below (which worked when finding the mean) and get the following error message:

aggregate( A~B, data.frame, sum)

Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument

Ideally I would like an output which gives 23 = 1, 7 = 0, 4 = 1

Can anyone help me please?

Thanks in advance!

Thirst for Knowledge
  • You should probably avoid calling your data frame `data.frame`, as that's the name of a function. Additionally, as several have suggested, please check the contents of your data frame by typing `str(data.frame)` (assuming you actually called your data frame `data.frame`) and make sure the columns are not of `character` type. – BrodieG Jan 13 '14 at 16:15
  • I guess it might be helpful to check `?as.numeric`, `sum(c("1", "2"))`, `sum(as.numeric(c("1", "2")))`, `sum(c(1, 2))`. – alexis_laz Jan 13 '14 at 16:26
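
The checks suggested in the comments above can be sketched as follows. This is a minimal reproduction, assuming both columns were stored as character (the asker confirms this further down); the name `df` and the exact values are illustrative stand-ins, not the asker's real data.

```r
# Hypothetical reconstruction of the asker's data: both columns stored as character
df <- data.frame(A = c("1", "0", "0", "0", "1"),
                 B = c("23", "7", "23", "7", "4"),
                 stringsAsFactors = FALSE)

# Inspect the column types; here both are reported as chr
str(df)

# sum() refuses character input, which is what aggregate() trips over.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(sum(df$A), error = function(e) conditionMessage(e))
print(res)

# Converting to numeric first makes the sum work
total <- sum(as.numeric(df$A))
print(total)
```

If `str()` reports `chr` for the column being summed, that is the likely source of the `invalid 'type' (character)` error.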

2 Answers


Many ways to do this, but for a start:

library(plyr)
foo <- data.frame(A = c(1, 0, 0, 0, 1),
                  B = c(23, 7, 23, 7, 4))

ddply(foo, .(B), summarise, sum = sum(A))

gives:

> ddply(foo, .(B), summarise, sum = sum(A))
   B sum
1  4   1
2  7   0
3 23   1
> 
SlowLearner
  • Thanks for the suggestion, I'm afraid I received the following error? Error in sum(A) : invalid 'type' (character) of argument – Thirst for Knowledge Jan 13 '14 at 15:50
  • That's probably because your column A is not numeric but character or factor (the error is raised in `sum(A)`). As the other answer suggests, run `str(data.frame)` to see what types your columns are, and in future add this kind of information to the question. Read [this post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for suggestions on making better questions. – SlowLearner Jan 13 '14 at 16:23

What exactly did you call? What does `str()` show for your data frame?

mdf <- data.frame( A = c(1,0,0,0,1), B = c(23, 7 ,23, 7,4) )
aggregate( A ~ B, mdf, sum )

gives

   B A
1  4 1
2  7 0
3 23 1

EDIT:

So, just in case your problem is that your column A is not numeric, you can fix this with

mdf$A <- as.numeric( as.character( mdf$A ) )
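
A minimal sketch of why the intermediate `as.character()` step matters, assuming column A had been read in as a factor (the vector `f` is a hypothetical stand-in for that column): calling `as.numeric()` directly on a factor returns the internal level codes rather than the values.

```r
# Hypothetical case: the values of A were read in as a factor
f <- factor(c("1", "0", "0", "0", "1"))

# Direct conversion returns the internal level codes, not the original values
codes <- as.numeric(f)

# Going through character first recovers the values as printed
values <- as.numeric(as.character(f))

print(sum(codes))
print(sum(values))
```

For a column that is already plain character, `as.numeric()` alone gives the same result; the extra step only guards against the factor case.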
Beasterfield
  • Thanks for the suggestion. I followed your code but got the following error? Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument – Thirst for Knowledge Jan 13 '14 at 15:53
  • @ej550 are you saying you are getting this error when you copy, paste and execute the two lines of code in a fresh R session? In this case your R installation is broken. – Beasterfield Jan 13 '14 at 16:37
  • My apologies, the columns were both characters. I've now converted them to numeric. Thanks for your help. – Thirst for Knowledge Jan 13 '14 at 16:52
  • Unfortunately converting the column 'B' to numeric messed up my data because it's a mixture of numbers and characters. How can I sum the numeric variable 'A' according to a character based variable 'B'? Thanks! – Thirst for Knowledge Jan 13 '14 at 18:54
  • @ej550 I didn't write that you should convert your id column, but just value column. What happens if you convert only column `A`? Once more: If you do not provide an example which let us reproduce your error with a copy and paste, we won't be able to help you. – Beasterfield Jan 14 '14 at 08:35
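
For the follow-up question above (summing a numeric `A` by a character-based `B`), here is a minimal sketch of converting only the value column. The data are made up for illustration, since the asker's real values were never posted; the mixed ids `"7a"` and `"x4"` are assumptions standing in for "a mixture of numbers and characters".

```r
# Hypothetical data: A arrives as character, B is an id mixing numbers and text
mdf <- data.frame(A = c("1", "0", "0", "0", "1"),
                  B = c("23", "7a", "23", "7a", "x4"),
                  stringsAsFactors = FALSE)

# Convert only the value column; leave the grouping column as character
mdf$A <- as.numeric(mdf$A)

# aggregate() groups by the distinct values of B, whatever their type
out <- aggregate(A ~ B, data = mdf, FUN = sum)
print(out)
```

The grouping column does not need to be numeric for `aggregate()`; only the column passed to `sum` does.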