In R (or S-PLUS), what is a good way to aggregate String data in a data frame?

Consider the following:

myList <- as.data.frame(c("Bob", "Mary", "Bob", "Bob", "Joe"))

I would like the output to be:

 [Bob,  3
  Mary, 1
  Joe,  1]

Currently, the only way I know how to do this is with the summary function.

> summary(as.data.frame(myList))

 Bob :3                                
 Joe :1                                
 Mary:1      

This feels like a hack. Can anyone suggest a better way?

Frank V
Ryan Guest

5 Answers

This is a combination of the above answers (as suggested by Thierry)

data.frame(table(myList[,1]))

which gives you

  Var1 Freq
1  Bob    3
2  Joe    1
3 Mary    1
andrewj
  • It gives an error for me - a one liner based on Thierry's suggestion would be: as.data.frame(table(myList)) – bubaker Jul 26 '09 at 06:59
  • That's interesting. What kind of error message did you get? I just tried it without getting an error message. – andrewj Jul 26 '09 at 18:14
  • Scratch that - I tried it after defining myList as a list, not a data.frame – bubaker Jul 29 '09 at 07:16
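
Note that table() sorts the names alphabetically, while the output asked for in the question lists them in order of first appearance (Bob, Mary, Joe). A minimal sketch of one way to get that order, assuming the data is a plain character vector; the names x, counts, Name and Count are introduced here for illustration only:

```r
# Sample data from the question, as a plain character vector.
x <- c("Bob", "Mary", "Bob", "Bob", "Joe")

# Fixing the factor levels to unique(x) keeps names in order of first appearance.
counts <- as.data.frame(table(Name = factor(x, levels = unique(x))),
                        responseName = "Count")

# Optionally sort by descending count; ties keep their original order.
counts <- counts[order(-counts$Count), ]
counts
```

The responseName argument of as.data.frame() replaces the default Freq column name, so no separate renaming step is needed.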

Using table, no need to sort:

ctable <- table(myList);
counts <- data.frame(Name = names(ctable),Count = as.vector(ctable));
bubaker
  • You can simplify the last line to as.data.frame(ctable). Note that the semicolons are only needed if you put more than one command on a line. – Thierry Jul 24 '09 at 11:39

Using data.table

myList <- data.frame(v1 = c("Bob", "Mary", "Bob", "Bob", "Joe"))
library(data.table)
# .N is the number of rows in each group defined by v1
as.data.table(myList)[, .N, by = v1]
     v1 N
1:  Bob 3
2: Mary 1
3:  Joe 1
Chriss Paul

Do you mean like this?

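myList <- c("Bob", "Mary", "Bob", "Bob", "Joe")
# rle() counts runs of equal values, so the vector must be sorted first
r <- rle(sort(myList))
# Build the data frame directly; cbind() would coerce the counts to character
result <- data.frame(Name = r$values, Occurrences = r$lengths)
result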
  Name Occurrences
1  Bob           3
2  Joe           1
3 Mary           1
vrajs5
Jouni K. Seppänen

Using sqldf library:

library(sqldf)

myList<- data.frame(v=c("Bob", "Mary", "Bob", "Bob", "Joe"))
sqldf("SELECT v,count(1) FROM myList GROUP BY v")
OmG
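
For comparison, the same GROUP BY count can probably be expressed in base R with aggregate(), without any extra package. This is only a sketch under the assumption that the data frame has a single name column v as in the sqldf example; the helper column n and the result name res are introduced here for illustration:

```r
# Same data frame as in the sqldf example.
myList <- data.frame(v = c("Bob", "Mary", "Bob", "Bob", "Joe"))

# Add a helper column of ones, then sum it within each group of v.
res <- aggregate(n ~ v, data = transform(myList, n = 1L), FUN = sum)
res
```

The result is a data frame with one row per distinct name, ordered by the grouping column rather than by first appearance.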