
Sorry if I'm asking something obvious, but I couldn't find anything similar.

Suppose I have this data:

a<-c('blue','blue','green','red','black',
     'white','blue','blue','blue','red',
     'black','white','blue','green','red',
     'black','white','white','black','white',
     'blue','white','blue','green')

and would like to put it in a data frame with a second column that gives, on every row, the number of times that row's element appears in the whole vector (so the counts are repeated for repeated elements). Something like this:

data.frame(a=c('blue','blue','green','red',
               'black','white','blue','blue',
               'blue','red','black','white',
               'blue','green','red','black',
               'white','white','black','white',
               'blue','white','blue','green'),
           b=c(8,8,3,3,4,6,8,8,8,3,4,
               6,8,3,3,4,6,6,4,6,8,6,8,3))

Any help would be appreciated.

JoseRamon
  • possible duplicate of [What is the difference between the functions tapply and ave?](http://stackoverflow.com/questions/22289258/what-is-the-difference-between-the-functions-tapply-and-ave) – mnel Mar 10 '14 at 01:49

2 Answers


In the spirit of the question linked by mnel, here's how to do this with ave:

data.frame(a, b=ave(seq_along(a), a, FUN=length))
       a b
1   blue 8
2   blue 8
3  green 3
4    red 3
5  black 4
6  white 6
7   blue 8
8   blue 8
9   blue 8
10   red 3
11 black 4
12 white 6
13  blue 8
14 green 3
15   red 3
16 black 4
17 white 6
18 white 6
19 black 4
20 white 6
21  blue 8
22 white 6
23  blue 8
24 green 3

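The numeric vector seq_along(a) is only a placeholder: ave splits it into groups according to a, and FUN=length replaces every element with the size of its group, so the actual values never matter.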
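To see the grouping at work on something small, here is a minimal sketch with a toy vector (the names x and n_per_group are only for illustration and are not part of the question's data):

```r
# Toy input; `x` is an illustrative name, not the question's vector.
x <- c("u", "v", "u")

# ave() splits the first argument by the grouping vector, applies FUN to each
# piece, and writes the result back into the original positions.
n_per_group <- ave(seq_along(x), x, FUN = length)
print(n_per_group)
```

"u" occurs twice and "v" once, so the first and third positions should get 2 and the second position 1.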

It might make more sense to start from a vector of 1s and sum it within each group:

data.frame(a, b=ave(rep(1, length(a)), a, FUN=sum))

The resulting counts are the same, although here b is stored as double rather than integer.
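A quick way to confirm that the two variants agree is to compute both and compare them element by element. This is only a sanity-check sketch; the names by_length and by_sum are made up for illustration.

```r
# The vector from the question.
a <- c('blue','blue','green','red','black',
       'white','blue','blue','blue','red',
       'black','white','blue','green','red',
       'black','white','white','black','white',
       'blue','white','blue','green')

# Variant 1: group sizes via length() on a placeholder vector.
by_length <- ave(seq_along(a), a, FUN = length)

# Variant 2: group sizes via sum() over a vector of ones.
by_sum <- ave(rep(1, length(a)), a, FUN = sum)

# Compare the values position by position.
same_counts <- all(by_length == by_sum)
print(same_counts)
```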

Matthew Lundberg

Calculate frequencies

counts<-table(a)

Turn it into a data.frame

df<-as.data.frame(counts)
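As a quick check of this intermediate step, the data frame built from a one-way table should have one row per distinct value, with the value in a column named a (a factor by default) and the count in a column named Freq. A small sketch that inspects it, reusing the question's vector:

```r
# The vector from the question.
a <- c('blue','blue','green','red','black',
       'white','blue','blue','blue','red',
       'black','white','blue','green','red',
       'black','white','white','black','white',
       'blue','white','blue','green')

counts <- table(a)
df <- as.data.frame(counts)

# Inspect the structure: column names, types, and number of rows.
str(df)
```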

For each row in df, repeat it Freq times

# lapply always returns a list, so simplify = FALSE is not needed
df2 <- lapply(seq_len(nrow(df)),
   function(x) df[rep(x, df$Freq[x]), ])

Convert the list of data frames into one data frame

df3<-do.call("rbind", df2)

df3
    a Freq
black    4
black    4
black    4
black    4
 blue    8
 blue    8
 blue    8
 blue    8
 blue    8
 blue    8
 blue    8
 blue    8
green    3
green    3
green    3
  red    3
  red    3
  red    3
white    6
white    6
white    6
white    6
white    6
white    6
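Note that the rows of df3 come out grouped by value, not in the order of the original vector. If the original order matters, as in the question's expected output, one possible variation (a sketch, not part of the answer above) is to index the frequency table by the vector itself. The name res is made up for illustration.

```r
# The vector from the question.
a <- c('blue','blue','green','red','black',
       'white','blue','blue','blue','red',
       'black','white','blue','green','red',
       'black','white','white','black','white',
       'blue','white','blue','green')

counts <- table(a)

# Indexing the table by name looks up each element's count in input order;
# as.vector() drops the table class and the names.
res <- data.frame(a, b = as.vector(counts[a]))
head(res)
```

This keeps one row per element of a, in the same order as the input.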