Add column with counts of another

Question

I have a data frame df that looks like the following where the gender column is a factor with two levels:

gender    age
m         18
f         14
m         18
m         18
m         15
f         15

I would like to add a new column called count that simply reflects the number of times that gender level appears in the data frame. So, ultimately, the data frame would look like:

gender    age    count
m         18     4
f         14     2
m         18     4
m         18     4
m         15     4
f         15     2

I know that table(df$gender) gives me the number of times each factor level appears, but I do not know how to translate those results into a new column in df. How can I use the table function for this, or is there a better way to create the new column?
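A minimal sketch for reproducing the example. The values are taken from the table above; building the data with data.frame() and factor() is an assumption about how df was created.

```r
# Rebuild the example data; 'gender' is a factor with levels "f" and "m"
df <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

# table() returns one count per factor level, named by level
counts <- table(df$gender)
names(counts)      # "f" "m"
as.vector(counts)  # 2 4
```

The task is then to look up each row's level in counts and store the result next to it.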

Henrik · Accepted Answer · 2013-11-28T22:26:05.220

7

You may try ave:

# first, convert 'gender' to class character
df$gender <- as.character(df$gender)

df$count <- as.numeric(ave(df$gender, df$gender, FUN = length))
df
#   gender age count
# 1      m  18     4
# 2      f  14     2
# 3      m  18     4
# 4      m  18     4
# 5      m  15     4
# 6      f  15     2

Update following @flodel's comment - thanks!

df <- transform(df, count = ave(age, gender, FUN = length))
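Why this version works with a factor: ave() writes the result of FUN back into a copy of its first argument, so when that argument is the factor itself the counts are rejected as invalid levels, whereas a numeric vector such as age accepts them. A minimal sketch of a variant that does not rely on any numeric column, using seq_along() as a placeholder first argument (the sample data is rebuilt here only so the snippet runs on its own):

```r
df <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

# seq_along() supplies an integer vector of the right length; its values are
# irrelevant because FUN = length only counts the elements in each group
df$count <- ave(seq_along(df$gender), df$gender, FUN = length)
df$count
# [1] 4 2 4 4 4 2
```

gender stays a factor throughout, so no conversion to character is needed.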

edited Nov 28 '13 at 22:26

answered Nov 28 '13 at 22:07


I tried this and it populates the count column with NAs. After I run the command I get the message "There were 50 or more warnings (use warnings() to see the first 50)". The warnings look like: "In `[<-.factor`(`*tmp*`, i, value = 2L) : invalid factor level, NAs generated". I double checked the column and it is in fact a factor. – whistler Nov 28 '13 at 22:13
Sorry, I should have read your question more carefully. If you convert 'gender' to character it works. I have edited my answer. Cheers. – Henrik Nov 28 '13 at 22:17
You can do `dat <- transform(dat, count = ave(age, gender, FUN = length))` and not have to modify the gender class. – flodel Nov 28 '13 at 22:21
@flodel, thanks a lot for your comment! I rarely use `transform` - time to start it seems! Cheers. – Henrik Nov 28 '13 at 22:22
Well, `transform` is just to make things pretty. The problem you were having with `gender` being a factor is handled by not using `gender` but `age` as the first argument to `ave`. – flodel Nov 28 '13 at 22:24
@flodel - I misread the question, then your code. Sorry. Not my day... – Henrik Nov 28 '13 at 22:28

flodel · Answer 2 · 2013-11-28T23:56:44.933

7

Since gender is a factor, you can use it to index the table output:

dat$count <- table(dat$gender)[dat$gender]

Or to avoid repeating dat$ too many times:

dat <- transform(dat, count = table(gender)[gender])
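The indexed result carries the table's level names along with the counts. If a plain integer column is preferred, a sketch along these lines may help; it indexes by level name rather than by factor code, so it should also work when the column is character instead of factor (the sample data is rebuilt here for illustration):

```r
dat <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

counts <- table(dat$gender)

# Look up each row's level by name, then drop the names with as.vector()
dat$count <- as.vector(counts[as.character(dat$gender)])
dat$count
# [1] 4 2 4 4 4 2
```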

edited Nov 28 '13 at 23:56

answered Nov 28 '13 at 23:50


score 1 · Answer 3 · answered Nov 28 '13 at 22:17


Using plyr:

library(plyr)
ddply(dat, .(gender), transform, count = length(age))
#   gender age count
# 1      f  14     2
# 2      f  15     2
# 3      m  18     4
# 4      m  18     4
# 5      m  18     4
# 6      m  15     4
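Note that the ddply() output above is grouped by gender rather than kept in the original row order. A base R sketch of the same split-apply-combine idea, which restores the original order afterwards; row_id is a hypothetical helper column introduced only for this illustration:

```r
dat <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

# Remember the original row positions before splitting
dat$row_id <- seq_len(nrow(dat))

# Split by gender, add the group size to each piece, then stack the pieces
pieces <- lapply(split(dat, dat$gender), function(piece) {
  piece$count <- nrow(piece)
  piece
})
res <- do.call(rbind, pieces)

# Restore the original row order and drop the helper column
res <- res[order(res$row_id), ]
rownames(res) <- NULL
res$row_id <- NULL
res
```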

agstudy


score 1 · Answer 4 · answered Nov 28 '13 at 23:04

And a data.table version for good measure.

library(data.table)
df <- as.data.table(df)

Once you have the data.table, it's then a simple operation:

df[, count := .N, by = "gender"]
df

#   gender age count
#1:      m  18     4
#2:      f  14     2
#3:      m  18     4
#4:      m  18     4
#5:      m  15     4
#6:      f  15     2
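Here := adds the column by reference, so df is modified in place and no reassignment is needed, and .N is the number of rows in the current group. A minimal sketch, assuming the data.table package is installed, that uses setDT() to convert the data frame in place rather than copying it with as.data.table() (the sample data is rebuilt for illustration):

```r
library(data.table)

df <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

setDT(df)                       # convert to data.table in place, no copy
df[, count := .N, by = gender]  # .N = number of rows in each gender group
df$count
# [1] 4 2 4 4 4 2
```

Unlike the plyr output, the rows stay in their original order.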

score 0 · Answer 5 · answered Nov 28 '13 at 22:05


You can compute the count for each level first and then assign it to the matching rows, although this is not exactly elegant.

m.cnt <- length(which(df$gender == "m"))
f.cnt <- length(which(df$gender == "f"))

df$count <- NA
df$count[which(df$gender == "m")] <- m.cnt
df$count[which(df$gender == "f")] <- f.cnt

Alternatively you can use plyr, but that recalculates the same count for every row in a group, which might not be worth it since you only have two factor levels.
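The per-level assignments above can be generalised to any number of levels with a loop over levels(). This is only a sketch of that idea on rebuilt sample data; it scans the column once per level, so with a very large number of levels one of the vectorised approaches is likely to be faster.

```r
df <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

df$count <- NA_integer_
for (lev in levels(df$gender)) {
  idx <- which(df$gender == lev)  # rows belonging to this level
  df$count[idx] <- length(idx)    # every such row gets the level's count
}
df$count
# [1] 4 2 4 4 4 2
```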


Frank P.


This is a good solution but is not practical for me. I used gender as an example, but the factor I'm trying to count actually has > 1000 levels. – whistler Nov 28 '13 at 22:15
