I have a data frame df that looks like the following where the gender column is a factor with two levels:

gender    age
m         18
f         14
m         18
m         18
m         15
f         15

I would like to add a new column called count that simply reflects the number of times that gender level appears in the data frame. So, ultimately, the data frame would look like:

gender    age    count
m         18     4
f         14     2
m         18     4
m         18     4
m         15     4
f         15     2

I know that table(df$gender) gives me the number of times each level appears, but I do not know how to turn those results into a new column in df. How can I use the table function for this, or is there a better way to create the new column?

whistler

5 Answers


You may try ave:

# first, convert 'gender' to class character
df$gender <- as.character(df$gender)

df$count <- as.numeric(ave(df$gender, df$gender, FUN = length))
df
#   gender age count
# 1      m  18     4
# 2      f  14     2
# 3      m  18     4
# 4      m  18     4
# 5      m  15     4
# 6      f  15     2

Update following @flodel's comment (thanks!): passing age rather than gender as the first argument to ave avoids the conversion to character.

df <- transform(df, count = ave(age, gender, FUN = length))
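
The difference between the two calls comes down to what `ave` returns: it writes each group's result back into a copy of its first argument, so a factor first argument cannot hold the counts while a numeric one can. A minimal sketch, assuming the question's data frame can be rebuilt as below (the construction of `df` and the `count2` column are assumptions for illustration, not part of the original post):

```r
# Assumed reconstruction of the example data from the question
df <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

# Numeric first argument: each group's length is written back into a numeric vector
df$count <- ave(df$age, df$gender, FUN = length)

# Variant that does not rely on any other column: use the row positions instead
df$count2 <- ave(seq_along(df$gender), df$gender, FUN = length)

df
```

Either way, the gender column keeps its factor class.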

Henrik
  • I tried this and it populates the count column with NAs. After I run the command I get the message "There were 50 or more warnings (use warnings() to see the first 50)". The warnings look like: "In `[<-.factor`(`*tmp*`, i, value = 2L) : invalid factor level, NAs generated". I double checked the column and it is in fact a factor. – whistler Nov 28 '13 at 22:13
  • Sorry, I should have read your question more carefully. If you convert 'gender' to a character it works. I have edited my answer. Cheers. – Henrik Nov 28 '13 at 22:17
  • You can do `dat <- transform(dat, count = ave(age, gender, FUN = length))` and not have to modify the gender class. – flodel Nov 28 '13 at 22:21
  • @flodel, thanks a lot for your comment! I rarely use `transform` - time to start it seems! Cheers. – Henrik Nov 28 '13 at 22:22
  • Well, `transform` is just to make things pretty. The problem you were having with `gender` being a factor is handled by not using `gender` but `age` as the first argument to `ave`. – flodel Nov 28 '13 at 22:24
  • @flodel - I misread the question, then your code. Sorry. Not my day... – Henrik Nov 28 '13 at 22:28

Since gender is a factor, you can use it to index the table output:

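# as.vector() drops the table class and names, leaving a plain integer column
dat$count <- as.vector(table(dat$gender)[dat$gender])

Or, to avoid repeating dat$ too many times:

dat <- transform(dat, count = as.vector(table(gender)[gender]))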
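
Why this lines up: a factor used as an index is treated as its integer codes, and `table` reports its counts in the same level order. A small sketch, assuming `dat` is rebuilt as below (a hypothetical reconstruction of the question's data); the name-based lookup at the end is an added alternative for columns that are not factors:

```r
# Assumed reconstruction of the example data from the question
dat <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

counts <- table(dat$gender)        # one count per level, in level order (f, m)
codes  <- as.integer(dat$gender)   # integer code of each row's level

# Indexing by the codes picks each row's count; as.vector() strips the table class
dat$count <- as.vector(counts[codes])

# Added alternative: look up by level name, which also works for a character column
dat$count_by_name <- as.vector(counts[as.character(dat$gender)])

dat
```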
flodel

Using plyr:

library(plyr)
# apply transform() within each gender group; rows come back sorted by gender
ddply(dat, .(gender), transform, count = length(age))
#   gender age count
# 1      f  14     2
# 2      f  15     2
# 3      m  18     4
# 4      m  18     4
# 5      m  18     4
# 6      m  15     4
agstudy

And a data.table version for good measure.

library(data.table)
df <- as.data.table(df)

Once you have a data.table, it is a simple operation:

# .N is the number of rows in each group; := adds the column by reference
df[, count := .N, by = "gender"]
df

#   gender age count
#1:      m  18     4
#2:      f  14     2
#3:      m  18     4
#4:      m  18     4
#5:      m  15     4
#6:      f  15     2
thelatemail

You can compute the count for each level separately and then assign it to the matching rows, although this is not exactly elegant:

m.cnt <- length(which(df$gender == "m"))
f.cnt <- length(which(df$gender == "f"))

df$count <- NA
df$count[which(df$gender == "m")] <- m.cnt
df$count[which(df$gender == "f")] <- f.cnt

Alternatively, you can use plyr, but that recalculates the same thing over and over again, which might not be worth it when the factor has only two levels.
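
With many levels, the per-level assignment above can be written once as a loop over `levels(df$gender)` instead of one pair of lines per level. A rough sketch, assuming `df` is rebuilt as in the question (the construction below is an assumption); it scans the column once per level, so it is likely slower than the vectorised answers:

```r
# Assumed reconstruction of the example data from the question
df <- data.frame(
  gender = factor(c("m", "f", "m", "m", "m", "f")),
  age    = c(18, 14, 18, 18, 15, 15)
)

df$count <- NA_integer_
for (lev in levels(df$gender)) {
  rows <- which(df$gender == lev)   # positions of the rows with this level
  df$count[rows] <- length(rows)    # every such row gets the level's count
}

df
```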

Frank P.
  • This is a good solution but is not practical for me. I used gender as an example, but the factor I'm trying to count actually has > 1000 levels. – whistler Nov 28 '13 at 22:15