How to add a column in R with the count of each item in another column

Question

Suppose I have a data frame like this:

set.seed(123)
df <- data.frame(y = sample(c("A", "B", "C"), 10, TRUE),
                 X = sample(c(1, 2, 3), 10, TRUE),
                 stringsAsFactors = TRUE)
   y X
1  A 3
2  C 2
3  B 3
4  C 2
5  C 1
6  A 3
7  B 1
8  C 1
9  B 1
10 B 3

What I want is to add a column z giving, for each row, how many times that row's value of y occurs in column y. Here that means 2 for every A, 4 for every C and 4 for every B, since there are 2 As, 4 Cs and 4 Bs.

akrun · Accepted Answer · 2015-08-26T13:13:51.233

We can use data.table to create the column 'z' based on the number of elements (.N) for each 'y'.

library(data.table)
DT <- as.data.table(df)
DT[, z:= .N, by = y]
DT
#    y X z
# 1: A 3 2
# 2: C 2 4
# 3: B 3 4
# 4: C 2 4
# 5: C 1 4
# 6: A 3 2
# 7: B 1 4
# 8: C 1 4
# 9: B 1 4
#10: B 3 4

Or using dplyr, we group by 'y' and create a new column 'z' with mutate. The dplyr equivalent to .N is n().

library(dplyr)
df %>%
   group_by(y) %>%
   mutate(z = n())

Pierre L · Answer 2 · 2015-08-26T13:51:58.133

df$z <- table(df$y)[df$y]
df
#    y X z
# 1  A 3 2
# 2  C 2 4
# 3  B 3 4
# 4  C 2 4
# 5  C 1 4
# 6  A 3 2
# 7  B 1 4
# 8  C 1 4
# 9  B 1 4
# 10 B 3 4

table() returns the counts together with the name of each distinct value in df$y, which saves steps along the way: the result can be subset either by index or by name. In this case the column is a factor, so the subset goes through its integer codes, but the same line also works if the column is character, in which case the lookup is by name.
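A minimal sketch of the two lookup modes described above, using a hypothetical vector `y_chr` rather than the question's data. Wrapping the result in `as.vector()` is an addition here, so that the new values form a plain integer vector rather than a `table` object.

```r
# Hypothetical toy data, not the data frame from the question
y_chr <- c("A", "C", "B", "C", "C", "A")
y_fac <- factor(y_chr)

counts <- table(y_chr)   # named counts: A = 2, B = 1, C = 3

# Character index: each element is looked up by name
z_chr <- as.vector(counts[y_chr])

# Factor index: looked up by the underlying integer codes, which
# line up with the table because table() orders its cells by level
z_fac <- as.vector(table(y_fac)[y_fac])

z_chr
z_fac
```

Both vectors hold the same counts, one per element of the input.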

edited Aug 26 '15 at 13:51

answered Aug 26 '15 at 13:28

I'm not sure regarding the names. Try `as.vector(table(df$y))[df$y]`, for example. `as.vector` strips the names. In this case it seems to work due to the fact that `df$y` is an integer (factor) in this example. Though when `y` is of class `character` it seems to work according to names. Quite tricky. – David Arenburg Aug 26 '15 at 13:44
Yes @DavidArenburg , the factor is working underneath. But something cool is that even if the strings were characters the name subset would work but not the value. `table(df$y)[as.character(df$y)]` will still work due to name subsetting. But `as.vector(table(df$y))[as.character(df$y)]` won't. – Pierre L Aug 26 '15 at 13:49
Yes, I just wrote it in my comment above :). Worth investigation :) – David Arenburg Aug 26 '15 at 13:52
@DavidArenburg Just saw the edit. Flexible subsetting is the type of R feature that keeps people interested and puts R above other platforms. That along with matrix operations really show the power of the language :) – Pierre L Aug 26 '15 at 13:55
Though it may introduce some inconsistency... – David Arenburg Aug 26 '15 at 13:57
@DavidArenburg how so? – Pierre L Aug 26 '15 at 14:02
Because sometimes it operates over the names of the table and sometimes over its values. I'm not sure it's very consistent. – David Arenburg Aug 26 '15 at 14:03
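One place where the inconsistency raised in these comments can bite is a numeric column. The sketch below uses a hypothetical vector `y_num`, not the question's data: indexing the table with the numbers themselves is positional, while converting to character first forces a lookup by name.

```r
# Hypothetical numeric column, not from the question
y_num <- c(10, 20, 10, 30, 10)
counts <- table(y_num)   # three cells, named "10", "20" and "30"

# Numeric index: treated as positions 10, 20, 30, which do not exist
by_position <- as.vector(counts[y_num])

# Character index: looked up by name, which is what we want here
by_name <- as.vector(counts[as.character(y_num)])

by_position
by_name
```

Converting the index with `as.character()` works for factor, character and numeric columns alike, so it may be the safer habit when the column type is not known in advance.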

RHertel · Answer 3 · 2015-08-26T13:09:56.340

Here's a simple approach using a for loop:

for (i in levels(df$y)) df$z[df$y == i] <- sum(df$y == i)
df
#   y X z
#1  A 3 2
#2  C 2 4
#3  B 3 4
#4  C 2 4
#5  C 1 4
#6  A 3 2
#7  B 1 4
#8  C 1 4
#9  B 1 4
#10 B 3 4
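Note that `levels()` returns `NULL` for a character column, in which case the loop above would not run at all. A hedged variant that iterates over `unique()` instead, shown on a hypothetical data frame `df3` (not the question's data), could look like this:

```r
# Hypothetical toy data with a character (not factor) column
df3 <- data.frame(y = c("A", "C", "B", "C", "C", "A"),
                  stringsAsFactors = FALSE)

df3$z <- NA_integer_          # create the column up front
for (i in unique(df3$y)) {
  hit <- df3$y == i           # rows holding the current value
  df3$z[hit] <- sum(hit)      # number of such rows
}
df3$z
```

Creating `z` before the loop is a design choice of this sketch; it makes the column type explicit and keeps the loop body a plain assignment.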

edited Aug 26 '15 at 13:09

answered Aug 26 '15 at 12:53

With base I would just go with `with(df, ave(X, y, FUN = length))` – David Arenburg Aug 26 '15 at 13:09
Or maybe `with(df, ave(as.numeric(y), y, FUN = length))` for more consistency or `with(df, ave(seq(y), y, FUN = length))` – David Arenburg Aug 26 '15 at 13:15
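The `ave()` suggestion from these comments can be sketched as follows. The data frame `df2` is hypothetical, and `seq_along(y)` is substituted here for `seq(y)` as a placeholder vector; only the group sizes matter, not the values passed in.

```r
# Hypothetical toy data, not the data frame from the question
df2 <- data.frame(y = c("A", "C", "B", "C", "C", "A"),
                  X = c(3, 2, 3, 2, 1, 3))

# ave() applies FUN within each group of y and returns a vector
# of the same length as its first argument
df2$z <- with(df2, ave(seq_along(y), y, FUN = length))
df2$z
```

Using an integer placeholder as the first argument keeps the result an integer vector regardless of the type of the other columns.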
