
I am trying to add a column to a data frame that holds, for each row, the number of rows in the matching subset of another data frame. I am using the following code:

library(dplyr)

dfa <- data.frame(x=c(1,2,3,4,5), sym=c("a","a","b","b","b"))
dfa <- dfa %>% group_by(sym)

dfb <- data.frame( sym = c("a", "b") )
dfb %>% mutate( len = dim( dfa[ dfa[["sym"]]==sym , ] )[1] )

This code gives unintended output and a warning message:

  sym len
1   a   2
2   b   2
Warning message:
Problem while computing `len = dim(dfa[dfa[["sym"]] == sym, ])[1]`.
ℹ longer object length is not a multiple of shorter object length 

The output I want is the following (I must use `mutate`):

  sym len
1   a   2
2   b   3

Any suggestions?

M C
  • Try `dfa %>% count(sym, name = 'len')` or `dfa %>% group_by(sym) %>% summarise(len = n())` – Darren Tsai Aug 04 '22 at 07:55
  • The issue results from the comparison with `==`. You are trying to compare a vector of length 5 to a vector of length 2. That is not possible. Usually, this error message means that you actually want to use `%in%` and not `==`. – Roland Aug 04 '22 at 08:02
  • I must use `mutate`. Is there a solution that makes the variable `sym` behave like the column value (not the entire column itself), as `mutate` usually does? For example, like `dfb %>% mutate(z = str_length(sym))`, which outputs 1's. – M C Aug 04 '22 at 08:15
  • why do you have to use mutate? – gaut Aug 04 '22 at 08:49
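
The recycling behaviour mentioned in the comments can be reproduced with plain vectors. This is only a sketch: `dfa_sym` and `dfb_sym` are stand-in names for the two `sym` columns from the question, and the warning is suppressed here purely to keep the output short.

```r
# Stand-ins for dfa$sym (length 5) and dfb$sym (length 2)
dfa_sym <- c("a", "a", "b", "b", "b")
dfb_sym <- c("a", "b")

# `==` recycles the shorter vector, so the comparison is effectively
# c("a","a","b","b","b") == c("a","b","a","b","a")
matches <- suppressWarnings(dfa_sym == dfb_sym)
matches       # TRUE FALSE FALSE TRUE FALSE
sum(matches)  # 2, the single value that mutate() then repeats for every row

# %in% tests membership instead of comparing position by position
in_b <- dfa_sym %in% dfb_sym
in_b          # TRUE TRUE TRUE TRUE TRUE
```

Inside `mutate()` on an ungrouped `dfb`, `sym` is the whole column, so the subset has 2 rows and that one number is assigned to both rows.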

1 Answer


You can use

dfb %>% group_by(sym) %>% mutate(len=sum(sym == dfa$sym))
  sym     len
  <chr> <int>
1 a         2
2 b         3

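The key is to group dfb by sym as well. Within each group, `sym` then holds only that group's value, so the comparison with `dfa$sym` counts the matching rows for that value alone.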
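
If grouping `dfb` is not convenient, two other ways of keeping `mutate()` while evaluating `sym` one value at a time are sketched below. These are alternatives rather than part of the answer above; `res_rowwise` and `res_vapply` are names made up for the illustration.

```r
library(dplyr)

dfa <- data.frame(x = c(1, 2, 3, 4, 5), sym = c("a", "a", "b", "b", "b"))
dfb <- data.frame(sym = c("a", "b"))

# Option 1: rowwise() makes `sym` a single value inside mutate()
res_rowwise <- dfb %>%
  rowwise() %>%
  mutate(len = sum(dfa$sym == sym)) %>%
  ungroup()

# Option 2: loop over the column explicitly, one symbol at a time
res_vapply <- dfb %>%
  mutate(len = vapply(sym, function(s) sum(dfa$sym == s),
                      integer(1), USE.NAMES = FALSE))

res_rowwise$len  # 2 3
res_vapply$len   # 2 3
```

Unlike grouping by `sym`, both variants compare one value at a time even when `dfb` contains the same symbol more than once.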

gaut