I am looking at population data and want to make sure I have enough observations to do county-level analysis. To check this, I would like to generate a variable that gives each observation the number of observations sharing its value in the "county" column.

I want to assign each row in my data frame ("cps") a new variable ("freq") which represents the frequency of its specific value in one specific variable ("county"). I used

f <- function(x)sum(with(cps, county==x))

to generate a function that tells me how often a given county x appears in the data. Now I want to use

cps <- mutate(cps, freq=f(county))

to assign each row the number of times its county value appears in the data frame. However, this assigns every row the overall number of observations instead.

Martin Gal
Tilman
  • hi, not sure if this might be of interest https://stackoverflow.com/questions/63973598/why-doesnt-mutate-function-generate-variable – jspcal Nov 17 '22 at 19:43
  • Do you mind sharing a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Martin Gal Nov 17 '22 at 19:48
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 17 '22 at 19:48
  • Your function is not a very generalizable function because `cps` is hardcoded into it. And it's not designed nicely to work on a vector of inputs. The general way to do this is `cps %>% group_by(county) %>% mutate(freq = n()) %>% ungroup()`, but this particular use case is common enough that the `add_count()` convenience function does it all at once. – Gregor Thomas Nov 17 '22 at 19:55

1 Answer

You can get what you want using dplyr::add_count():

library(dplyr)
library(ggplot2)  # provides the mpg example dataset
mpg %>% add_count(cyl, name = "freq")
# A tibble: 234 × 12
   manufacturer model      displ  year   cyl trans      drv     cty   hwy fl    class    freq
   <chr>        <chr>      <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>   <int>
 1 audi         a4           1.8  1999     4 auto(l5)   f        18    29 p     compact    81
 2 audi         a4           1.8  1999     4 manual(m5) f        21    29 p     compact    81
 3 audi         a4           2    2008     4 manual(m6) f        20    31 p     compact    81
 4 audi         a4           2    2008     4 auto(av)   f        21    30 p     compact    81
 5 audi         a4           2.8  1999     6 auto(l5)   f        16    26 p     compact    79
 6 audi         a4           2.8  1999     6 manual(m5) f        18    26 p     compact    79
 7 audi         a4           3.1  2008     6 auto(av)   f        18    27 p     compact    79
 8 audi         a4 quattro   1.8  1999     4 manual(m5) 4        18    26 p     compact    81
 9 audi         a4 quattro   1.8  1999     4 auto(l5)   4        16    25 p     compact    81
10 audi         a4 quattro   2    2008     4 manual(m6) 4        20    28 p     compact    81
# … with 224 more rows
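
For comparison, here is a minimal sketch of the longer `group_by()` route mentioned in the comments, next to `add_count()`. The `toy` tibble and its values are made up for illustration and only stand in for `cps`; I'm assuming a recent dplyr.

```r
library(dplyr)

# Made-up data standing in for `cps`; only the `county` column matters here
toy <- tibble(county = c("A", "B", "A", "C", "A"))

# Explicit version: count rows within each county, then drop the grouping
by_group <- toy %>%
  group_by(county) %>%
  mutate(freq = n()) %>%
  ungroup()

# Shortcut: add_count() does the grouping, counting and ungrouping in one call
by_count <- toy %>% add_count(county, name = "freq")

by_group$freq
# 3 1 3 1 3
```

Both versions should give the same `freq` column, so the choice is mostly about readability.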

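As written, mutate() passes the whole county column to your function as x, so county == x compares the column with itself, is TRUE on every row, and sum() returns the total row count. If you wanted to keep your function, you'd need to wrap it in sapply() (or purrr::map_int()) so that each element of x is compared against every element of the column: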

f <- function(x) sapply(x, \(x) sum(with(mpg, cyl == x)))
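
To see the difference on something small, here is a sketch using a made-up data frame (`toy`, `f_orig` and `f_fixed` are hypothetical names for illustration, not part of your data). It needs R 4.1 or later for the `\(x)` shorthand.

```r
# Made-up data standing in for `cps`
toy <- data.frame(county = c("A", "B", "A", "C", "A"))

# Original approach: `x` is the whole column, so `county == x` is TRUE on every row
f_orig <- function(x) sum(with(toy, county == x))
f_orig(toy$county)
# 5  (a single number: the row count)

# Element-wise version: one comparison against the full column per value of `x`
f_fixed <- function(x) sapply(x, \(x_i) sum(toy$county == x_i), USE.NAMES = FALSE)
f_fixed(toy$county)
# 3 1 3 1 3
```

Inside `mutate()`, the single number from `f_orig` is recycled to every row, which is why each row ended up with the overall count.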

You can also generalize it to work with any column:

f2 <- function(x) sapply(x, \(x_i) sum(x == x_i))

mutate(mpg, freq=f2(drv))
# A tibble: 234 × 12
   manufacturer model      displ  year   cyl trans      drv     cty   hwy fl    class    freq
   <chr>        <chr>      <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>   <int>
 1 audi         a4           1.8  1999     4 auto(l5)   f        18    29 p     compact   106
 2 audi         a4           1.8  1999     4 manual(m5) f        21    29 p     compact   106
 3 audi         a4           2    2008     4 manual(m6) f        20    31 p     compact   106
 4 audi         a4           2    2008     4 auto(av)   f        21    30 p     compact   106
 5 audi         a4           2.8  1999     6 auto(l5)   f        16    26 p     compact   106
 6 audi         a4           2.8  1999     6 manual(m5) f        18    26 p     compact   106
 7 audi         a4           3.1  2008     6 auto(av)   f        18    27 p     compact   106
 8 audi         a4 quattro   1.8  1999     4 manual(m5) 4        18    26 p     compact   103
 9 audi         a4 quattro   1.8  1999     4 auto(l5)   4        16    25 p     compact   103
10 audi         a4 quattro   2    2008     4 manual(m6) 4        20    28 p     compact   103
# … with 224 more rows
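
One caveat: the sapply() versions scan the whole column once per row, so they may get slow on a large data frame. If you'd rather stay in base R, here is a rough sketch of two alternatives that count each group only once. The `county` vector is made up for illustration, and missing values are not handled.

```r
# Made-up vector standing in for cps$county
county <- c("A", "B", "A", "C", "A")

# ave() splits the row indices by county and replaces each with its group's size
freq_ave <- ave(seq_along(county), county, FUN = length)

# Lookup alternative: tabulate once, then index the table by each row's value
counts <- table(county)
freq_tab <- as.vector(counts[county])

freq_ave
# 3 1 3 1 3
```

For a non-character column, you would likely want `counts[as.character(x)]` so the table is indexed by name rather than by position.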
zephryl