I was reading about how to replicate Stata's egen command in R.
G. Grothendieck's answer here shows three ways to do so (dplyr, ave, and data.table). I prefer the first method, since I want to create a new variable and add it to my data frame clean_R.
My dataset:
> head(clean_R)
# A tibble: 6 × 2
`clean_R$occ2010` `clean_R$marbaseci…
<int+lbl> <dbl>
1 4700 [First-Line Supervisors of Sales Workers] 1116000005
2 30 [Managers in Marketing, Advertising, and… 1116000007
3 430 [Managers, nec (including Postmasters)] 1116000008
4 4030 [Food Preparation Workers] 1116000010
5 4600 [Childcare Workers] 1116000011
6 5700 [Secretaries and Administrative Assistan… 1116000013
dput output:
structure(list(`clean_R$occ2010` = structure(c(4700L, 30L, 430L,
4030L, 4600L, 5700L), labels = c(`Chief executives and legislators/public administration` = 10L, `Military Enlisted Tactical Operations and Air/Weapons Specialists and Crew Members` = 9820L, `Military, Rank Not Specified` = 9830L, NIU = 9999L), label = "Occupation, 2010 basis", var_desc = "OCC2010 is a harmonized occupation coding scheme based on the Census Bureau's 2010 occupation classification scheme. Similar variables are offered for the 1950 (OCC1950) and 1990 (OCC1990) classification codes. OCC2010 offers researchers a consistent, long-term classification of occupations. \n\nThe occupational coding scheme in CPS data has changed several times since the 1960s. \n\nIn the interest of harmonization, however, the scheme has been modified to achieve the most consistent categories across time. That is, some categories that provide more detail in the 2010 scheme were grouped together because earlier categories are inseparable when more than one occupation is coded together. For users who wish to further aggregate occupation to broader categories, the 2010 scheme is generally organized by the following groups:\n\nManagement in Business, Science, and Arts = 10-430\nBusiness Operations Specialists = 500-730\nTransportation and Material Moving = 9000-9750\nMilitary = 9800-9830\nNo Occupation = 9920\n\nWe followed a process of constructing and testing OCC2010 that is similar to OCC1990's process, which is discussed in more detail in this BLS working paper. We performed a variety of tests to ensure that the new categories are as robust as possible over the long-term. Please also see the description tab for OCC1990 for further detail about our process.", class = c("haven_labelled",
"vctrs_vctr", "integer")), `clean_R$marbasecidp` = c(1116000005,
1116000007, 1116000008, 1116000010, 1116000011, 1116000013)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
My code:
clean_R %>%
group_by(occ2010) %>%
mutate(count(occ2010)) -> clean_R$N_occ2010
Running the code above gives the following error, presumably either because the variable has labels (and thus some clash?) or because the values of occ2010 are of the wrong type (I am not sure which):
Error: Problem with `mutate()` input `..1`.
ℹ `..1 = count(occ2010)`.
x no applicable method for 'count' applied to an object of class "c('haven_labelled', 'vctrs_vctr', 'integer')"
ℹ The error occurred in group 1: occ2010 = 10.
I tried using the nrow function instead, but it gives the same error.
I have also tried within() combined with ave(), which gives the error below.
within(clean_R, {N_occ2010 = ave(occ2010,clean_R$marbasecidp, FUN = count)} )
Error in UseMethod("count") :
no applicable method for 'count' applied to an object of class "c('haven_labelled', 'vctrs_vctr', 'integer')"
I am not sure what I am missing. Any suggestions?
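One direction that may be worth checking, offered as a minimal sketch rather than a confirmed fix: dplyr's count() is a verb that expects a data frame as its first argument, which would explain the "no applicable method" error when it receives the occ2010 vector inside mutate(). The sketch below uses n() for the group size instead, plus a base R variant with ave() and length. The toy tibble and its values are hypothetical stand-ins for clean_R, using plain integers rather than haven-labelled ones, and the column name N_occ2010_base is made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for clean_R (plain integers, no haven labels)
toy <- tibble(
  occ2010     = c(4700L, 30L, 4700L, 4030L, 30L, 4700L),
  marbasecidp = c(1, 2, 3, 4, 5, 6)
)

# n() returns the number of rows in the current group,
# so nothing is passed to count() inside mutate()
toy <- toy %>%
  group_by(occ2010) %>%
  mutate(N_occ2010 = n()) %>%
  ungroup()

# Base R variant: ave() applies length() within each occ2010 group
toy$N_occ2010_base <- ave(toy$occ2010, toy$occ2010, FUN = length)

print(toy)
```

Note that the result is assigned back to the whole data frame rather than to a single column with `-> clean_R$N_occ2010`, since the pipeline returns a full tibble. With the real data, the labelled column may first need as.integer() or haven::zap_labels() for the ave() variant; that is an assumption and has not been checked against clean_R.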