I want to add a column to my dataframe that shows the total frequency for each age group, so that I can calculate percentages in an additional column afterward. Right now I have two dataframes. This is the one I want to work with:

Residential Status  age_group  frequency
1                   50-59      5327
1                   60-69      1962
1                   70-79      224
1                   80-85      16
2                   50-59      1260
2                   60-69      1176
2                   70-79      428
2                   80-85      75
...

and the one that has the aggregate values.

age_group  group total
50-59      117812
60-69      71868
70-79      18796
80-85      6310

I want it to look like this:

Residential Status  age_group  frequency  group total
1                   50-59      5327       117812
1                   60-69      1962       71868
1                   70-79      224        18796
1                   80-85      16         6310
2                   50-59      1260       117812
2                   60-69      1176       71868
2                   70-79      428        18796
2                   80-85      75         6310

I have tried using merge(), but it just stacks the second dataframe on top of the first instead of adding a column. I also tried summarise(), but that didn't work either. Any ideas?

Gregor Thomas
AB R.
  • `merge(your_first_data_frame, your_second_data_frame, by = "age_group")` should work just fine. Or with `dplyr` `first_data_frame |> left_join(second_data_frame)`. It would be nice to see what code you tried that didn't work. Perhaps your `age_group` columns are different classes in the different data frames? Maybe `factor` in one and `character` in another? Or `factor` classes with different levels? If you convert them both to `character` class with `as.character()` that would solve that problem. – Gregor Thomas Jul 05 '23 at 17:48
  • If you still have trouble after that, inspect the values closely in the `age_group` column to make sure you don't have unneeded white space like `"80-85 "`, or other irregularities. If you still have the issue, please edit your question to share the sample data with `dput()`, e.g., `dput(your_first_data_frame[1:10, ])` for the first 10 rows of one data frame. That will include all data structure information so we can inspect more closely. – Gregor Thomas Jul 05 '23 at 17:49
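
The suggestions in the comments above can be sketched in base R roughly as follows. This is a minimal illustration, not a confirmed fix: the data frames are rebuilt by hand from the rows shown in the question, and the names `freq`, `totals`, `residential_status`, and `group_total` are assumptions, since the real object and column names were not posted. The key column is coerced to character and trimmed first, which covers the class-mismatch and stray-whitespace cases the comments mention.

```r
# Rebuild the two tables from the question (names are assumed, see above).
freq <- data.frame(
  residential_status = rep(1:2, each = 4),
  age_group = rep(c("50-59", "60-69", "70-79", "80-85"), times = 2),
  frequency = c(5327, 1962, 224, 16, 1260, 1176, 428, 75)
)
totals <- data.frame(
  age_group = c("50-59", "60-69", "70-79", "80-85"),
  group_total = c(117812, 71868, 18796, 6310)
)

# Make the join key plain character with no surrounding whitespace,
# so factor/character mismatches or values like "80-85 " cannot break the match.
freq$age_group <- trimws(as.character(freq$age_group))
totals$age_group <- trimws(as.character(totals$age_group))

# Left join: keep every row of freq and attach the matching group total.
result <- merge(freq, totals, by = "age_group", all.x = TRUE)

# merge() sorts by the key, so restore the ordering used in the question.
result <- result[order(result$residential_status, result$age_group), ]

# Percentage of each age group's total, the stated end goal.
result$percent <- 100 * result$frequency / result$group_total

print(result)
```

If `group_total` comes back as `NA` for some rows, the keys in the two data frames still differ for those rows, and comparing `unique(freq$age_group)` with `unique(totals$age_group)` is one way to see where.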

0 Answers