I would like to sum up costs. However, my data is a little complicated (first-time R user). I have six years of data (2013-2018), and each ID has GP costs. This means that there are multiple rows per year for each individual. I would like to sum up the costs per individual per year. However, the costs can come from different categories, and I only want them summed if they are from the same category. For example: I want all the costs for ID 1 in 2013 summed into one row if they are "other", and then a separate row for ID 1 in 2013 for "mental" (see below).

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   3785547 obs. of  4 variables:
 $ ID: 1, 1, 1, 2, 2..
  ..- attr(*, "format.spss")= chr "F9.3"
 $ Category: 'haven_labelled' chr  "Other" "Mental" "Other" "Other"  ...
  ..- attr(*, "format.spss")= chr "A66"
  ..- attr(*, "display_width")= int 50
  ..- attr(*, "labels")= Named chr  "Long" "Short" "Middle" "After" ...
  .. ..- attr(*, "names")= chr  "Long" "Short" "Middle" "After" ...
 $ Year        : num  2013 2013 2014 2014 2015 ...
  ..- attr(*, "format.spss")= chr "F9.3"
 $ Costs           : num  124 76.6 44.3 33.7 24.7 ...
  ..- attr(*, "format.spss")= chr "F9.3"

Overview:
- ID: 1, 1, 1, 1, 1, 1, 1, 1, 2, 2…
- Year: 2013, 2013, 2014, 2015, 2015, 2015, 2017, 2013, 2014…
- Category: other, mental, other, other, other, other, mental, special, other…
- Costs: 20, 21, 30, 50, 40, 44, 20, 50, 35…

What I want: one row per individual per year for each cost category, containing the summed costs for that specific year and cost category.

I tried: sum_col_if(criterion, ..., data = NULL), but couldn't make it work.

Thank you very much!

  • Please add a reproducible example. You'll need to use dput so that we can help you with your data. – Sergio Romero May 05 '20 at 09:11
  • I'm so sorry, I'm both new to R and StackOverflow. How can I do that? I thought this was clear.. – Student0172 May 05 '20 at 09:21
  • You can check this post https://stackoverflow.com/questions/49994249/example-of-using-dput . This allows people who want to help you to actually work with the data so that they can give you an answer. – Sergio Romero May 05 '20 at 09:22
  • Hi Sergio, thank you for the link and your help. I will keep it in mind when I have any other questions. – Student0172 May 05 '20 at 09:29

1 Answer

Welcome, Student! The tidyverse was designed to make this very simple. Assuming your data frame is called df, you can do the following:

library(dplyr)  # provides %>%, group_by() and summarize()
df %>% group_by(ID, Category, Year) %>% summarize(total = sum(Costs))

This creates one group per ID/Category/Year combination and sums the costs within each group. Give it a try!
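
If it helps to see it run, here is a minimal sketch on a small made-up data frame, built from the first nine values in the "Overview" of the question. The object names df and totals are only for illustration, and the .groups = "drop" argument assumes dplyr 1.0.0 or later (on older versions, drop that argument and add ungroup() instead).

```r
library(dplyr)

# Toy data: the first nine values listed in the question's overview.
# This is illustrative only, not the real 3.7-million-row data set.
df <- data.frame(
  ID       = c(1, 1, 1, 1, 1, 1, 1, 1, 2),
  Year     = c(2013, 2013, 2014, 2015, 2015, 2015, 2017, 2013, 2014),
  Category = c("other", "mental", "other", "other", "other",
               "other", "mental", "special", "other"),
  Costs    = c(20, 21, 30, 50, 40, 44, 20, 50, 35),
  stringsAsFactors = FALSE
)

# One row per ID / Year / Category combination, with the costs summed.
totals <- df %>%
  group_by(ID, Year, Category) %>%
  summarize(total = sum(Costs), .groups = "drop")

print(totals)
```

On this toy data the result has 7 rows. The three "other" rows for ID 1 in 2015 (50, 40 and 44) collapse into a single row with a total of 134, while "mental" and "special" for ID 1 in 2013 stay on their own rows. If you prefer base R, aggregate(Costs ~ ID + Year + Category, data = df, FUN = sum) should give the same totals.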

Amit Kohli