
My data looks like this:

hh_id  indl  ind_salary  hh_income
1      1     200
1      2     450
1      3     0
2      4     1232
2      5     423

Individuals with the same hh_id live in the same household, so they share the same household income. I therefore want the variable hh_income to equal the sum of the salaries of all persons with the same hh_id.

So my data should look like this:

hh_id  indl  ind_salary  hh_income
1      1     200         650
1      2     450         650
1      3     0           650
2      4     1232        1655
2      5     423         1655

Any ideas, please?

bassam1243

4 Answers


Using dplyr:

library(dplyr)  # provides %>%, group_by() and mutate()
data %>% group_by(hh_id) %>% mutate(hh_income = sum(ind_salary))
KacZdr

You can use the base R function ave to compute the sum of ind_salary within each hh_id. It returns a vector of the same length as ind_salary, so the result can be assigned directly as a new column:

> df$hh_income <- ave(df$ind_salary, df$hh_id, FUN=sum)
> df
  hh_id indl ind_salary hh_income
1     1    1        200       650
2     1    2        450       650
3     1    3          0       650
4     2    4       1232      1655
5     2    5        423      1655
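
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the example data from the question and applies ave. The name df follows the answer above; the second column, hh_income_narm, is an added assumption for the case where some salaries might be missing (NA), which the question does not mention.

```r
# Example data from the question
df <- data.frame(
  hh_id      = c(1, 1, 1, 2, 2),
  indl       = c(1, 2, 3, 4, 5),
  ind_salary = c(200, 450, 0, 1232, 423)
)

# ave() applies FUN within each hh_id group and returns one value per row
df$hh_income <- ave(df$ind_salary, df$hh_id, FUN = sum)

# Assumption (not in the question): if ind_salary could contain NA,
# ignore missing values explicitly instead of propagating NA to the group
df$hh_income_narm <- ave(df$ind_salary, df$hh_id,
                         FUN = function(x) sum(x, na.rm = TRUE))
```

Note that FUN has to be passed by name, because ave treats every unnamed argument after the first as a grouping variable.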
Jilber Urbina

Using only base R:

hh_id <- c(1, 1, 1, 2, 2)
indl <- c(1, 2, 3, 4, 5)
ind_salary <- c(200, 450, 0, 1232, 423)

hh_df <- data.frame(hh_id, indl, ind_salary)

hh_income <- tapply(hh_df$ind_salary, hh_df$hh_id, sum)
hh_income <- as.data.frame(hh_income)
hh_income$hh_id <- rownames(hh_income)
hh_df <- merge(hh_df, hh_income, by = 'hh_id')
View(hh_df)
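
A possible variation on the same idea, sketched here as an assumption rather than as part of the original answer: since tapply returns one total per household named by hh_id, each row can look up its household total by name, which avoids the merge step and keeps hh_id numeric. The name totals is introduced only for illustration.

```r
hh_df <- data.frame(
  hh_id      = c(1, 1, 1, 2, 2),
  indl       = c(1, 2, 3, 4, 5),
  ind_salary = c(200, 450, 0, 1232, 423)
)

# One total per household; the result is named by hh_id ("1", "2")
totals <- tapply(hh_df$ind_salary, hh_df$hh_id, sum)

# Look up each row's household total by name, then drop the names
hh_df$hh_income <- as.vector(totals[as.character(hh_df$hh_id)])
```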
br00t

To add some explanation to KacZdr's answer, which would have helped me immensely as a beginner. The layout below is also more in line with the usual tidyverse pipe style.

library(dplyr)

# Assign the result to a new object so the original data is left unchanged.
new_data <- data %>%
  # Group rows by hh_id, the column whose repeated values define each household.
  group_by(hh_id) %>%
  # mutate() adds a new column "income" holding the sum of ind_salary within
  # each hh_id. This is what the question calls hh_income.
  mutate(income = sum(ind_salary))
    Very good interpretation and explanation, it is also worth mentioning that R offers us documentation for individual functions, just type a question mark before its name, e.g. `?mutate`. – KacZdr Oct 14 '22 at 16:10