I have ~4000 observations in my data frame, test_11, and have pasted part of the data frame below:

data frame snippet

The k_hidp column identifies matching households, the k_fihhmnnet1_dv column is their reported household income, and the percentage_income_rounded column gives each participant's contribution to the total household income as a percentage.

I want to filter my data to remove every household (k_hidp) whose percentage_income_rounded values do not sum to 100.

So, for example, the first household, 68632420, reports a combined contribution of 83% (65 + 18) instead of the 100% that the other households report.

Is there any way to remove these household observations so I am only left with households with a collective income of 100%?

Thank you!

MrFlick
Maryam
  • You could group by `k_hidp`, create a new column with the sum of percentages for that group, and then filter out the groups whose value in that column differs from 100. Please post a sample of your data as code (for example, with `dput(head(test_11))`), because we can't copy and paste that screenshot into R to help you. – Andrea M Jul 01 '22 at 13:51
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please do not post images of data or code. – MrFlick Jul 01 '22 at 13:55
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 01 '22 at 18:42

1 Answer

Try this:

## Create a sample data frame
df <- data.frame(
  k_hidp = c(68632420, 68632420, 68632420, 68632420, 68632420, 68632420,
             68632422, 68632422, 68632422, 68632422, 68632428, 68632428),
  percentage_income_rounded = c(65, 18, 86, 14, 49, 51, 25, 25, 25, 25, 50, 50)
)

## Load dplyr
library(dplyr)

## Sum the percentages per household and keep the households whose total is exactly 100
df1 <- df %>%
  group_by(k_hidp) %>%
  mutate(TotalPercentage = sum(percentage_income_rounded)) %>%
  filter(TotalPercentage == 100)

Output

> df1
# A tibble: 6 x 3
# Groups:   k_hidp [2]
    k_hidp percentage_income_rounded TotalPercentage
     <dbl>                     <dbl>           <dbl>
1 68632422                        25             100
2 68632422                        25             100
3 68632422                        25             100
4 68632422                        25             100
5 68632428                        50             100
6 68632428                        50             100
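
If dplyr is not available, the same idea (compute a per-household total, then keep the rows whose total is 100) can be sketched in base R with `ave()`. This is only an illustration on the sample data from this answer; the names `household_total` and `df_complete` are made up here and do not come from the question.

```r
# Same sample data as in the answer above.
df <- data.frame(
  k_hidp = c(68632420, 68632420, 68632420, 68632420, 68632420, 68632420,
             68632422, 68632422, 68632422, 68632422, 68632428, 68632428),
  percentage_income_rounded = c(65, 18, 86, 14, 49, 51, 25, 25, 25, 25, 50, 50)
)

# ave() returns one value per row: the sum over that row's household.
# (household_total is a hypothetical helper name.)
household_total <- ave(df$percentage_income_rounded, df$k_hidp, FUN = sum)

# Keep only the rows that belong to households totalling exactly 100.
df_complete <- df[household_total == 100, ]
```

In the dplyr version, the helper column can also be skipped by calling `filter(sum(percentage_income_rounded) == 100)` directly after `group_by(k_hidp)`, optionally followed by `ungroup()`. If rounding can leave a genuine household at 99 or 101, a tolerance such as `abs(household_total - 100) <= 1` might fit better, but that is an assumption about the data rather than something the question states.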
Deepansh Arora