Background
I have two df's in r, one called d
and the other called insurance
. d
contains a list of unique ID
along with two attributes, gender
and zip
(which are irrelevant to this question). insurance
contains some of those same ID
numbers, but they aren't unique. They represent people's health insurance plans in a given year. Some ID
in the insurance
table repeat, because that person has had more than one insurance plan in that year (i.e. they changed insurers). Here's some code to make these tables:
d <- data.frame(ID = c("a","b","c","d","e","f"),
gender = c("f","f","m","f","m","m"),
zip = c(48601,60107,29910,54220,28173,44663),stringsAsFactors=FALSE)
Which looks like:
ID | gender | zip |
---|---|---|
a | f | 48601 |
b | f | 60107 |
c | m | 29910 |
d | f | 54220 |
e | m | 28173 |
f | m | 44663 |
insurance <- data.frame(ID = c("a","a","c","d","f","f"),
ins_type = c("public","private","private","private","private","public"),
insurer = c("medicare","aetna","cigna","uhc","uhc","medicaid"),
stringsAsFactors = FALSE)
Which looks like:
ID | ins_type | insurer |
---|---|---|
a | public | medicare |
a | private | aetna |
c | private | cigna |
d | private | uhc |
f | private | uhc |
f | public | medicaid |
Here's my goal:
I need d
to reflect whether anyone in d$ID
has had any public insurance, and if so, which insurer. Specifically, this means "looking up" to the insurance
table and making 2 new variables in d
: first, a 1/0 or yes/no variable for ever having public insurance (call this variable d$public
); second, which insurer it is (call this d$insurer
).
The tricky bit is that I need d$ID
to remain unique, because it has to be a primary key for another aspect of the project I haven't outlined here. So the idea is if any ID
has any entry for public insurance, then d$public
should get a "1" or a "yes" or whatever.
Desired result
I'd like a table that looks like this:
ID | gender | zip | public | insurer |
---|---|---|---|---|
a | f | 48601 | 1 | medicare |
b | f | 60107 | 0 | NA |
c | m | 29910 | 0 | NA |
d | f | 54220 | 0 | NA |
e | m | 28173 | 0 | NA |
f | m | 44663 | 1 | medicaid |
What I've tried
Versions of this question have been asked before (e.g. here, here) but I can't quite get this right.
I've tried using joins, like so:
d2 <- d %>%
left_join(insurance, by=c("ID"="ID"))
This gets me the columns I want, but it makes IDs like a
repeat, which I can't have.
Thanks for any help!