1

Background

I have two df's in r, one called d and the other called insurance. d contains a list of unique ID along with two attributes, gender and zip (which are irrelevant to this question). insurance contains some of those same ID numbers, but they aren't unique. They represent people's health insurance plans in a given year. Some ID in the insurance table repeat, because that person has had more than one insurance plan in that year (i.e. they changed insurers). Here's some code to make these tables:

d <- data.frame(ID = c("a","b","c","d","e","f"), 
                    gender = c("f","f","m","f","m","m"), 
                    zip = c(48601,60107,29910,54220,28173,44663),stringsAsFactors=FALSE)

Which looks like:

ID gender zip
a f 48601
b f 60107
c m 29910
d f 54220
e m 28173
f m 44663
insurance <- data.frame(ID = c("a","a","c","d","f","f"), 
                     ins_type = c("public","private","private","private","private","public"), 
                     insurer = c("medicare","aetna","cigna","uhc","uhc","medicaid"),
                     stringsAsFactors = FALSE)

Which looks like:

ID ins_type insurer
a public medicare
a private aetna
c private cigna
d private uhc
f private uhc
f public medicaid

Here's my goal:

I need d to reflect whether anyone in d$ID has had any public insurance, and if so, which insurer. Specifically, this means "looking up" to the insurance table and making 2 new variables in d: first, a 1/0 or yes/no variable for ever having public insurance (call this variable d$public); second, which insurer it is (call this d$insurer).

The tricky bit is that I need d$ID to remain unique, because it has to be a primary key for another aspect of the project I haven't outlined here. So the idea is if any ID has any entry for public insurance, then d$public should get a "1" or a "yes" or whatever.

Desired result

I'd like a table that looks like this:

ID gender zip public insurer
a f 48601 1 medicare
b f 60107 0 NA
c m 29910 0 NA
d f 54220 0 NA
e m 28173 0 NA
f m 44663 1 medicaid

What I've tried

Versions of this question have been asked before (e.g. here, here) but I can't quite get this right.

I've tried using joins, like so:

d2 <- d %>%
  left_join(insurance, by=c("ID"="ID"))

This gets me the columns I want, but it makes IDs like a repeat, which I can't have.

Thanks for any help!

logjammin
  • 1,121
  • 6
  • 21

2 Answers2

1

d %>% 
      left_join(insurance %>% filter(ins_type == "public"), by = "ID") %>%
      mutate(public = ! is.na(ins_type)) %>%
      select(-ins_type)
dash2
  • 2,024
  • 6
  • 15
1

This is a solution using base R. No need to install any tidyverse related packages.

# Define public column based on insurance$ins_type
df$public <- ifelse(insurance$ins_type == "public", 1, 0)

# Now we'll define df$insurer as the result of applying a function
# that receives the ids and df$public as arguments.

# if the public type is not 1, return NA, else return the insurer name
# in the insurance df where type is 1 and id = id

df$insurer <- mapply(
    function(id, type) {
    if(type != 1)
        return(NA)
    return(insurance$insurer[insurance$id == id & insurance$type == type])
    }
    df$id,
    df$public
)
eduardokapp
  • 1,612
  • 1
  • 6
  • 28
  • In that first definition of the 'public' column, what if, in a different scenario, I wanted to refer to ANY entry in that column -- not just "public"? – logjammin Apr 27 '21 at 19:14
  • 1
    If you wanted to refere to ANY SPECIFIC entry, then you'd just have to change "public" for "whatever you want". Now, if you want anything at all, then just go on and set `df$public <- 1`, since you don't care what entry it is. If you want ANY entry, but not NA, then `df$public <- ifelse(!is.na(insurance$ins_type), 1, 0)`. Now for the last possible situation maybe you have more than 2 possibilities (public, private, mixed, for example). – eduardokapp Apr 27 '21 at 19:17
  • 1
    In that situation, a nice readable way would be to define the entries you're interested in this mixed example, like: `entries <- c("public", "mixed")` and then the original code would be something like `df$public <- ifelse(insurance$ins_type %in% entries, 1, 0)`. Hope these ideas help you out! – eduardokapp Apr 27 '21 at 19:18
  • Oh yeah I meant "any entry, even NA, in insurance$ins_type". So, like, df$public wouldn't really be public anymore, it's just be df$ins_type. So I guess it would be `df$public <- ifelse(insurance$ins_type, 1, 0)` – logjammin Apr 27 '21 at 19:25
  • No that would be wrong. ifelse works by testing the first argument as a logical condition (it will be either true or false). So the way you wrote it, it does not make sense, as it is not a logical condition. Again, if you don't care about what entry it is, then simply set all "ins_type" to 1. Or if you want to keep the categoric variable, maybe set it as a factor. Please read the introductory sections of Advanced R, by Hadley Wickham. It will you help you out! – eduardokapp Apr 27 '21 at 20:06