I have been attempting to identify the best hospital in each American state using public domain data that lists each hospital's mortality rates for three ailments: pneumonia, heart failure, and heart attack.

The data frame I am using holds five relevant columns: the hospital 30-day mortality rate for each of the three ailments, the state in which each hospital is located, and the name of each hospital.

I am writing a function that takes two arguments: the state and the ailment. I have taken the code out of the function for clarity, so I will focus only on pneumonia in state "AL", which I believe is Alabama.

My issue is that I now have the value I need, i.e. the lowest 30-day pneumonia mortality rate in Alabama, but I would also like to return the name of the hospital that achieved it.

I don't know how to do this, and in spite of spending time looking for resources and advice online before coming here, I was unsuccessful. If anyone can provide directions to resources that could help me to get a better grasp of this I would be willing to take a look.

Here is the process my function performs - bear in mind that in the actual function you can specify the ailment and state in the arguments of the function, and pneumonia and AL are not coded into the function itself.

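# 30-day pneumonia mortality rate for every hospital
mortality_rates <- data$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia

# Per-state vectors of rates, each sorted in ascending order
sorted <- tapply(mortality_rates, data$State, sort)

# Rates for Alabama only (this line originally read x$AL, but x is never defined)
filtered <- sorted$AL

# Lowest (best) rate in the state
best_in_state <- min(filtered)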

As you can see, it specifies the ailment (pneumonia), organises it by state and ascending numerical order, and then specifies the state of Alabama. It finally extracts the lowest mortality rate, providing me with the actual best result. How would I identify the hospital that is associated with this mortality rate?

Also, if the existing code is poor quality or over-engineered, please let me know how it could be improved. It would be really helpful.

Phil
  • If you're not opposed to `dplyr`, then `data |> group_by(State) |> slice_min(Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia)` will get you the row with the lowest mortality for each state. – Gregor Thomas Aug 02 '23 at 14:32
  • If you'd prefer to stay in base R, the accepted answer at the marked duplicate, modified for your data/column names, is `data[ data$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia == ave(data$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia, data$State, FUN = min), ]` – Gregor Thomas Aug 02 '23 at 14:39
  • In the future, I'd strongly recommend including a small sample of input data in your question. In cases where we need to code/debug, it's useful to have something to test on. You can use `dput()` to make copy/pasteable R code to reproduce data, e.g., `dput(your_data[1:5, ])` will give code to create the first 5 rows of `your_data`, including all class and structure info. – Gregor Thomas Aug 02 '23 at 14:40
  • As far as your code goes, it is "over-engineered" in that the `sort` is pointless. `min` doesn't require its input to be sorted. You can get all the mins with `lowest_rates = tapply(data$Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia, data$State, min)`, and then, e.g., `lowest_rates["AL"]` for Alabama. But these days there's very little reason to use obtuse base functions like `tapply` instead of `dplyr` or `data.table`. – Gregor Thomas Aug 02 '23 at 14:44
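
The base R suggestions in the comments above can be tried on a small toy data frame. This is only a sketch under assumptions: the pneumonia and `State` column names come from the question, but `Hospital.Name`, the helper `best_hospital()`, and every value in the data are invented for illustration, and the rate column is assumed to be numeric already.

```r
# Toy data: the State and pneumonia column names match the question;
# Hospital.Name and all values are hypothetical.
data <- data.frame(
  Hospital.Name = c("Hospital A", "Hospital B", "Hospital C", "Hospital D"),
  State = c("AL", "AL", "AK", "AK"),
  Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia = c(12.1, 10.4, 11.0, 13.2),
  stringsAsFactors = FALSE
)
rate_col <- "Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia"

# Lowest rate per state; min() does not need sorted input.
lowest_rates <- tapply(data[[rate_col]], data$State, min, na.rm = TRUE)
lowest_rates["AL"]

# Keep the full row (including the hospital name) wherever the rate
# equals the minimum for that row's state.
state_min <- ave(data[[rate_col]], data$State,
                 FUN = function(x) min(x, na.rm = TRUE))
best_rows <- data[which(data[[rate_col]] == state_min), ]

# Hypothetical helper: name of the hospital with the lowest rate in one
# state. which.min() skips NA values and returns the first minimum on ties.
best_hospital <- function(data, state, rate_col) {
  in_state <- data[data$State == state, ]
  in_state$Hospital.Name[which.min(in_state[[rate_col]])]
}

best_hospital(data, "AL", rate_col)  # "Hospital B" for this toy data
```

Passing the column name as a string keeps the helper usable for the other two ailments. If two hospitals tie for the lowest rate, `best_rows` keeps both, while `best_hospital()` returns only the first.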

0 Answers