
I have a list of data frames (sample below) holding hospital data for each US state.

  • outcome_split is a list with one data frame per state.
  • I have added a rank column to the AL data frame, which ranks all the hospitals in that state. I plan to add a rank column to every data frame in the list in the same way, using a for loop.
  • I am trying to write a function that takes an outcome (heart attack, heart failure, etc.) and a rank (a number) and returns the hospital name and US state matching that rank.

As mentioned above, the second element has a rank column, so I tried to select that element and match the specified rank. I am a beginner, and I think I am confusing '==' with '='.

    > outcome_split[[2]][, "hospital name"]["rank"==2]
    character(0)
    > outcome_split[[2]][, "hospital name"]["rank"=7]
    [1] "BIBB MEDICAL CENTER"

I want to return the name of the hospital matching the specified rank, but I am not sure how to do this. As I said, I am confused about '==' and '=': '==' returns character(0), whereas '=' returns a hospital name from the second element, but that result is based on row position rather than on the rank column. The hospital in row 7 is returned, yet its rank is 54, not 7.

> outcome_split[[2]][, c("hospital name","rank")]
                                       hospital name rank
1                        ANDALUSIA REGIONAL HOSPITAL   52
2                          ATHENS-LIMESTONE HOSPITAL    9
3                          ATMORE COMMUNITY HOSPITAL   53
4                        BAPTIST MEDICAL CENTER EAST    2
5                       BAPTIST MEDICAL CENTER SOUTH   46
6                   BAPTIST MEDICAL CENTER-PRINCETON    8
7                                BIBB MEDICAL CENTER   54
8                       BIRMINGHAM VA MEDICAL CENTER   26
9                           BROOKWOOD MEDICAL CENTER   30
10                    BRYAN W WHITFIELD MEM HOSP INC   55

Sample data:

outcome_split <- structure(list(AK = structure(list(`hospital name` = c("PROVIDENCE ALASKA MEDICAL CENTER", 
"MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL", 
"FAIRBANKS MEMORIAL HOSPITAL", "ALASKA REGIONAL HOSPITAL", "YUKON KUSKOKWIM DELTA REG HOSPITAL", 
"CENTRAL PENINSULA GENERAL HOSPITAL", "ALASKA NATIVE MEDICAL CENTER", 
"MT EDGECUMBE HOSPITAL", "PROVIDENCE VALDEZ MEDICAL CENTER", 
"PROVIDENCE SEWARD HOSPITAL", "SITKA COMMUNITY HOSPITAL", "PROVIDENCE KODIAK ISLAND MEDICAL CTR", 
"CORDOVA COMMUNITY MEDICAL CENTER", "NORTON SOUND REGIONAL HOSPITAL", 
"PEACEHEALTH KETCHIKAN MEDICAL             CENTER", "SOUTH PENINSULA HOSPITAL"
), state = c("AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", 
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"), `heart attack` = c("13.4", 
"17.7", "Not Available", "15.5", "14.5", "Not Available", "Not Available", 
"15.7", "Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available"), `heart failure` = c("12.4", "11.4", "11.6", 
"15.6", "13.4", "11.2", "11.6", "11.6", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "11.4", "10.8"), pneumonia = c("10.5", "12.1", 
"11.6", "13.4", "12.5", "9.7", "13.8", "15.5", "14.2", "Not Available", 
"Not Available", "11.5", "12.0", "Not Available", "11.6", "11.3", 
"12.2")), .Names = c("hospital name", "state", "heart attack", 
"heart failure", "pneumonia"), row.names = 99:115, class = "data.frame"), 
    AL = structure(list(`hospital name` = c("ANDALUSIA REGIONAL HOSPITAL", 
    "ATHENS-LIMESTONE HOSPITAL", "ATMORE COMMUNITY HOSPITAL", 
    "BAPTIST MEDICAL CENTER EAST", "BAPTIST MEDICAL CENTER SOUTH", 
    "BAPTIST MEDICAL CENTER-PRINCETON", "BIBB MEDICAL CENTER", 
    "BIRMINGHAM VA MEDICAL CENTER", "BROOKWOOD MEDICAL CENTER", 
    "BRYAN W WHITFIELD MEM HOSP INC", "BULLOCK COUNTY HOSPITAL", 
    "CALLAHAN EYE FOUNDATION HOSPITAL", "CHEROKEE MEDICAL CENTER", 
    "CHILTON MEDICAL CENTER", "CITIZENS BAPTIST MEDICAL CENTER", 
    "CLAY COUNTY HOSPITAL", "COMMUNITY HOSPITAL INC", "COOPER GREEN MERCY HOSPITAL", 
    "COOSA VALLEY MEDICAL CENTER", "CRENSHAW COMMUNITY HOSPITAL", 
    "CRESTWOOD MEDICAL CENTER", "CULLMAN REGIONAL MEDICAL CENTER", 
    "D C H REGIONAL MEDICAL CENTER", "D W MCMILLAN MEMORIAL HOSPITAL", 
    "DALE MEDICAL CENTER", "DECATUR GENERAL HOSPITAL", "DEKALB REGIONAL MEDICAL CENTER", 
    "EAST ALABAMA MEDICAL CENTER AND SNF", "ELBA GENERAL HOSPITAL", 
    "ELIZA COFFEE MEMORIAL HOSPITAL", "ELMORE COMMUNITY HOSPITAL", 
    "EVERGREEN MEDICAL CENTER", "FAYETTE MEDICAL CENTER", "FLORALA MEMORIAL HOSPITAL", 
    "FLOWERS HOSPITAL", "GADSDEN REGIONAL MEDICAL CENTER", "GEORGE H. LANIER MEMORIAL HOSPITAL", 
    "GEORGIANA HOSPITAL", "GREENE COUNTY HOSPITAL", "GROVE HILL MEMORIAL HOSPITAL", 
    "HALE COUNTY HOSPITAL", "HELEN KELLER MEMORIAL HOSPITAL", 
    "HIGHLANDS MEDICAL CENTER", "HILL HOSPITAL OF SUMTER COUNTY", 
    "HUNTSVILLE HOSPITAL", "INFIRMARY WEST", "J PAUL JONES HOSPITAL", 
    "JACK HUGHSTON MEMORIAL HOSPITAL", "JACKSON HOSPITAL & CLINIC INC", 
    "JACKSON MEDICAL CENTER", "JACKSONVILLE MEDICAL CENTER", 
    "L V STABLER MEMORIAL HOSPITAL", "LAKE MARTIN COMMUNITY HOSPITAL", 
    "LAKELAND COMMUNITY HOSPITAL", "LAWRENCE MEDICAL CENTER", 
    "MARION REGIONAL MEDICAL CENTER", "MARSHALL MEDICAL CENTER NORTH", 
    "MARSHALL MEDICAL CENTER SOUTH", "MEDICAL CENTER BARBOUR", 
    "MEDICAL CENTER ENTERPRISE", "MEDICAL WEST, AN AFFILIATE OF UAB HEALTH SYSTEM", 
    "MIZELL MEMORIAL HOSPITAL", "MOBILE INFIRMARY", "MONROE COUNTY HOSPITAL", 
    "NORTH BALDWIN INFIRMARY", "NORTHEAST ALABAMA REGIONAL MED CENTER", 
    "NORTHWEST MEDICAL CENTER", "PARKWAY MEDICAL CENTER", "PICKENS COUNTY MEDICAL CENTER", 
    "PRATTVILLE BAPTIST HOSPITAL", "PROVIDENCE HOSPITAL", "RED BAY HOSPITAL", 
    "RIVERVIEW REGIONAL MEDICAL CENTER", "RUSSELL HOSPITAL", 
    "RUSSELLVILLE HOSPITAL", "SHELBY BAPTIST MEDICAL CENTER", 
    "SHOALS HOSPITAL", "SOUTH BALDWIN REGIONAL MEDICAL CENTER", 
    "SOUTHEAST ALABAMA MEDICAL CENTER", "SPRINGHILL MEDICAL CENTER", 
    "ST VINCENT'S BIRMINGHAM", "ST VINCENT'S EAST", "ST VINCENT'S ST CLAIR", 
    "ST VINCENTS BLOUNT", "STRINGFELLOW MEMORIAL HOSPITAL", "THOMAS HOSPITAL", 
    "TRINITY MEDICAL CENTER", "TROY REGIONAL MEDICAL CENTER", 
    "TUSCALOOSA VA MEDICAL CENTER", "UNIV OF S AL CHILDREN'S & WOMEN'S HOS", 
    "UNIV OF SOUTH ALABAMA MEDICAL CENTER", "UNIVERSITY OF ALABAMA HOSPITAL", 
    "VA CENTRAL ALABAMA HEALTHCARE SYSTEM - MONTGOMERY", "VAUGHAN REG MED CENTER PARKWAY CAMPUS", 
    "WALKER BAPTIST MEDICAL CENTER", "WASHINGTON COUNTY HOSPITAL", 
    "WEDOWEE HOSPITAL", "WIREGRASS MEDICAL CENTER"), state = c("AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL"), `heart attack` = c("Not Available", 
    "15.0", "Not Available", "14.2", "17.8", "14.9", "Not Available", 
    "16.1", "16.5", "Not Available", "Not Available", "Not Available", 
    "Not Available", "Not Available", "17.3", "16.7", "17.1", 
    "Not Available", "15.2", "Not Available", "13.3", "17.1", 
    "15.8", "15.7", "17.3", "16.8", "18.0", "16.3", "Not Available", 
    "18.1", "Not Available", "Not Available", "16.7", "Not Available", 
    "15.2", "16.7", "15.4", "14.5", "Not Available", "Not Available", 
    "Not Available", "19.6", "15.0", "Not Available", "15.2", 
    "Not Available", "Not Available", "Not Available", "17.5", 
    "Not Available", "Not Available", "Not Available", "Not Available", 
    "Not Available", "15.6", "Not Available", "Not Available", 
    "18.5", "Not Available", "16.6", "15.3", "Not Available", 
    "19.3", "Not Available", "Not Available", "15.6", "Not Available", 
    "15.8", "Not Available", "14.6", "15.2", "Not Available", 
    "16.9", "17.1", "Not Available", "15.9", "Not Available", 
    "15.8", "14.3", "16.0", "16.2", "17.7", "Not Available", 
    "Not Available", "16.4", "14.7", "16.8", "Not Available", 
    "Not Available", "Not Available", "Not Available", "15.0", 
    "Not Available", "14.7", "17.0", "Not Available", "Not Available", 
    "Not Available"), `heart failure` = c("10.1", "11.7", "10.8", 
    "9.6", "11.8", "11.4", "14.0", "10.4", "13.5", "11.7", "12.3", 
    "Not Available", "12.1", "11.5", "14.9", "12.6", "12.3", 
    "Not Available", "11.7", "13.8", "13.8", "12.1", "11.2", 
    "14.8", "11.8", "10.9", "16.6", "12.9", "Not Available", 
    "11.3", "11.3", "9.1", "11.7", "10.4", "12.0", "10.7", "8.8", 
    "10.8", "11.2", "10.4", "10.7", "12.6", "13.4", "Not Available", 
    "12.4", "12.5", "Not Available", "10.8", "10.2", "12.3", 
    "16.4", "11.1", "10.9", "13.6", "9.9", "11.5", "12.5", "15.2", 
    "13.5", "12.9", "11.4", "13.6", "10.7", "13.0", "11.5", "11.2", 
    "11.8", "10.5", "12.6", "14.8", "13.5", "12.6", "10.8", "11.6", 
    "14.8", "13.6", "13.6", "15.1", "11.4", "10.4", "10.6", "10.9", 
    "10.8", "13.0", "12.0", "12.8", "12.9", "11.2", "Not Available", 
    "Not Available", "12.5", "12.5", "12.2", "12.0", "10.8", 
    "Not Available", "10.4", "10.6"), pneumonia = c("11.1", "12.1", 
    "13.0", "10.2", "14.3", "11.6", "13.6", "11.0", "13.0", "9.1", 
    "12.1", "Not Available", "14.7", "11.2", "12.1", "11.8", 
    "11.6", "Not Available", "11.4", "15.8", "10.4", "12.1", 
    "11.3", "12.6", "9.9", "11.9", "15.8", "12.1", "12.0", "13.4", 
    "11.2", "12.0", "12.9", "12.1", "11.3", "14.6", "10.3", "11.3", 
    "11.5", "12.1", "11.5", "15.0", "12.9", "Not Available", 
    "14.1", "13.1", "11.4", "10.9", "14.7", "9.3", "19.2", "13.0", 
    "10.8", "10.7", "9.8", "10.0", "8.7", "13.9", "15.0", "12.9", 
    "12.1", "14.9", "12.5", "15.6", "14.6", "13.2", "13.1", "11.9", 
    "12.4", "14.2", "10.6", "11.6", "12.7", "14.9", "11.5", "10.7", 
    "12.8", "9.8", "10.9", "13.8", "12.6", "16.2", "11.4", "15.3", 
    "12.0", "13.1", "13.9", "11.1", "Not Available", "Not Available", 
    "Not Available", "12.7", "11.3", "14.0", "11.9", "Not Available", 
    "13.9", "12.3"), rank = c(52L, 9L, 53L, 2L, 46L, 8L, 54L, 
    26L, 30L, 55L, 56L, 57L, 58L, 59L, 42L, 32L, 39L, 60L, 12L, 
    61L, 1L, 40L, 21L, 20L, 43L, 35L, 47L, 28L, 62L, 48L, 63L, 
    64L, 33L, 65L, 13L, 34L, 17L, 4L, 66L, 67L, 68L, 51L, 10L, 
    69L, 14L, 70L, 71L, 72L, 44L, 73L, 74L, 75L, 76L, 77L, 18L, 
    78L, 79L, 49L, 80L, 31L, 16L, 81L, 50L, 82L, 83L, 19L, 84L, 
    22L, 85L, 5L, 15L, 86L, 37L, 41L, 87L, 24L, 88L, 23L, 3L, 
    25L, 27L, 45L, 89L, 90L, 29L, 6L, 36L, 91L, 92L, 93L, 94L, 
    11L, 95L, 7L, 38L, 96L, 97L, 98L)), class = "data.frame", .Names = c("hospital name", 
    "state", "heart attack", "heart failure", "pneumonia", "rank"
    ), row.names = c(NA, -98L))), .Names = c("AK", "AL"))
cyborg
  • Something is wrong with your sample data, I can't read it with `dget`. The parentheses don't match. Is your `dput` complete? – jsta Mar 16 '18 at 01:09
  • @jsta I pasted it again, could you please check. – cyborg Mar 16 '18 at 01:13
  • ``outcome_split[[2]]$`hospital name`[outcome_split[[2]]$rank == 2]`` – Ronak Shah Mar 16 '18 at 01:47
    You can eliminate the need for a function by dplyr's `arrange(rank)` which gives you a df sorted by that column. – smci Mar 16 '18 at 02:23
  • And you can collapse the list of dfs into one large df. `outcome_split[[1]]$rank <- NA ; do.call(function(...) rbind(..., make.row.names=FALSE), outcome_split)` does that. Now your dplyr filter is simply `%>% filter(state=='AL', rank==2) %>% select('hospital name')` – smci Mar 16 '18 at 02:37
  • (Beware that `rank` will no longer be unique (across states), so now you want to select by state,rank. dplyr and data.table both have a concept of a multi-index.) – smci Mar 16 '18 at 02:39
  • Related: [Convert a list of data frames into one data frame](https://stackoverflow.com/questions/2851327/convert-a-list-of-data-frames-into-one-data-frame) – smci Mar 16 '18 at 02:44
  • @smci this is great stuff. thanks – cyborg Mar 16 '18 at 02:51

2 Answers


If you want to select rank 2 and 7 from your second list element try:

outcome_split[[2]][outcome_split[[2]]$rank == 2, c("hospital name", "rank")]
                hospital name rank
4 BAPTIST MEDICAL CENTER EAST    2

outcome_split[[2]][outcome_split[[2]]$rank == 7, c("hospital name", "rank")]
                           hospital name rank
94 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7
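
As for why the two attempts in the question behave so differently, here is a small sketch on made-up data (the `df` below and its values are hypothetical, not taken from outcome_split). It compares the string-versus-number test, the named-argument form, and a comparison against the actual column:

```r
# Toy data standing in for outcome_split[[2]] (hypothetical values).
df <- data.frame(
  hospital = c("A", "B", "C"),
  rank     = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# "rank" == 2 compares the literal string "rank" with 2: a single FALSE.
cmp <- "rank" == 2
# Indexing with a single FALSE selects nothing.
none <- df$hospital[cmp]

# Inside [ ], `"rank" = 2` is an argument named "rank" with value 2;
# the name is ignored, so this is plain positional indexing.
second <- df$hospital["rank" = 2]

# Comparing the rank *column* gives one TRUE/FALSE per row.
hit <- df$hospital[df$rank == 2]
```

So `none` is empty, `second` is simply the second element regardless of its rank, and only `hit` is selected by rank.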

I recommend collapsing your list to a single data.frame, as this will make filtering much easier. Try searching for dplyr::bind_rows or do.call("rbind").
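
A minimal base-R sketch of that collapse, assuming a toy list called `states` (hypothetical names and values) in which only one element has a rank column yet, as in the question:

```r
# Two small data frames standing in for the per-state list (hypothetical values).
states <- list(
  AK = data.frame(hospital = c("A1", "A2"), state = "AK",
                  stringsAsFactors = FALSE),
  AL = data.frame(hospital = c("B1", "B2"), state = "AL", rank = c(2L, 1L),
                  stringsAsFactors = FALSE)
)

# rbind() needs identical columns, so add the missing rank column first.
states$AK$rank <- NA_integer_

# Stack every element of the list into one data frame.
all_states <- do.call(rbind, c(states, make.row.names = FALSE))

# Filter on state and rank together, since rank is only unique within a state.
# %in% treats the NA ranks as non-matches instead of propagating NA.
top_al <- all_states[all_states$state == "AL" & all_states$rank %in% 1, "hospital"]
```

Once every state has a rank, the same two-condition filter works for any state and rank.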

jsta
  • Collapsing to a data frame is actually what I had in mind. Thanks for the reply; I was struggling with how to approach this. – cyborg Mar 16 '18 at 01:52
  • Yes, you should collapse the list of dfs into one large df. `outcome_split[[1]]$rank <- NA ; do.call(function(...) rbind(..., make.row.names=FALSE), outcome_split)` does that. – smci Mar 16 '18 at 02:36

Your rank column is not sorted; see below, where I arrange by rank.

Selecting is a one-liner with dplyr (or with data.table):

library(dplyr)

outcome_split[[2]] %>% filter(rank == 2) %>% select('hospital name')
                hospital name
1 BAPTIST MEDICAL CENTER EAST

outcome_split[[2]] %>% filter(rank == 7) %>% select('hospital name')
                          hospital name
1 VAUGHAN REG MED CENTER PARKWAY CAMPUS

# Here's the hospital order when we arrange by 'rank':
outcome_split[[2]] %>% arrange(rank) %>% select('hospital name', 'rank') %>% head(7)
                          hospital name rank
1              CRESTWOOD MEDICAL CENTER    1
2           BAPTIST MEDICAL CENTER EAST    2
3      SOUTHEAST ALABAMA MEDICAL CENTER    3
4                    GEORGIANA HOSPITAL    4
5           PRATTVILLE BAPTIST HOSPITAL    5
6                       THOMAS HOSPITAL    6
7 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7

# ... and here was your original order
outcome_split[[2]] %>% select('hospital name', 'rank') %>% head(7)
                     hospital name rank
1      ANDALUSIA REGIONAL HOSPITAL   52
2        ATHENS-LIMESTONE HOSPITAL    9
3        ATMORE COMMUNITY HOSPITAL   53
4      BAPTIST MEDICAL CENTER EAST    2
5     BAPTIST MEDICAL CENTER SOUTH   46
6 BAPTIST MEDICAL CENTER-PRINCETON    8
7              BIBB MEDICAL CENTER   54

By the way, to avoid trouble, use underscores instead of spaces in column names; then the names no longer need quoting (hospital_name etc.).

names(outcome_split[[2]]) <- gsub(' ', '_', names(outcome_split[[2]])) renames them "hospital_name" "state" "heart_attack" "heart_failure" "pneumonia" "rank"

Or you can use make.names(), which replaces every character other than letters, digits, underscore and dot with a dot. Use gsub() if you want finer control.
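
To make the difference concrete, here is a short sketch on a plain character vector (`cols` is a hypothetical stand-in for the data frame's names). The last line hints at how a label typed with a space could be mapped to the cleaned name, so users would not have to type the underscore themselves:

```r
cols <- c("hospital name", "state", "heart attack", "heart failure")

# gsub() replaces exactly the pattern given: spaces become underscores.
underscored <- gsub(" ", "_", cols, fixed = TRUE)

# make.names() turns each invalid character (here, the space) into a dot.
dotted <- make.names(cols)

# A user-facing label can be mapped to the cleaned column name the same way.
lookup <- gsub(" ", "_", "heart attack", fixed = TRUE)
```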

And you can collapse the list of dfs into one large df:

outcome_split[[1]]$rank <- NA
do.call(function(...) rbind(..., make.row.names=FALSE), outcome_split)

Now your dplyr select is simply %>% filter(state=='AL', rank==2) %>% select('hospital name')
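
If you still want the function described in the question, a rough base-R sketch could look like the following. The name `rank_hospital()`, its arguments, and the toy input are all hypothetical, and it assumes that a lower rate means a better rank and that ties are broken by hospital name; adjust if your ranking rule differs.

```r
# Sketch only: rank_hospital() and its arguments are hypothetical names.
rank_hospital <- function(dfs, outcome, num) {
  rows <- lapply(dfs, function(df) {
    # "Not Available" becomes NA; suppress the coercion warning.
    rate <- suppressWarnings(as.numeric(df[[outcome]]))
    # Order by rate, then hospital name to break ties; drop missing rates.
    ord <- order(rate, df[["hospital name"]], na.last = NA)
    data.frame(
      hospital = df[["hospital name"]][ord][num],  # NA if num exceeds the count
      state    = df[["state"]][1],
      stringsAsFactors = FALSE
    )
  })
  # One row per state, stacked into a single data frame.
  do.call(rbind, c(rows, make.row.names = FALSE))
}

# Toy input in the same shape as outcome_split (hypothetical values).
toy <- list(
  AK = data.frame(`hospital name` = c("A1", "A2", "A3"), state = "AK",
                  `heart attack` = c("13.4", "Not Available", "12.0"),
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = c("B1", "B2"), state = "AL",
                  `heart attack` = c("15.0", "14.2"),
                  check.names = FALSE, stringsAsFactors = FALSE)
)

res <- rank_hospital(toy, "heart attack", 2)
```

Computing the order inside the function means no rank column has to be added to each data frame beforehand.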

smci
  • Regarding the ‘_’: yes, that’s true, but this code is part of a function where the user enters an outcome such as heart attack or heart failure and the rank (number) they want to view. The function would return the hospital name for each city for the rank specified. So, to sum up, I feel it’s difficult for a user to enter heart_attack, but I would be really happy if you have anything in mind for this. – cyborg Mar 16 '18 at 01:49
  • Yes, [make.names()](https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html) will mangle any characters other than alphanumeric, underscore and dot. Or [gsub()](https://stat.ethz.ch/R-manual/R-devel/library/base/html/grep.html) if you want finer control. – smci Mar 16 '18 at 01:54
  • `names(outcome_split[[2]]) <- gsub(' ', '_', names(outcome_split[[2]]))` renames them `"hospital_name" "state" "heart_attack" "heart_failure" "pneumonia" "rank"` – smci Mar 16 '18 at 02:00
  • Sure thing. You can eliminate the need for a function by dplyr's `arrange(rank)` which gives you a df sorted by that column. (Note that after sorting, the row index equals the rank, so the rank column becomes redundant.) – smci Mar 16 '18 at 02:24
  • thanks, I learned so much in this one post. Will apply all that now. – cyborg Mar 16 '18 at 02:50
  • No problem. `data.table` is another awesome package for dataframes, similar to `dplyr` – smci Mar 16 '18 at 06:13