
I was just wondering if anyone knows how to work with data after it's been split. Here's what I have now.

combined_cost_freq_2_inp <- run_sql("m5c_comb_out_name.sql", cond_str)

checker <- subset(combined_cost_freq_2_inp, is.na(combined_cost_freq_2_inp$inp_allowed))

holders <- split(combined_cost_freq_2_inp, list(combined_cost_freq_1$cd1, combined_cost_freq_1$cd2))

if (is.na(checker$inp_allowed) == TRUE) {

    sub1 <- subset(holders, !is.na(inp_allowed) & svc_code_category == "Facility - Inpatient")
    sub2 <- subset(holders, is.na(inp_allowed) & svc_code_category == "Facility - Inpatient")

    sum_freq_0 <- sum(sub2$svcc_pos_freq)

    sum_freq_div <- sum_freq_0 / length(sub1$svcc_pos_freq)

    sum_freq_added <- (sub1$svcc_pos_freq + sum_freq_div)

    if (sum_freq_added > 1) {
        sub1$svcc_pos_freq <- 1
    } else {
        sub1$svcc_pos_freq <- sum_freq_added
    }

    holder <- rbind(sub1, sub2)

    combined_cost_freq_2_inp <- holder
}

The code below the split worked perfectly before I added the split, but I now realize I need to split on unique values, and that has made things a lot more complicated than I would like. Any help would be much appreciated!
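For reference, here is a minimal base R sketch of one way the per-group step could be applied to each piece of a split. It is not tested against the real data: the toy `df` and its values are made up, and `redistribute` is a hypothetical helper name. It assumes the split should use the `cd1` and `cd2` columns of the same data frame being split, and it uses `pmin()` to cap each value at 1 element-wise instead of a scalar `if`.

```r
# Toy data standing in for combined_cost_freq_2_inp (values are made up).
df <- data.frame(
  cd1 = c("A", "A", "A", "B", "B"),
  cd2 = c("x", "x", "x", "y", "y"),
  svc_code_category = "Facility - Inpatient",
  svcc_pos_freq = c(0.2, 0.4, 0.3, 0.9, 0.5),
  inp_allowed = c(100, 200, NA, 300, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: within one piece, spread the frequency of the rows
# with a missing inp_allowed evenly over the rows that have a value.
redistribute <- function(piece) {
  inpatient <- piece$svc_code_category == "Facility - Inpatient"
  sub1 <- piece[inpatient & !is.na(piece$inp_allowed), ]
  sub2 <- piece[inpatient & is.na(piece$inp_allowed), ]
  if (nrow(sub1) > 0 && nrow(sub2) > 0) {
    share <- sum(sub2$svcc_pos_freq) / nrow(sub1)
    # pmin() caps every element at 1, one row at a time.
    sub1$svcc_pos_freq <- pmin(sub1$svcc_pos_freq + share, 1)
  }
  rbind(sub1, sub2)
}

# Split on the frame's own columns; drop = TRUE skips empty combinations.
holders <- split(df, list(df$cd1, df$cd2), drop = TRUE)

# Apply the helper to every piece, then stack the pieces back together.
result <- do.call(rbind, lapply(holders, redistribute))
rownames(result) <- NULL
```

Because `holders` is a list of data frames, the subsetting has to happen inside the function passed to `lapply()`, one piece at a time, rather than on the list itself. As in the code above, rows outside "Facility - Inpatient" are not carried into the result.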

Sample data (note: the output of dput(head(holders, 5)) was just too big to post):

Browse[2]> str(holders)
List of 1
 $ Surgical Treatment.Laparoscopic Gallbladder Removal (Cholecystectomy):'data.frame':  1392 obs. of  26 variables:
  ..$ state            : chr [1:1392] "MO" "MO" "MO" "MO" ...
  ..$ hrrcity          : chr [1:1392] "Cape Girardeau" "Cape Girardeau" "Cape Girardeau" "Cape Girardeau" ...
  ..$ mcp_category     : chr [1:1392] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ diagnosis_group  : chr [1:1392] "Gallstones" "Gallstones" "Gallstones" "Gallstones" ...
  ..$ cd1              : chr [1:1392] "Surgical Treatment" "Surgical Treatment" "Surgical Treatment" "Surgical Treatment" ...
  ..$ cd2              : chr [1:1392] "Laparoscopic Gallbladder Removal (Cholecystectomy)" "Laparoscopic Gallbladder Removal (Cholecystectomy)" "Laparoscopic Gallbladder Removal (Cholecystectomy)" "Laparoscopic Gallbladder Removal (Cholecystectomy)" ...
  ..$ cd3              : chr [1:1392] "Inpatient Hospital" "Inpatient Hospital" "Inpatient Hospital" "Inpatient Hospital" ...
  ..$ timeline_ind     : chr [1:1392] "Evaluation" "Evaluation" "Evaluation" "Evaluation" ...
  ..$ svc_lvl_code     : chr [1:1392] "" "Consultation and Management" "Consultation and Management" "Consultation and Management" ...
  ..$ svc_code_category: chr [1:1392] "74174" "Initial hospital care, per day (70 minutes)" "Initial observation care visit, high complexity" "Office visit, 40 minutes" ...
  ..$ svcc_pos         : chr [1:1392] "" "" "" "" ...
  ..$ claim_type       : chr [1:1392] "" "" "" "" ...
  ..$ ep_count         : int [1:1392] 14 14 14 14 14 14 14 14 14 14 ...
  ..$ svcc_freq        : num [1:1392] 0.0714 0.0714 0.0714 0.2857 0.0714 ...
  ..$ svcc_pos_freq    : num [1:1392] 0.0714 0.0714 0.0714 0.2857 0 ...
  ..$ avg_services     : num [1:1392] 1 1 1 2 1 1 1 1 1 1 ...
  ..$ pos_indicator    : chr [1:1392] NA "" "" "" ...
  ..$ average_billed   : num [1:1392] NA 389 440 266 651 ...
  ..$ average_allowed  : num [1:1392] NA 215.8 196.2 151.7 51.6 ...
  ..$ rep_code         : chr [1:1392] NA NA NA NA ...
  ..$ rx_brand_name    : chr [1:1392] NA NA NA NA ...
  ..$ rx_generic_name  : chr [1:1392] NA NA NA NA ...
  ..$ rx_avg_cost      : num [1:1392] NA NA NA NA NA NA NA NA NA NA ...
  ..$ drg_id           : chr [1:1392] NA NA NA NA ...
  ..$ inp_billed       : num [1:1392] NA NA NA NA NA NA NA NA NA NA ...
  ..$ inp_allowed      : num [1:1392] NA NA NA NA NA NA NA NA NA NA ...
nazgulian
  • There are idioms for working with split data: the data.table or dplyr package or anything here https://stackoverflow.com/q/3505701/ – Frank Aug 08 '17 at 18:09
  • @ChiPak I've added some data. After splitting the data, I'm hoping to subset on inp_allowed, both where it is NA and where it is not NA. In addition, I also want to find the rows where svc_code_category == "Facility - Inpatient". Does that make sense? Frank, I've used lapply once but wasn't entirely sure how to apply it to a subset, so that's actually what I'm trying to figure out now! – nazgulian Aug 08 '17 at 18:18
  • I'm not convinced you need to `split` at all... probably `dplyr` or `data.table` is fine. The data you posted starts with `List of 1`, suggesting the splitting didn't do anything (it should be a list with one element per value that was split on). – Gregor Thomas Aug 08 '17 at 18:22
  • Probably `dput(head())` was too big because of factor levels, you can drop the unshared factor levels with `dput(droplevels(head(your_data)))`. – Gregor Thomas Aug 08 '17 at 18:23
  • @Gregor I'm not super familiar with the dplyr lib... I will definitely have to look that up then! – nazgulian Aug 08 '17 at 18:25
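
The grouped approach suggested in the comments might look roughly like the following with `dplyr`. This is only a sketch under assumptions: the toy `df` is made up, `n_known` and `share` are hypothetical helper columns, and the grouping columns are assumed to be `cd1` and `cd2`. No `split` is involved; `group_by()` makes `mutate()` compute the sums within each group.

```r
library(dplyr)

# Toy data standing in for combined_cost_freq_2_inp (values are made up).
df <- data.frame(
  cd1 = c("A", "A", "A", "B", "B"),
  cd2 = c("x", "x", "x", "y", "y"),
  svc_code_category = "Facility - Inpatient",
  svcc_pos_freq = c(0.2, 0.4, 0.3, 0.9, 0.5),
  inp_allowed = c(100, 200, NA, 300, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  filter(svc_code_category == "Facility - Inpatient") %>%
  group_by(cd1, cd2) %>%
  mutate(
    # Number of rows in the group that have an inp_allowed value.
    n_known = sum(!is.na(inp_allowed)),
    # Frequency of the NA rows, divided evenly among the known rows.
    share = if_else(n_known > 0,
                    sum(svcc_pos_freq[is.na(inp_allowed)]) / n_known,
                    0),
    # Only rows with a value receive the share, capped at 1 per row.
    svcc_pos_freq = if_else(is.na(inp_allowed),
                            svcc_pos_freq,
                            pmin(svcc_pos_freq + share, 1))
  ) %>%
  ungroup() %>%
  select(-n_known, -share)
```

Compared with splitting and recombining by hand, this keeps everything in one data frame and preserves the original row order.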

0 Answers