
Is it possible to use the `split` function to group a data frame by two variables instead of just one?

Here is the code right now.

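# Split into one data frame per val_lvl2 value
holders <- split(z_combined_cost_dtrmnt, z_combined_cost_dtrmnt$val_lvl2)
# Keep rows with more than 3 episodes (or a missing episode count)
holders <- lapply(holders, function(x) x[!(x$episode_count <= 3) | is.na(x$episode_count), ])
# Apply remove_outliers() to prd_num_of_days_num within each group
holders <- lapply(holders, function(x) {
  x$prd_num_of_days_num <- remove_outliers(x$prd_num_of_days_num)
  x
})

# Recombine the groups and drop rows whose prd_num_of_days_num is NA
z_combined_cost_dtrmnt <- do.call(rbind, holders)
z_combined_cost_dtrmnt <- subset(z_combined_cost_dtrmnt, !is.na(z_combined_cost_dtrmnt$prd_num_of_days_num))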

This runs fine, but I just learned that I need to group by both val_lvl2 and val_lvl3 to get the unique combinations in my data before I can continue with further manipulation. So essentially this is what I'm trying to do:

holders <- split(z_combined_cost_dtrmnt, z_combined_cost_dtrmnt$val_lvl2 & z_combined_cost_dtrmnt$val_lvl3 )

This doesn't run, so I was wondering whether there is some other way to do it?

Current output:

 Upper GI Endoscopy with Biopsy                                            :'data.frame':     292 obs. of  22 variables:
  ..$ mcp_cat_name                 : chr [1:292] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ pln_name                     : chr [1:292] "AR" "AR" "AR" "AR" ...
  ..$ hosp_refl_rgn_name           : chr [1:292] "Fort Smith, AR" "Fort Smith, AR" "Jonesboro, AR" "Jonesboro, AR" ...
  ..$ val_lvl1                     : chr [1:292] "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" ...
  ..$ val_lvl2                     : chr [1:292] "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" ...
  ..$ val_lvl3                     : chr [1:292] "Outpatient Hospital" "Surgical Center" "Outpatient Hospital" "Surgical Center" ...

Expected output:

 Upper GI Endoscopy with Biopsy                                            :'data.frame':     146 obs. of  22 variables:
  ..$ mcp_cat_name                 : chr [1:146] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ pln_name                     : chr [1:146] "AR" "AR" "AR" "AR" ...
  ..$ hosp_refl_rgn_name           : chr [1:146] "Fort Smith, AR" "Fort Smith, AR" "Jonesboro, AR" "Jonesboro, AR" ...
  ..$ val_lvl1                     : chr [1:146] "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" ...
  ..$ val_lvl2                     : chr [1:146] "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" ...
  ..$ val_lvl3                     : chr [1:146] "Outpatient Hospital" "Outpatient Hospital" "Outpatient Hospital" ...


Upper GI Endoscopy with Biopsy                                            :'data.frame':     146 obs. of  22 variables:
  ..$ mcp_cat_name                 : chr [1:146] "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" "Digestive Conditions" ...
  ..$ pln_name                     : chr [1:146] "AR" "AR" "AR" "AR" ...
  ..$ hosp_refl_rgn_name           : chr [1:146] "Fort Smith, AR" "Fort Smith, AR" "Jonesboro, AR" "Jonesboro, AR" ...
  ..$ val_lvl1                     : chr [1:146] "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" "Endoscopic Procedures" ...
  ..$ val_lvl2                     : chr [1:146] "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" "Upper GI Endoscopy with Biopsy" ...
  ..$ val_lvl3                     : chr [1:146] "Surgical Center" "Surgical Center" "Surgical Center" "Surgical Center" ...

SAMPLE DATA, created using the following code:

dput(head (z_combined_cost_dtrmnt, 50))
structure(list(mcp_cat_name = c("Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions",
"Back and Neck Conditions", "Back and Neck Conditions", "Back and Neck Conditions"
), pln_name = c("AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR",
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR",
"AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR", "AR",
"CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA",
"CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA"), hosp_refl_rgn_name = c("Fort Smith, AR",
"Fort Smith, AR", "Fort Smith, AR", "Fort Smith, AR", "Fort Smith, AR",
"Fort Smith, AR", "Jonesboro, AR", "Jonesboro, AR", "Jonesboro, AR",
"Jonesboro, AR", "Jonesboro, AR", "Jonesboro, AR", "Little Rock, AR",
"Little Rock, AR", "Little Rock, AR", "Little Rock, AR", "Little Rock, AR",
"Little Rock, AR", "Springdale, AR", "Springdale, AR", "Springdale, AR",
"Springdale, AR", "Springdale, AR", "Springdale, AR", "Texarkana, AR",
"Texarkana, AR", "Texarkana, AR", "Texarkana, AR", "Texarkana, AR",
"Texarkana, AR", "Alameda County, CA", "Alameda County, CA",
"Alameda County, CA", "Alameda County, CA", "Bakersfield, CA",
"Bakersfield, CA", "Bakersfield, CA", "Bakersfield, CA", "Chico, CA",
"Chico, CA", "Chico, CA", "Contra Costa County, CA", "Contra Costa County, CA",
"Contra Costa County, CA", "Contra Costa County, CA", "Fresno, CA",
"Fresno, CA", "Fresno, CA", "Fresno, CA", "Los Angeles, CA"),
    val_lvl1 = c("Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Lumbar (Low Back) Pain",
    "Lumbar (Low Back) Pain", "Neuritis", "Cervical (Neck) Pain",
    "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain", "Neuritis",
    "Cervical (Neck) Pain", "Lumbar (Low Back) Pain", "Neuritis",
    "Cervical (Neck) Pain", "Lumbar (Low Back) Pain", "Lumbar (Low Back) Pain",
    "Neuritis", "Cervical (Neck) Pain", "Lumbar (Low Back) Pain",
    "Lumbar (Low Back) Pain", "Neuritis", "Cervical (Neck) Pain"
    ), val_lvl2 = c("Cervical Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Lumbar Laminectomy", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Cervical Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Lumbar Laminectomy", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Cervical Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Lumbar Laminectomy", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Cervical Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Lumbar Laminectomy", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Cervical Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Lumbar Laminectomy", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Lumbar Fusion (Spinal Fusion)",
    "Non-Surgical Treatment", "Non-Surgical Treatment", "Non-Surgical Treatment",
    "Lumbar Fusion (Spinal Fusion)", "Non-Surgical Treatment",
    "Non-Surgical Treatment", "Non-Surgical Treatment"), val_lvl3 = c("Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Inpatient Hospital",
    "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Outpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain",
    "Inpatient Hospital", "Alternative to Surgical Treatment of Lumbar (Low Back) Pain",
    "Alternative to Surgical Treatment of Neuritis", "Alternative to Surgical Treatment of Cervical (Neck) Pain"
    ), val_lvl4 = c("", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
    "", "", "", "", "", "", "", "", ""), ntwk_avg_low_range_billed_amt = c(80359,
    156, 107300, 51324, 156, 156, 80273, 139, 107333, 51287,
    139, 139, 80351, 151, 107334, 51343, 151, 151, 80270, 148,
    107192, 51146, 148, 148, 80388, 165, 107375, 51381, 165,
    165, 215, 140194, 215, 215, 171, 140051, 171, 171, 158, 158,
    158, 205, 140267, 205, 205, 171, 140318, 171, 171, 205),
    ntwk_avg_low_range_alwd_amt = c(36707, 116, 53412, 19115,
    116, 116, 36700, 126, 53476, 19120, 126, 126, 36681, 121,
    53412, 19060, 121, 121, 36677, 125, 53375, 19018, 125, 125,
    36741, 135, 53475, 19143, 135, 135, 164, 58285, 164, 164,
    111, 58046, 111, 111, 111, 111, 111, 147, 58277, 147, 147,
    117, 58131, 117, 117, 130), ntwk_avg_avg_billed_amt = c(99032,
    554, 139522, 51324, 554, 554, 98926, 495, 139566, 51287,
    495, 495, 99021, 538, 139568, 51343, 538, 538, 98922, 526,
    139383, 51146, 526, 526, 99067, 585, 139621, 51381, 585,
    585, 693, 140194, 693, 693, 551, 140051, 551, 551, 512, 512,
    512, 662, 140267, 662, 662, 553, 140318, 553, 553, 661),
    ntwk_avg_avg_alwd_amt = c(41040, 313, 57902, 19115, 313,
    313, 41033, 340, 57972, 19120, 340, 340, 41011, 326, 57902,
    19060, 326, 326, 41007, 338, 57862, 19018, 338, 338, 41079,
    365, 57970, 19143, 365, 365, 451, 58285, 451, 451, 306, 58046,
    306, 306, 305, 305, 305, 403, 58277, 403, 403, 320, 58131,
    320, 320, 356), ntwk_avg_hi_range_billed_amt = c(104618,
    559, 171745, 51324, 559, 559, 104506, 500, 171800, 51287,
    500, 500, 104607, 543, 171801, 51343, 543, 543, 104502, 532,
    171574, 51146, 532, 532, 104655, 591, 171867, 51381, 591,
    591, 799, 140194, 799, 799, 635, 140051, 635, 635, 590, 590,
    590, 764, 140267, 764, 764, 638, 140318, 638, 638, 762),
    ntwk_avg_hi_range_alwd_amt = c(46388, 318, 62393, 19115,
    318, 318, 46380, 345, 62467, 19120, 345, 345, 46355, 331,
    62393, 19060, 331, 331, 46351, 343, 62349, 19018, 343, 343,
    46432, 371, 62466, 19143, 371, 371, 537, 58285, 537, 537,
    365, 58046, 365, 365, 364, 364, 364, 481, 58277, 481, 481,
    382, 58131, 382, 382, 424), episode_count = c(5L, 284L, 2L,
    1L, 284L, 284L, 5L, 284L, 2L, 1L, 284L, 284L, 5L, 284L, 2L,
    1L, 284L, 284L, 5L, 284L, 2L, 1L, 284L, 284L, 5L, 284L, 2L,
    1L, 284L, 284L, 148L, 1L, 148L, 148L, 148L, 1L, 148L, 148L,
    148L, 148L, 148L, 148L, 1L, 148L, 148L, 148L, 1L, 148L, 148L,
    148L), sample_size = c(12.7788970978329, 326.969758402962,
    3.25471779465034, NA, 326.969758402962, 326.969758402962,
    12.7788970978329, 326.969758402962, 3.25471779465034, NA,
    326.969758402962, 326.969758402962, 12.7788970978329, 326.969758402962,
    3.25471779465034, NA, 326.969758402962, 326.969758402962,
    12.7788970978329, 326.969758402962, 3.25471779465034, NA,
    326.969758402962, 326.969758402962, 12.7788970978329, 326.969758402962,
    3.25471779465034, NA, 326.969758402962, 326.969758402962,
    282.202307833077, NA, 282.202307833077, 282.202307833077,
    282.202307833077, NA, 282.202307833077, 282.202307833077,
    282.202307833077, 282.202307833077, 282.202307833077, 282.202307833077,
    NA, 282.202307833077, 282.202307833077, 282.202307833077,
    NA, 282.202307833077, 282.202307833077, 282.202307833077),
    in_map = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA), in_map.x = c(NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA), in_trmnt = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), in_map.y = c(NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA), in_complete = c(NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
    in_miss = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA), prd_num_of_days_num = c(167,
    46, 117, 209, 46, 46, 167, 46, 117, 209, 46, 46, 167, 46,
    117, 209, 46, 46, 167, 46, 117, 209, 46, 46, 167, 46, 117,
    209, 46, 46, 38, 339, 38, 38, 38, 339, 38, 38, 38, 38, 38,
    38, 339, 38, 38, 38, 339, 38, 38, 38)), .Names = c("mcp_cat_name",
"pln_name", "hosp_refl_rgn_name", "val_lvl1", "val_lvl2", "val_lvl3",
"val_lvl4", "ntwk_avg_low_range_billed_amt", "ntwk_avg_low_range_alwd_amt",
"ntwk_avg_avg_billed_amt", "ntwk_avg_avg_alwd_amt", "ntwk_avg_hi_range_billed_amt",
"ntwk_avg_hi_range_alwd_amt", "episode_count", "sample_size",
"in_map", "in_map.x", "in_trmnt", "in_map.y", "in_complete",
"in_miss", "prd_num_of_days_num"), row.names = c(NA, 50L), class = "data.frame")
nazgulian
  • The docs say you can pass `f=` a list of factors. – Frank Aug 03 '17 at 21:53
  • @Frank Could you maybe elaborate a little more? I don't exactly follow what you mean from that comment! – nazgulian Aug 03 '17 at 21:56
  • `split(DF, list(DF$COL1, DF$COL2))` or similar should work. The docs can be read by typing `?split`. – Frank Aug 03 '17 at 21:57
  • Hmm, the resulting list from that ends up deleting all of the information inside. I would show you if it wasn't so long :(. But `val_lvl2` and `val_lvl3` hold different values that are unique. Basically `val_lvl2` holds the type of treatment, i.e. knee surgery, elbow surgery, etc. `val_lvl3` holds where it happened, so hospital, surgical center, etc. I essentially want all the knee surgeries that happened in the hospital grouped together, and the knee surgeries that happened at a surgical center to be a totally different group. Does that sort of make sense? – nazgulian Aug 03 '17 at 22:04
  • Ok, you'll probably need to make a proper example with expected output: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank Aug 03 '17 at 22:07
  • Got it, I'll do that right now! Thanks for all the feedback! – nazgulian Aug 03 '17 at 22:11
  • @Frank I have added the output I get now and output I expect from answering the question! The variable to look out for is val_lvl3. Also essentially the numbers should be 146 instead of 292 afterwards. – nazgulian Aug 03 '17 at 22:19
  • Hm, I am afraid I still don't get it. BrodieG's post in my link explains what it means to make a good reproducible example. The example you need when asking a question might be quite a lot simpler than your true use case. It usually takes some work to construct such an example and often you'll find the answer along the way when building it. You could also look at general SO guidance: [mcve] – Frank Aug 03 '17 at 22:41
  • @Frank thanks for the feedback, I'll try to get a more understandable example up with those guidelines! – nazgulian Aug 03 '17 at 22:46
  • Also, remember with `split` that a list of factors can be a data.frame subset (which is a list), so `split(DF, DF[c("COL1","COL2")])` also works. – thelatemail Aug 03 '17 at 23:42
  • @thelatemail that also produces an empty dataframe afterwards ;\ – nazgulian Aug 07 '17 at 19:16
  • @nazgulian it might for your particular dataset, but I assure you the concept is fine. – thelatemail Aug 07 '17 at 20:52

1 Answer


Hard to answer without example data, but you could try

split(z_combined_cost_dtrmnt, 
  interaction(
    z_combined_cost_dtrmnt$val_lvl2, 
    z_combined_cost_dtrmnt$val_lvl3
  )
)

`interaction` creates a new factor that is the combination of the `val_lvl2` and `val_lvl3` columns, so it should split the data by the unique combinations of the two. I would expect this to be equivalent to

split(z_combined_cost_dtrmnt, 
  f = list(
    z_combined_cost_dtrmnt$val_lvl2, 
    z_combined_cost_dtrmnt$val_lvl3
  )
)
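
As a rough illustration of the list-of-factors form, here is a minimal sketch on a made-up data frame. The column values and the `days` column are assumptions for the example, not taken from the question's data. It shows that the default call creates one list element for every possible combination, including combinations that never occur, while `drop = TRUE` keeps only the combinations that are present.

```r
# Toy data (hypothetical values, not the asker's dataset)
df <- data.frame(
  val_lvl2 = c("Knee Surgery", "Knee Surgery", "Knee Surgery", "Elbow Surgery"),
  val_lvl3 = c("Hospital", "Surgical Center", "Hospital", "Hospital"),
  days     = c(10, 12, 11, 7),
  stringsAsFactors = FALSE
)

# Default: one element per possible combination (2 treatments x 2 places),
# so the unused "Elbow Surgery" / "Surgical Center" pair is a 0-row data frame
all_groups <- split(df, list(df$val_lvl2, df$val_lvl3))
sapply(all_groups, nrow)

# drop = TRUE keeps only the combinations that actually occur
groups <- split(df, list(df$val_lvl2, df$val_lvl3), drop = TRUE)
sapply(groups, nrow)

# No rows are lost: the group sizes add up to the original row count
sum(sapply(groups, nrow)) == nrow(df)
```

Seeing mostly 0-row data frames when printing the default result does not mean the data is gone; checking the summed row counts as above is a quick way to confirm that.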
mikeck
  • Is what's above not sufficient? I did just edit to add an example of what I'm looking for, in case you missed it! The method you just gave me unfortunately also eliminates all the data ;( – nazgulian Aug 03 '17 at 22:29
  • @nazgulian, It's not sufficient because there is no example data for us to look at. If `split` isn't working for you it might be something to do with how your data is formatted. I tested this solution on one of my own datasets and it works as expected. – mikeck Aug 03 '17 at 22:32
  • Would a `str(z_combined_cost_dtrmnt)` be sufficient? I definitely don't mind providing more information! I'm just not sure what exactly needs to be shared from my side, sorry :( – nazgulian Aug 03 '17 at 22:34
  • The output of `dput(z_combined_cost_dtrmnt)` allows us to recreate the dataset in our own R terminal. If the dataset is large, `dput(head(z_combined_cost_dtrmnt, 50))` or similar will suffice. The example data needs to capture enough of the variation in the factors of interest in order to adequately characterize the issue. – mikeck Aug 03 '17 at 22:52
  • @nazgulian, I tested my answer against your dataset and it works. It creates a list of dataframes where each dataframe contains rows for a unique interaction of `val_lvl2` and `val_lvl3`. Some of the list elements are 0-row dataframes because not every possible combination of `val_lvl2` and `val_lvl3` occurs in the dataset. However, the total number of rows in the list of dataframes matches the number of rows in the test dataset, which means no data is missing from the resulting list. You can ignore the empty data frames with argument `drop = TRUE`. – mikeck Aug 07 '17 at 19:26
  • That is definitely my bad! All I was seeing were the empty ones thank you for that! – nazgulian Aug 07 '17 at 20:18
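
Putting the thread together, the question's pipeline with a two-column split might look roughly like the sketch below. The toy data is made up, and the `remove_outliers` defined here is only a hypothetical stand-in (it sets values beyond 1.5 times the interquartile range to `NA`), since the asker's real function is not shown in the thread.

```r
# Hypothetical stand-in for the asker's remove_outliers(): values more than
# 1.5 * IQR outside the quartiles become NA (an assumption for illustration)
remove_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)] <- NA
  x
}

# Toy data (made up): two treatments, each performed at two kinds of facility
df <- data.frame(
  val_lvl2 = rep(c("Knee Surgery", "Elbow Surgery"), each = 6),
  val_lvl3 = rep(c("Hospital", "Surgical Center"), times = 6),
  episode_count = c(5, 5, 5, 5, 5, 5, 2, 9, 9, 9, 9, 9),
  prd_num_of_days_num = c(10, 11, 12, 10, 11, 12, 20, 21, 22, 20, 21, 22),
  stringsAsFactors = FALSE
)

# Split on both columns; drop = TRUE skips combinations that never occur
holders <- split(df, df[c("val_lvl2", "val_lvl3")], drop = TRUE)

holders <- lapply(holders, function(x) {
  # Keep rows with more than 3 episodes (or a missing episode count)
  x <- x[is.na(x$episode_count) | x$episode_count > 3, ]
  # Outliers are now judged within each val_lvl2 / val_lvl3 group
  x$prd_num_of_days_num <- remove_outliers(x$prd_num_of_days_num)
  x
})

# Recombine and drop rows whose prd_num_of_days_num is NA
result <- do.call(rbind, holders)
result <- result[!is.na(result$prd_num_of_days_num), ]
```

The only change from the single-variable version is the second argument to `split`; the `lapply` and `do.call(rbind, ...)` steps work unchanged on the finer groups.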