I'm working with a dataset in R that has missing observations in my vector FirstOfHCPCS.Code. I want to fill in those NAs based on the value in another vector, FirstOfService.Description. The NAs won't all be filled with the same value; there are 6 possible values an NA could be coded as. I tried running a loop to fill in the NAs, but I think that because I don't have EVERY FirstOfService.Description listed in the loop, R doesn't know what to do with those values. Here is my code for the loop and the resulting error (updated with canary's suggestion):

    for (i in 1:248308){
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
      "Local Psychiatric Hospital/IMD PT68", "Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22"))
{Master$FirstOfHCPCS.Code[i]=2}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Inpatient Hospital Ancillary Services - Room and Board",
      "Inpatient Hospital Ancillary Services - Leave of Absence",
      "Inpatient Hospital Ancillary Services - Pharmacy",
      "Inpatient Hospital Ancillary Services - Medical/Surgical Supplies and Devices",
      "Inpatient Hospital Ancillary Services - Laboratory",
      "Inpatient Hospital Ancillary Services -EKG/ECG",
      "Inpatient Hospital Ancillary Services - EEG",
      "Inpatient Hospital Ancillary Services - Psychiatric/Psychological Treatments/Services",
      "Inpatient Hospital Ancillary Services - Other Diagnosis Services",
      "Inpatient Hospital Ancillary Services - Other Therapeutic Services",
      "Inpatient Hospital Ancillary Services - Radiology",
      "Inpatient Hospital Ancillary Services - Respiratory Services",
      "Inpatient Hospital Ancillary Services -Physical Therapy",
      "Inpatient Hospital Ancillary Services - Occupational Therapy",
      "Inpatient Hospital Ancillary Services - Speech-Language Pathology",
      "Inpatient Hospital Ancillary Services - Emergency Room",
      "Inpatient Hospital Ancillary Services - Pulmonary Function",
      "Inpatient Hospital Ancillary Services - Audiology",
      "Inpatient Hospital Ancillary Services - Magnetic Resonance Technology (MRT)",
      "Inpatient Hospital Ancillary Services - Pharmacy",
      "Additional Codes-ECT Facility Charge")){Master$FirstOfHCPCS.Code[i]=1}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Pharmacy (Drugs and Other Biologicals)")){Master$FirstOfHCPCS.Code[i]=3}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Crisis Observation Care")){Master$FirstOfHCPCS.Code[i]=4}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Outpatient Partial Hospitalization")){Master$FirstOfHCPCS.Code[i]=5}
  if (is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfService.Description[i]%in%c("Other")){Master$FirstOfHCPCS.Code[i]=6}}

Error in if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in%  : 
  argument is of length zero

I also ran sum(is.na(Master$FirstOfHCPCS.Code)) to find out how many rows have NA, then replaced the 248308 in the loop code with that number (27186), but I still get the same error as above. How do I fill the NAs with multiple values? Thanks for your help!

Per request, here is sample code and the desired output (Desired_FirstOfHCPCS.Code):

   ##Sample Code##

FirstOfService.Description<-c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65","Wraparound", "Inpatient Hospital Ancillary Services - Room and Board",
                              "Pharmacy (Drugs and Other Biologicals)","Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22","Case Management","Crisis Observation Care","Outpatient Partial Hospitalization",
                              "Other")
Desired_FirstOfHCPCS.Code<-c(2, 85, 1, 3, 2, 2, 11, 4, 5, 6)

FirstOfHCPCS.Code<-c(NA, 85, NA, NA, NA, NA, 11, NA, NA, NA)

df<-data.frame(FirstOfService.Description, FirstOfHCPCS.Code)

df

Output:

                                    FirstOfService.Description FirstOfHCPCS.Code
1  State Mental Retardation Facility - Inpatient (ICF/MR) PT65                NA
2                                                   Wraparound                85
3       Inpatient Hospital Ancillary Services - Room and Board                NA
4                       Pharmacy (Drugs and Other Biologicals)                NA
5            Local Psychiatric Hospital - Acute Community PT73                NA
6                  State Psychiatric Hospital - Inpatient PT22                NA
7                                              Case Management                11
8                                      Crisis Observation Care                NA
9                           Outpatient Partial Hospitalization                NA
10                                                       Other                NA

What I want it to look like:

#Desired Output
df2<-data.frame(FirstOfService.Description, Desired_FirstOfHCPCS.Code)
df2

                                    FirstOfService.Description Desired_FirstOfHCPCS.Code
1  State Mental Retardation Facility - Inpatient (ICF/MR) PT65                         2
2                                                   Wraparound                        85
3       Inpatient Hospital Ancillary Services - Room and Board                         1
4                       Pharmacy (Drugs and Other Biologicals)                         3
5            Local Psychiatric Hospital - Acute Community PT73                         2
6                  State Psychiatric Hospital - Inpatient PT22                         2
7                                              Case Management                        11
8                                      Crisis Observation Care                         4
9                           Outpatient Partial Hospitalization                         5
10                                                       Other                         6
idemanalyst
  • Note that you have to use the `is.na` function to compare `NA` values in R – ialm Jul 12 '13 at 18:18
  • I am quite sure you need neither a `for` loop nor the horrible `if` construct you came up with. Provide [a reproducible example](http://stackoverflow.com/a/5963610/1412059) and show the intended output and someone will show you how to do it with much less code in a more efficient way. You might also be interested in reading `?match`. – Roland Jul 12 '13 at 18:23

1 Answer

First off, it'd be useful to have some reproducible code so we know what you're working with (we don't know what your dataframe consists of).

Otherwise, it looks like there are two problems.

1) You can't use == NA; instead, use is.na().

NA == NA
[1] NA
is.na(NA)
[1] TRUE

2) Another problem is that you're using ANDs rather than ORs. In the first example, your description can't be "State mental retardation facility..." AND "Local psychiatric hospital...".

Instead, try using %in%, e.g.:

is.na(Master$FirstOfHCPCS.Code[i]) & 
Master$FirstOfService.Description[i] %in% c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68")

There are quite a few other ways this code could be cleaned up (the for loops and manual assignments are pretty time consuming and error prone here), but there's a start.
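
As one possible cleanup, here is a minimal sketch that drops the loop in favour of a vectorised lookup, using the sample data from the question. The name `code_lookup` is hypothetical, and the table is abbreviated: the remaining "Inpatient Hospital Ancillary Services" descriptions would need to be added with code 1 in the same way.

```r
# Sample data from the question
FirstOfService.Description <- c(
  "State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
  "Wraparound",
  "Inpatient Hospital Ancillary Services - Room and Board",
  "Pharmacy (Drugs and Other Biologicals)",
  "Local Psychiatric Hospital - Acute Community PT73",
  "State Psychiatric Hospital - Inpatient PT22",
  "Case Management",
  "Crisis Observation Care",
  "Outpatient Partial Hospitalization",
  "Other"
)
FirstOfHCPCS.Code <- c(NA, 85, NA, NA, NA, NA, 11, NA, NA, NA)
df <- data.frame(FirstOfService.Description, FirstOfHCPCS.Code)

# Hypothetical lookup table: description -> code to fill in (abbreviated)
code_lookup <- c(
  "Inpatient Hospital Ancillary Services - Room and Board"      = 1,
  "State Mental Retardation Facility - Inpatient (ICF/MR) PT65" = 2,
  "Local Psychiatric Hospital/IMD PT68"                         = 2,
  "Local Psychiatric Hospital - Acute Community PT73"           = 2,
  "State Psychiatric Hospital - Inpatient PT22"                 = 2,
  "Pharmacy (Drugs and Other Biologicals)"                      = 3,
  "Crisis Observation Care"                                     = 4,
  "Outpatient Partial Hospitalization"                          = 5,
  "Other"                                                       = 6
)

# Rows whose code is missing
needs_fill <- is.na(df$FirstOfHCPCS.Code)

# Look up each missing row by its description; as.character() guards
# against the column being a factor. Descriptions that are not in the
# table come back as NA, so those rows simply stay NA.
filled <- code_lookup[as.character(df$FirstOfService.Description[needs_fill])]
df$FirstOfHCPCS.Code[needs_fill] <- unname(filled)

df
```

Rows that already have a code (85 and 11 here) are left untouched because only the `needs_fill` positions are assigned.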

canary_in_the_data_mine
  • When I ran this code, I got this error: `Error in if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfSerivce.Description[i] %in% : argument is of length zero` – idemanalyst Jul 12 '13 at 19:14
  • I ran a sample of the loop code, this is the code I ran: `for (i in 1:248308){if(is.na(Master$FirstOfHCPCS.Code[i])&Master$FirstOfSerivce.Description[i]%in%c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68", "Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22")) {Master$FirstOfHCPCS.Code[i]=2}}` – idemanalyst Jul 12 '13 at 19:16
  • It works fine, except that you misspelled `FirstOfSerivce` in the original code, and I mistakenly copied it. Correct to `FirstOfService`. – canary_in_the_data_mine Jul 12 '13 at 19:34
  • Oops! But even with the changes I'm still receiving the error above. I updated and corrected the code in my original post. – idemanalyst Jul 12 '13 at 19:46
  • Given your code sample and the corrected script (using `df` instead of `Master`), I'm not seeing any errors. When troubleshooting, break each piece to the smallest part. For instance, set `i <- 1`, and just print `is.na(df$FirstOfHCPCS.Code[i])`, then print `df$FirstOfService.Description[i]`, then print `df$FirstOfService.Description[i] %in% c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68")`. Make sure these all return the expected results before trying to run the entire piece of code, and you'll find your error. – canary_in_the_data_mine Jul 12 '13 at 20:08
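
The piece-by-piece check suggested above can be sketched as follows. This is only an illustration of one way "argument is of length zero" can arise, namely when a column name does not exist in the data frame; the two-row `df` is made up for the demonstration, and whether this is what happened with `Master` is an assumption.

```r
# Small made-up data frame for illustration
df <- data.frame(
  FirstOfService.Description = c("Other", "Wraparound"),
  FirstOfHCPCS.Code = c(NA, 85)
)
i <- 1

# A misspelled column name silently returns NULL instead of raising an error
bad_col <- df$FirstOfSerivce.Description
is.null(bad_col)

# NULL[i] is still NULL, so %in% returns a zero-length logical,
# and combining it with & keeps the result at length zero
cond <- is.na(df$FirstOfHCPCS.Code[i]) & bad_col[i] %in% c("Other")
length(cond)

# if() needs a condition of length one, so it fails here;
# tryCatch() captures the message instead of stopping the script
err <- tryCatch(
  if (cond) "filled",
  error = function(e) conditionMessage(e)
)
err
```

Running `names(Master)` and comparing against the names used in the loop is a quick way to rule this cause in or out.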