Using Indicator Variable

Question

So I am helping a friend out in medical school. For each month I have two data points for each individual: how long they spent in elective surgeries and how long they spent in ER surgeries. I set it up as shown in the image (reproduced below as a data frame) and then imported it into R.

I am looking for the easiest way to set up an indicator variable to compare the ER columns with the non-ER ones. I would also like to create a variable for the numbers 2 through 7.

dd <- data.frame(
    Month = c("July", "August", "September", "October", "November", "December", 
              "January", "February", "March", "April", "May", "June"),
    Year = c(14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L),
    X2 = c(3635L, 3497L, 3911L, 4270L, 3954L, 5202L, 3369L, 2125L, 3419L, 4226L, 1828L, 2636L),
    X3 = c(3920L, 4320L, 3275L, 3457L, 3276L, 3461L, 3082L, 1511L, 2244L, 2323L, 1820L, 2152L),
    X4 = c(3760L, 2450L, 4270L, 1672L, 2945L, 3661L, 3628L, 5494L, 4466L, 3440L, 4551L, 6317L),
    X5 = c(3062L, 3074L, 1771L, 4021L, 1632L, 1843L, 4306L, 4249L, 2628L, 3212L, 2910L, 4196L),
    X6 = c(0L, 0L, 0L, 406L, 1499L, 1045L, 807L, 393L, 498L, 1430L, 487L, 315L),
    X7 = c(7518L, 7741L, 7704L, 7292L, 4043L, 5782L, 6176L, 6521L, 4318L, 4933L, 7191L, 7480L),
    X2.ER = c(356L, 1417L, 1775L, 118L, 120L, 730L, 813L, 773L, 0L, 168L, 0L, 839L),
    X3.ER = c(837L, 764L, 1604L, 0L, 79L, 0L, 140L, 605L, 0L, 0L, 522L, 368L),
    X4.ER = c(602L, 686L, 1292L, 156L, 145L, 434L, 342L, 189L, 765L, 476L, 379L, 85L),
    X5.ER = c(0L, 363L, 368L, 0L, 0L, 952L, 0L, 448L, 253L, 0L, 0L, 0L),
    X6.ER = c(856L, 666L, 1528L, 0L, 344L, 222L, 422L, 1339L, 788L, 644L, 415L, 512L),
    X7.ER = c(814L, 1917L, 2738L, 694L, 534L, 880L, 634L, 664L, 130L, 360L, 602L, 780L)
)

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Pictures of data aren't very reproducible. — MrFlick, Feb 11 '20 at 19:39
Images are a really bad way of posting data (or code). Can you post sample data in `dput` format? Please edit **the question** with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. (`df` is the name of your dataset.) Also, can you tell us the meaning of the numbers 2 through 7? — Rui Barradas, Feb 11 '20 at 19:41
Okay, so I am looking for the easiest way to clean this. I already had to do a lot of work to get it like this. For each month, there are two data points for each number, e.g. 2: the 2 column represents minutes worked on elective cases and the 2 ER column represents minutes worked on emergency cases. I want to condense this so there is one indicator to differentiate the resident levels 2 through 6, and one to indicate whether it is an emergency or not. — Josh_PL, Feb 11 '20 at 19:42
Guys, I am having a hell of a time exporting the data. How do I do this easily and effectively? — Josh_PL, Feb 11 '20 at 19:57
Use the ``dput()`` function and copy the console output here, e.g. ``dput(mtcars)``. — Gainz, Feb 11 '20 at 19:58
Whenever I do that I just get the structure but not the output — Josh_PL, Feb 11 '20 at 20:00
That's fine because we can copy your code in R and it returns your data frame, try it. — Gainz, Feb 11 '20 at 20:00
Yes it's already better, what kind of output are you looking for and what kind of variable? — Gainz, Feb 11 '20 at 20:18
So I forgot one column. I have an indicator variable listed as Nightfloat, which is 0 up until July 1st, '16, and 1 after that. So basically I want to create a list or data frame so I can compare both ER and elective data before and after the Nightfloat came into effect. — Josh_PL, Feb 11 '20 at 21:07

score 0 · Answer 1 · answered Feb 11 '20 at 20:42

If I understand what you are after, you can use a few tidyverse functions to help:

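library(dplyr)
library(tidyr)
library(stringr)

dd %>% 
  pivot_longer(-c(Month, Year), names_to = "visit") %>%            # one row per month and original column
  mutate(
    visit_type = if_else(str_detect(visit, "ER"), "ER", "NonEr"),  # emergency vs. elective
    visit_number = str_extract(visit, "\\d+")) %>%                 # the number (2 through 7) from the column name
  select(-visit) %>% 
  pivot_wider(names_from = visit_type, values_from = value)        # ER and NonEr side by side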

This gives

# A tibble: 72 x 5
   Month   Year visit_number NonEr    ER
   <fct>  <int> <chr>        <int> <int>
 1 July      14 2             3635   356
 2 July      14 3             3920   837
 3 July      14 4             3760   602
 4 July      14 5             3062     0
 5 July      14 6                0   856
 6 July      14 7             7518   814
 7 August    14 2             3497  1417
 ...
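
If a single 0/1 indicator column is preferred over separate `NonEr` and `ER` columns, one option is to stop before the `pivot_wider()` step and keep the data long. The following is only a sketch: `dd_small` is a cut-down stand-in for `dd` (first two months, columns 2 and 3 only), and the names `is_ER`, `level`, `minutes`, and `totals` are illustrative assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Cut-down stand-in for `dd`: first two months, columns 2 and 3 only
dd_small <- data.frame(
  Month = c("July", "August"),
  Year  = c(14L, 14L),
  X2    = c(3635L, 3497L),
  X3    = c(3920L, 4320L),
  X2.ER = c(356L, 1417L),
  X3.ER = c(837L, 764L)
)

long <- dd_small %>%
  pivot_longer(-c(Month, Year), names_to = "visit", values_to = "minutes") %>%
  mutate(
    is_ER = as.integer(str_detect(visit, "ER")),    # 1 = emergency, 0 = elective
    level = as.integer(str_extract(visit, "\\d+"))  # number taken from the column name
  ) %>%
  select(-visit)

# Total minutes per level and case type
totals <- long %>%
  group_by(level, is_ER) %>%
  summarise(total_minutes = sum(minutes), .groups = "drop")

print(totals)
```

Keeping `is_ER` as a column means it can go straight into `group_by()` or a model formula, whereas the wide layout is handier for side-by-side comparison within a month.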

Thank you! This is precisely what I was looking for. I'll have to play around with it a little, but this is correct! Thank you! — Josh_PL, Feb 11 '20 at 21:03
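
For the Nightfloat comparison mentioned in the comments on the question, a grouped summary on long-format data might be enough. This is a rough sketch under several assumptions: the data frame `long` and its `minutes` values are made up purely for illustration, the column names `is_ER`, `month_num`, and `mean_minutes` are hypothetical, and the flag is assumed to switch to 1 starting with July '16.

```r
library(dplyr)

# Hypothetical long-format data; the minutes are made-up illustration values
long <- data.frame(
  Year    = c(15L, 15L, 16L, 16L, 15L, 15L, 16L, 16L),
  Month   = c("May", "June", "August", "September",
              "May", "June", "August", "September"),
  is_ER   = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L),
  minutes = c(100L, 200L, 300L, 400L, 10L, 20L, 30L, 40L)
)

long <- long %>%
  mutate(
    month_num  = match(Month, month.name),  # "July" -> 7, etc.
    # Assumed rule: 0 before July '16, 1 from July '16 onward
    Nightfloat = as.integer(Year > 16L | (Year == 16L & month_num >= 7L))
  )

# Mean minutes before/after Nightfloat, split by elective vs. ER
comparison <- long %>%
  group_by(Nightfloat, is_ER) %>%
  summarise(mean_minutes = mean(minutes), .groups = "drop")

print(comparison)
```

If the data already carries a `Nightfloat` column, the `mutate()` step can be dropped and only the `group_by()` / `summarise()` part is needed.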
