
I'm still new to R and have run into an issue. I have a file with raw data:

library(data.table)

dfRawData <-
  data.table(
    "Model" = c(
      "Car1",
      "Car1",
      "Car1",
      "Car2",
      "Car2",
      "Car2",
      "Car3",
      "Car3",
      "Car3"
    ),
    "variable" = c(
      "Metric1",
      "Metric2",
      "Metric3",
      "Metric1",
      "Metric2",
      "Metric3",
      "Metric1",
      "Metric2",
      "Metric3"
    ),
    "valeur" = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
  )

I want to subset this data table based on the car name and the metric. However, I'd like to avoid if statements, because my code is already very long. From what I understand, case_when could be very useful here. I know the subsetting expression itself is right, because when I use if statements it returns what I want. Yet when I use case_when, I get the following error:

Error in `[.data.frame`(x, i) : undefined columns selected

Does anyone know what I'm doing wrong? Here is my code:

carName = 'Car1' ##Can be changed
dfCarMetric = case_when(
           carName == 'Car1' ~ dfRawData[which(dfRawData[["Model"]] == carName  &
                                               dfRawData[["variable"]] %in% c("Metric1", "Metric2")), ],
           carName == 'Car2' ~ dfRawData[which(dfRawData[["Model"]] %in% c("Car2", "Car3")  &
                                               dfRawData[["variable"]] == "Metric1"), ]
       )

This is what I want to get in the end:

carName = 'Car1'
    dfCarMetric
       Model variable valeur
    1:  Car1  Metric1      1
    2:  Car1  Metric2      2

carName = 'Car2'
    dfCarMetric
      Model variable valeur
    4  Car2  Metric1      4
    7  Car3  Metric1      7

Thank you very much for your answers!

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Do not post pictures of data. We can't copy/paste that for testing. – MrFlick Oct 16 '19 at 21:21
  • Please include a representative sample of your data as plain text - [see here for how](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Data in images cannot be copied/pasted by other users. – neilfws Oct 16 '19 at 21:24

2 Answers


Instead of case_when, which I assume comes from the dplyr package, why not try the filter function? It is better suited to subsetting data sets.

library(dplyr)
library(magrittr)

dfRawData <-  data.frame(
    "Model" = c(
      "Car1",
      "Car1",
      "Car1",
      "Car2",
      "Car2",
      "Car2",
      "Car3",
      "Car3",
      "Car3"
    ),
    "variable" = c(
      "Metric1",
      "Metric2",
      "Metric3",
      "Metric1",
      "Metric2",
      "Metric3",
      "Metric1",
      "Metric2",
      "Metric3"
    ),
    "valeur" = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
  )

# Filter
newData <- dfRawData %>%
  filter((Model == 'Car1' & variable %in% c('Metric1', 'Metric2')) |  # Condition 1
           (Model %in% c('Car2', 'Car3') & variable == 'Metric1'))    # Condition 2

This produces the following output:

 newData
  Model variable valeur
1  Car1  Metric1      1
2  Car1  Metric2      2
3  Car2  Metric1      4
4  Car3  Metric1      7

You can adjust the filter conditions to get exactly the subset you want, and the syntax should be simpler than with case_when.
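If you'd rather not depend on dplyr, base R's `subset()` accepts the same combined condition. A minimal sketch, assuming the example data from the question rebuilt as a plain data.frame (the `rep()` calls just generate the same Model/variable columns more compactly):

```r
# Same example data as in the question, as a plain data.frame
dfRawData <- data.frame(
  Model    = rep(c("Car1", "Car2", "Car3"), each = 3),
  variable = rep(c("Metric1", "Metric2", "Metric3"), times = 3),
  valeur   = 1:9
)

# Base R equivalent of the filter() call above: both conditions joined with |
newData <- subset(
  dfRawData,
  (Model == "Car1" & variable %in% c("Metric1", "Metric2")) |  # Condition 1
    (Model %in% c("Car2", "Car3") & variable == "Metric1")     # Condition 2
)
newData
```

Unlike `filter()`, `subset()` keeps the original row names (1, 2, 4, 7), which matches the row labels in the question's desired output for Car2.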

Using a function


To make this a bit easier and allow you to specify carName, you could wrap this specific filter into a simple (albeit fragile) function:

myFilterFunction <- function(data, carName = 'Car1', metric = c('Metric1', 'Metric2')) {
  data %>%
    filter(Model %in% carName & variable %in% metric)
}

carName = 'Car1'
myFilterFunction(dfRawData, carName = carName, c('Metric1', 'Metric2'))

carName = c('Car2', 'Car3')
myFilterFunction(dfRawData, carName = carName, c('Metric1'))

These calls produce the following outputs:

  Model variable valeur
1  Car1  Metric1      1
2  Car1  Metric2      2

  Model variable valeur
1  Car2  Metric1      4
2  Car3  Metric1      7
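The question asks for a result that depends only on `carName`, without a chain of if statements. One base R option is `switch()`, which evaluates only the branch whose name matches `carName`. This is a sketch under assumptions: `selectCarMetric` is a hypothetical helper name, the rules in each branch are copied from the question, and `carName` must be a single string.

```r
dfRawData <- data.frame(
  Model    = rep(c("Car1", "Car2", "Car3"), each = 3),
  variable = rep(c("Metric1", "Metric2", "Metric3"), times = 3),
  valeur   = 1:9
)

# Hypothetical helper: look up the row condition for carName, then subset.
# switch() only evaluates the matching branch; the final unnamed argument
# is the default and runs when no name matches.
selectCarMetric <- function(data, carName) {
  keep <- switch(carName,
    Car1 = data$Model == "Car1" & data$variable %in% c("Metric1", "Metric2"),
    Car2 = data$Model %in% c("Car2", "Car3") & data$variable == "Metric1",
    stop("No rule defined for carName = ", carName)
  )
  data[keep, ]
}

selectCarMetric(dfRawData, "Car1")  # rows 1 and 2
selectCarMetric(dfRawData, "Car2")  # rows 4 and 7
```

Adding a car means adding one more named branch; an unknown name reaches the default argument, which here raises an error instead of silently returning an empty table.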
al-obrien
  • But if I use the `filter` function, my output doesn't depend on the `carName` ... –  Oct 16 '19 at 21:54
  • Please see the edit which includes how to specify `carName` as an input to a function to determine the output desired. – al-obrien Oct 16 '19 at 22:00

If you are trying to minimise the number of conditional statements, you could put them inside the filter function, also from the dplyr package:

dfCarMetric <- dfRawData %>% 
  filter(
    if (carName == "Car1") 
      Model == carName & variable %in% c("Metric1", "Metric2") 
    else if (carName == "Car2") 
      Model %in% c("Car2", "Car3") & variable == "Metric1"
  )

The case_when function can be used in a similar way, although IMHO it is less common:

dfCarMetric <- dfRawData %>% 
  filter(case_when(
      carName == "Car1" ~
        Model == carName & variable %in% c("Metric1", "Metric2"), 
      carName == "Car2" ~
        Model %in% c("Car2", "Car3") & variable == "Metric1"
    )
  )
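The `case_when()` version works because each right-hand side is a logical vector with one value per row, which is what `filter()` expects. The original attempt most likely failed because its right-hand sides were whole data tables, and `case_when()` is vectorised over elements rather than designed to choose between data frames. The same idea can be written in base R without `if` or `case_when()`: pair each length-1 test on `carName` with its row-wise condition using `&`, then join the branches with `|`. A minimal sketch, assuming the question's data:

```r
dfRawData <- data.frame(
  Model    = rep(c("Car1", "Car2", "Car3"), each = 3),
  variable = rep(c("Metric1", "Metric2", "Metric3"), times = 3),
  valeur   = 1:9
)
carName <- "Car2"  # can be changed

# Each branch ANDs a length-1 test on carName (recycled to every row)
# with a row-wise condition; branches are ORed together.
keep <- with(dfRawData,
  (carName == "Car1" & Model == carName & variable %in% c("Metric1", "Metric2")) |
  (carName == "Car2" & Model %in% c("Car2", "Car3") & variable == "Metric1")
)
dfCarMetric <- dfRawData[keep, ]
dfCarMetric
```

If `carName` matches no branch, `keep` is all FALSE and the result has zero rows, much as `filter()` drops the NA rows that `case_when()` returns when no condition matches.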
Biblot