Edit: there was a typo in my df creation, with a missing _ on the last value of MediaName; this is now corrected.
I want to create a new variable TrialId in a data frame, taken from part of the value of another variable MediaName, where the part to keep depends on the value of a third variable Phase. I thought I could do that using strsplit and ifelse within a dplyr::mutate, as follows:
library(dplyr)
# Creating a simple data frame for the example
df <- data.frame(Phase = c(rep("Familiarisation", 8), rep("Test", 3)),
                 MediaName = c("Flip_A1_G1", "Reg_B2_S1", "Reg_A2_G1", "Flip_B1_S1",
                               "Reg_A1_G2", "Flip_B2_S2", "Reg_A2_G2", "Flip_B1_S2",
                               "HC_A1L", "TC_B1R", "RC_BL_2R"))
# Creating a new column
df <- df %>%
  mutate(TrialId = ifelse(Phase == "Familiarisation",
                          sapply(strsplit(MediaName, "_"), "[", 2),
                          sapply(strsplit(MediaName, "_"), "[", 1)))
The expected result is:
> df$TrialId
[1] "A1" "B2" "A2" "B1" "A1" "B2" "A2" "B1" "HC" "TC" "RC"
However, this gives me the following error, which I believe comes from strsplit:
Error in mutate_impl(.data, dots) :
Evaluation error: non-character argument.
I know from this SO question that, in this small example, I can easily solve my issue by defining my data frame as a tibble::data_frame, but I don't know why this solves the issue. I can't do exactly that, though, because in my actual code df comes from reading a CSV file (with read.csv()). I thought that using df <- df %>% as_tibble() %>% mutate(...) would solve the issue in a similar way, but it doesn't (why?).
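A possible explanation, offered as an assumption rather than a confirmed diagnosis: in R versions before 4.0.0, data.frame() and read.csv() convert strings to factors by default (stringsAsFactors = TRUE), strsplit() rejects factors, and as_tibble() keeps an existing factor as a factor. The sketch below sets stringsAsFactors = TRUE explicitly so that it behaves the same on any R version, then converts the column with as.character() inside mutate before splitting.

```r
library(dplyr)

# Force the factor case explicitly (the default in R < 4.0.0)
df <- data.frame(
  Phase = c(rep("Familiarisation", 8), rep("Test", 3)),
  MediaName = c("Flip_A1_G1", "Reg_B2_S1", "Reg_A2_G1", "Flip_B1_S1",
                "Reg_A1_G2", "Flip_B2_S2", "Reg_A2_G2", "Flip_B1_S2",
                "HC_A1L", "TC_B1R", "RC_BL_2R"),
  stringsAsFactors = TRUE
)

is.factor(df$MediaName)  # TRUE: strsplit() would fail on this column

df <- df %>%
  mutate(
    # Convert the factor to character first; later expressions see the new value
    MediaName = as.character(MediaName),
    TrialId = ifelse(Phase == "Familiarisation",
                     sapply(strsplit(MediaName, "_"), "[", 2),
                     sapply(strsplit(MediaName, "_"), "[", 1))
  )

df$TrialId
```

If this assumption holds, passing stringsAsFactors = FALSE to read.csv() should avoid the conversion at the source.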
Is there a way to actually use tibble even when reading files? Or is there another way of achieving what I need to do, maybe without using strsplit?
I also read in this other SO question that you can use tidyr::separate, but it doesn't do exactly what I want, as I need to keep either the first or the second value depending on the value of Phase.
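For illustration, one way tidyr::separate might still be combined with the Phase condition is to split into two temporary columns and then pick one per row. This is only a sketch: the column names first_part and second_part are hypothetical helpers, and stringsAsFactors = FALSE is set explicitly so that the example does not depend on the R version.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Phase = c(rep("Familiarisation", 8), rep("Test", 3)),
  MediaName = c("Flip_A1_G1", "Reg_B2_S1", "Reg_A2_G1", "Flip_B1_S1",
                "Reg_A1_G2", "Flip_B2_S2", "Reg_A2_G2", "Flip_B1_S2",
                "HC_A1L", "TC_B1R", "RC_BL_2R"),
  stringsAsFactors = FALSE
)

df <- df %>%
  # Keep MediaName and split off its first two "_"-separated pieces;
  # extra = "drop" silently discards any third piece
  separate(MediaName, into = c("first_part", "second_part"),
           sep = "_", extra = "drop", remove = FALSE) %>%
  # Pick the second piece for Familiarisation rows, the first otherwise
  mutate(TrialId = if_else(Phase == "Familiarisation",
                           second_part, first_part)) %>%
  # Remove the temporary helper columns
  select(-first_part, -second_part)

df$TrialId
```

The temporary columns keep the per-row choice explicit, at the cost of an extra select() to clean up afterwards.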