I am trying to perform a string operation on one of the columns in my data frame Test_df. This data frame has close to 5 million records. The objective is to count the occurrences of a character in a string (after replacing the nulls), and I am using str_locate to count.
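For reference, here is a minimal sketch of what the two relevant stringr functions return on a small vector. The vector `x` is a hypothetical toy input, not a sample of Test_df. It shows that str_locate gives the position of the first match, while str_count gives the number of matches, so which one applies depends on what "count" is meant to be.

```r
library(stringr)

# Hypothetical toy input; not taken from Test_df
x <- c("a2b", "xyz", NA, "22")

# Position of the first "2" in each string (NA if there is no match or the input is NA)
first_pos <- str_locate(x, fixed("2"))[, "start"]

# Number of times "2" occurs in each string (NA if the input is NA)
n_hits <- str_count(x, fixed("2"))
```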
Since this is a row-wise mutation, I tried using the rowwise() function with dplyr.
Test_df <- Test_df %>%
  rowwise() %>%
  mutate(col1 = replace_na(str_locate(as.character(my_string), "2")[1], 999))
This took more than 5 hours to execute, which is far too slow.
I then tried the purrr::pmap function to speed up the process a little, as per this Stack Overflow thread, but it did not help.
Test_df <- Test_df %>% mutate(col1 = purrr::pmap_dbl(list(Test_df$my_string), function(a) replace_na(str_locate(a, "2")[1], 999)))
Is there a way to do replace_na and str_locate so that the execution is faster? I need to do this on a monthly basis.