
I am trying to perform a string operation on one of the columns in my data frame Test_df, which has close to 5 million records. The objective is to count the occurrences of a character in a string (after replacing the nulls), and I am using str_locate for this.

Since this is a row-wise mutation, I tried using the rowwise() function from dplyr.

Test_df <- Test_df %>%
  rowwise() %>%
  mutate(col1 = replace_na(str_locate(as.character(my_string), "2")[1], 999))

This line took more than 5 hours to execute, which was extremely sub-optimal.

I then tried using the purrr::pmap function to speed up the process a little, as per this Stack Overflow thread, but this did not help speed things up.

Test_DF <- Test_DF %>%
  mutate(col1 = purrr::pmap_dbl(list(Test_DF$my_string),
                                function(a) replace_na(str_locate(a, "2")[1], 999)))

Is there a way to use replace_na and str_locate so that the execution is faster? I need to run this on a monthly basis.
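For context, `str_locate()` is already vectorized over its string input, so the `rowwise()` call (and the per-row function application) can be dropped entirely; a minimal sketch, using made-up sample data since the real Test_df is not shown:

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical sample data; the real Test_df (~5 million rows) is not shown.
Test_df <- tibble(my_string = c("a2b", "abc", NA))

# str_locate() returns a two-column matrix (start, end) for the whole vector
# at once; take the "start" column and replace NAs in a single vectorized pass.
Test_df <- Test_df %>%
  mutate(col1 = replace_na(str_locate(as.character(my_string), "2")[, 1], 999))
```

On a 5-million-row vector this runs in seconds rather than hours, because the work happens in one vectorized call instead of 5 million per-row calls.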

Sumedha Nagpal
    If you want to count the number of characters, you can simply use `nchar()` which is already vectorized to you don't need `rowwise()`. – EmilHvitfeldt Dec 04 '19 at 20:28
  • Use `str_count` to count. You shouldn't need to use `rowwise` with it. If you need more help, please share a little bit of sample data and the desired result. Though, without seeing your input, I'm not sure why you need to use `rowwise` even with `str_locate`... – Gregor Thomas Dec 04 '19 at 20:49
  • @Shakir data table vs dplyr is irrelevant here. There aren't joins or grouped operations. OP is operating on each element of a vector---the only performance question that matters is whether or not that operation can be vectorized. – Gregor Thomas Dec 05 '19 at 02:38
  • Yes... I was over complicating things. Thank you everyone for your wise comments. :) – Sumedha Nagpal Dec 06 '19 at 17:49
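As the comments suggest, if the real goal is to count occurrences of a character, `str_count()` is vectorized and sidesteps `rowwise()` entirely; a short sketch with hypothetical data:

```r
library(dplyr)
library(stringr)
library(tidyr)

# Made-up example data for illustration.
df <- tibble(my_string = c("2b2", "abc", NA))

# str_count() counts matches across the whole vector in one call;
# replace_na() then handles the NA rows, mirroring the original 999 sentinel.
df <- df %>%
  mutate(n_twos = replace_na(str_count(my_string, "2"), 999))
```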

0 Answers