Building on my answer to this question, here are two approaches. Both assume you do not know where in the string the splitting character (in this case 'a'
) is, and so this is the first step. You can simplify the method if your splitting character is always in the same position (e.g. it is always the second character in the string, as per the example above).
SQL based
We can write an SQL query and pass it to the table definition, following the approach in the question:
query = paste0("
SELECT ch, cq, split_index
,SUBSTRING(ch, 1, split_index - 1) AS ch1
,SUBSTRING(ch, split_index + 1, LEN(ch)) AS ch2
FROM (
SELECT ch, cq, CHARINDEX('a', ch) AS split_index
FROM ", sql_render(input_tbl), "
) AS subquery"
tbl(input_tbl$src$con, sql(query))
Note that different flavors of SQL may have different names for the SUBSTRING
, CHARINDEX
, and LEN
functions. The functions may also take different arguments (e.g. length of the substring or index of the end of the substring). You will need to ensure you have the right functions for your flavor of SQL.
dbplyr based
dbplyr can translate some standard R and dplyr commands into SQL. This lets you write R and have it automatically translated to SQL.
lubrary(dbplyr)
library(stringr)
input_tbl %>%
mutate(split_index = str_locate('a', ch) %>%
mutate(ch1 = substr(ch, 1, split_index - 1),
ch2 = substr(ch, split_index + 1, nchar(ch)))
Note: str_locate
finds the index of the splitting character. In my testing, not every flavor of SQL had a translation defined for this. So this approach may fail depending on your database.
However, you can use the equivalent SQL function directly. Because dbplyr does not have a translation defined for it the function should be passed untranslated to the server.
For example, in SQL server CHARINDEX
has the same purpose as str_locate
. Hence you could use CHARINDEX('a', ch)
instead of str_locate('a', ch)
.