How to split one SQL column into multiple columns in r

Question

I create a tbl_SQLiteConnection object by dbplyr::memdb_frame() in R.

    library(dbplyr)
    library(dplyr)
    tb=memdb_frame(data.frame(ch=c('1a1','2a3'),cq=c(1,2)))
    tb

# Source:   table<dbplyr_002> [?? x 2]
# Database: sqlite 3.33.0 [:memory:]
  ch       cq
  <chr> <dbl>
1 1a1       1
2 2a3       2

I want to split column ch by a into two columns: ch1 and ch2 like follows:

# Source:   table<dbplyr_004> [?? x 3]
# Database: sqlite 3.33.0 [:memory:]
    ch1   ch2    cq
  <dbl> <dbl> <dbl>
1     1     1     1
2     2     3     2

I try something like this

sq <- paste0('SELECT split_part(ch,"a",1)
             FROM (',sql_render(tb),')')

tbl(tb$src$con,sql(sq))

It does not work!!!

Could you please help me how to conduct this in R ?

Can you add your expected output – Karthik S Apr 09 '21 at 16:15 — Karthik S, Apr 09 '21 at 16:15
Thanks! I have add in the first answer – zhang jing Apr 09 '21 at 16:24 — zhang jing, Apr 09 '21 at 16:24

score 1 · Answer 1 · answered Apr 09 '21 at 16:46

1

Does this work:

library(dplyr)
library(tidyr)
tb %>% separate(col = ch, into = c('ch1','ch2'), sep = 'a')
  ch1 ch2 cq
1   1   1  1
2   2   3  2

answered Apr 09 '21 at 16:46

Karthik S

11,348
2
11
25

This does not work, this command works well for tibble, however it does not work here. `Error in UseMethod("separate") : no applicable method for 'separate' applied to an object of class "c('tbl_SQLiteConnection', 'tbl_dbi', 'tbl_sql', 'tbl_lazy', 'tbl')"` – zhang jing Apr 10 '21 at 00:18
Try `tb %>% collect() %>% separate(...)`? – r2evans Apr 11 '21 at 02:28
Thanks! collect() is the same with as. data. frame，it is not sql but also data. frame – zhang jing Apr 12 '21 at 09:36

G. Grothendieck · Answer 2 · 2021-04-09T17:50:34.353

1

If the question is how to separate the indicated column in a data frame into two using an SQLite select statement in R then the following select statement uses SQLite's instr and substr functions.

library(sqldf)

tb <- data.frame(ch = c('1a1', '2a3'), cq = c(1, 2))

sqldf("
  select 
    substr(ch, 1, instr(ch, 'a') - 1) as ch1,
    substr(ch, instr(ch, 'a') + 1) as ch2,
    cq
  from tb")

giving:

  ch1 ch2 cq
1   1   1  1
2   2   3  2

edited Apr 09 '21 at 17:50

answered Apr 09 '21 at 17:44

G. Grothendieck

254,981
17
203
341

The question is how to split an SQL column, not split one dataframe column by SQL syntax. – zhang jing Apr 10 '21 at 00:16

score 0 · Answer 3 · answered Oct 11 '21 at 20:20

Building on my answer to this question, here are two approaches. Both assume you do not know where in the string the splitting character (in this case 'a') is, and so this is the first step. You can simplify the method if your splitting character is always in the same position (e.g. it is always the second character in the string, as per the example above).

SQL based

We can write an SQL query and pass it to the table definition, following the approach in the question:

query = paste0("
SELECT ch, cq, split_index
    ,SUBSTRING(ch, 1, split_index - 1) AS ch1
    ,SUBSTRING(ch, split_index + 1, LEN(ch)) AS ch2
FROM (

SELECT ch, cq, CHARINDEX('a', ch) AS split_index
FROM ", sql_render(input_tbl), "

) AS subquery"

tbl(input_tbl$src$con, sql(query))

Note that different flavors of SQL may have different names for the SUBSTRING, CHARINDEX, and LEN functions. The functions may also take different arguments (e.g. length of the substring or index of the end of the substring). You will need to ensure you have the right functions for your flavor of SQL.

dbplyr based

dbplyr can translate some standard R and dplyr commands into SQL. This lets you write R and have it automatically translated to SQL.

lubrary(dbplyr)
library(stringr)

input_tbl %>%
  mutate(split_index = str_locate('a', ch) %>%
  mutate(ch1 = substr(ch, 1, split_index - 1),
         ch2 = substr(ch, split_index + 1, nchar(ch)))

Note: str_locate finds the index of the splitting character. In my testing, not every flavor of SQL had a translation defined for this. So this approach may fail depending on your database.

However, you can use the equivalent SQL function directly. Because dbplyr does not have a translation defined for it the function should be passed untranslated to the server.

For example, in SQL server CHARINDEX has the same purpose as str_locate. Hence you could use CHARINDEX('a', ch) instead of str_locate('a', ch).

How to split one SQL column into multiple columns in r

3 Answers3

SQL based

dbplyr based

Linked