2

I have a table called personal_websessions that contains data in the following format:

 id_no | website_link 
 1     | google.com 
 2     | stackoverflow.com 
 3     | msn.com 

You can create this table using the following SQL commands:

CREATE TABLE personal_websessions(id_no INTEGER PRIMARY KEY, website_link TEXT);
INSERT INTO personal_websessions VALUES(1, 'google.com'), (2, 'stackoverflow.com '), (3, 'msn.com ')

I would like to perform a find and replace using regex:

What I would like to do is if the character is 'msn.com' or 'msnnews.com etc (so something with msn in the word) in the website_link column, find that value of 'msn' and replace it with an string 'toast', but if it is not the word msn then leave it as it is. so the example above - google.com and stackoverflow.com will stay the same.

I know that the regex will be of the form (msn) as a grouping structure to match on but I do not know how to write a regex match in Sqlite.

Essentially i will have the following desired output below:

 id_no | website_link 
 1     | google.com 
 2     | stackoverflow.com 
 3     | toast

I am currently using SQlite and I know that I will have to use the REPLACE function as it is can find a pattern and then provide a replacement,

However in this link, they are not using any regex to match the words just defining them

I am really just trying to find out how to use a regex pattern to find and replace values in sqllite.

I am using an RSQLITE connection if that helps.

Thanks

Beans On Toast
  • 903
  • 9
  • 25
  • Perhaps useful: https://stackoverflow.com/a/5071683 – r2evans Jan 26 '20 at 22:11
  • Here's an important question (for me): are you using the R programming language for this? If not, and to remove further confusion, please remove the [tag:r] tag. – r2evans Jan 26 '20 at 22:41
  • I am using the R programming language - and I know how to do this in R, although I am trying to saving resources by using a sqlite db instead of bringing the data back into Ram. I have added a line in the question detailing I am using Rsqlite. – Beans On Toast Jan 27 '20 at 11:04

2 Answers2

3

What you are describing sounds like filtering using like:

update personal_websessions
    set website_link = 'toast'
    where website_link like 'msn%';

In your examples, the "msn" is at the beginning, so I've arranged the like pattern to match that. If you really do mean "msn" anywhere, then the pattern should be '%msn%'.

The function replace() really has nothing to do with this problem. If you want to change the underlying data, then update is the operative command.

EDIT:

If you don't want to change the data but just want a select, then use a case expression:

select pw.id_no,
       (case when pw.website_link like 'msn%' 
             then 'toast'
             else pw.website_link
        end) as website_link
from personal_websessions pw;
Beans On Toast
  • 903
  • 9
  • 25
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • can I use any regex statement after LIKE? so can i use (msn) which is grouping? is that permissable – Beans On Toast Jan 20 '20 at 12:44
  • @Goosfraba . . . (1) SQLite doesn't have regular expressions built-in (although they are easily added), so that seems like a bad idea. (2) You can do this with `like` which is usually a tad more efficient, so regular expressions seem like a bad idea. But, of course you can match the pattern using regular expressions. – Gordon Linoff Jan 20 '20 at 16:41
  • @GordonLinoff, https://www.sqlite.org/lang_expr.html#regexp suggests that there is some regex functionality (at least the possibility of it ...) – r2evans Jan 26 '20 at 22:13
  • @r2evans . . . The documentation is quite clear: "No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message." The grammar supports regular expressions and they are easy to add in, but not part of SQLite *per se*. – Gordon Linoff Jan 26 '20 at 22:27
  • Except since this is R, and `RSQLite` does allow it with `RSQLite::inidRegExp`, it *is* available for this question. That is, unless, the OP really doesn't mean the R programming language when using the [tag:r] tag ... in which case you are right. – r2evans Jan 26 '20 at 22:30
0

You can use SQLite's regexp function, but only after having it registered.

con0 <- DBI::dbConnect(RSQLite::SQLite())
DBI::dbExecute(con0, "CREATE TABLE personal_websessions(id_no INTEGER PRIMARY KEY, website_link TEXT)")
# [1] 0
DBI::dbExecute(con0, "INSERT INTO personal_websessions VALUES(1, 'google.com'), (2, 'stackoverflow.com '), (3, 'msn.com ')")
# [1] 3
DBI::dbExecute(con0, "INSERT INTO personal_websessions VALUES(4, 'msnnews.com')")
# [1] 1
DBI::dbGetQuery(con0, "select * from personal_websessions where website_link like 'msn%'")
#   id_no website_link
# 1     3     msn.com 
# 2     4  msnnews.com
DBI::dbGetQuery(con0, "select * from personal_websessions where website_link regexp '\\bmsn\\b'")
# Error: no such function: regexp
RSQLite::initRegExp(con0)
DBI::dbGetQuery(con0, "select * from personal_websessions where website_link regexp '\\bmsn\\b'")
#   id_no website_link
# 1     3     msn.com 

In order to replace the "msn" with "toast" (within the string, as a substring replacement), though, SQLite does not currently have native support for regex-replacement (short of icu_replace.c, found here).

If you are confident that you will not find "msn" multiple times in one string (e.g., "msnnews.msn.com"), though, you can find with a regex (as above) and then use the non-regex replace. Continuing the above example:

DBI::dbGetQuery(con0, "
  select id_no, replace(website_link,'msn','toast') as website_link
  from personal_websessions
  where website_link regexp '\\bmsn\\b'")
#   id_no website_link
# 1     3   toast.com 

And if you need all rows with just that portion replaced, then a union would work:

DBI::dbGetQuery(con0, "
  select id_no, replace(website_link,'msn','toast') as website_link
  from personal_websessions
  where website_link regexp '\\bmsn\\b'
  union
  select id_no, website_link
  from personal_websessions
  where not website_link regexp '\\bmsn\\b' ")
#   id_no       website_link
# 1     1         google.com
# 2     2 stackoverflow.com 
# 3     3         toast.com 
# 4     4        msnnews.com
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • I have tried this already by initialising the regexp function in R - This function does not allow you to replace only part of a string and not the whole string. For example; I am trying to replace the word "msn" in "msn.com" and "somethingmsn.com" with the word "testing" this would give me "testing.com" and "somethingtesting.com" as the desired results. The regexp function does not perform this - if you want something akin to R this would be the `gsub` function. – Beans On Toast Jan 27 '20 at 11:00
  • Your desired output replaced the entire string, not a substring, which this answer easily supports. Please update your question to indicate what you really mean, perhaps *"replace `msn.com ` with `toast.com `"*. – r2evans Jan 27 '20 at 13:30
  • See my edit ... it's a mostly-good solution, I think, even if it is not perfect. – r2evans Jan 27 '20 at 15:25
  • 1
    Your link concerning `icu_replace.c` (https://config9.com/linux/regex/replace-a-part-of-a-string-with-regexp-in-sqlite3/) seemed dead just now. – PJTraill Jun 13 '22 at 15:07