I am facing a problem using a word-boundary regex with mongolite. It looks like the word boundary \b does not work there, whereas it works in normal MongoDB queries.

Here is a working example:

I create this toy collection:

db.test2.insertMany([
   { item: "journal gouttiere"},
   { item: "notebook goutte"},
   { item: "paper plouf"},
   { item: "planner gouttement"},
   { item: "postcard goutte"}
]);

With mongosh:

db.test2.aggregate([
  {
    $match: {
      item: RegExp("\\bgoutte\\b")
    }
  }
])

returns:

[
  {
    "_id": {
      "$oid": "63206efeb0e1e89db6ef0c20"
    },
    "item": "notebook goutte"
  },
  {
    "_id": {
      "$oid": "63206efeb0e1e89db6ef0c23"
    },
    "item": "postcard goutte"
  }
]

But:

library(mongolite)

connection <- mongo(collection = "test2", db = "test",
                    url = "mongodb://localhost:27017",
                    verbose = TRUE)

connection$aggregate(pipeline = '[{
      "$match": {
      "item":{"$regex" : "\\bgoutte\\b", "$options" : "i"}
      }
}]',options = '{"allowDiskUse":true}')

returns 0 rows. Changing the pattern to

connection$aggregate(pipeline = '[{
      "$match": {
      "item":{"$regex" : "goutte", "$options" : "i"}
      }
}]',options = '{"allowDiskUse":true}')

returns:


 Imported 3 records. Simplifying into dataframe...
                       _id               item
1 63206efeb0e1e89db6ef0c20    notebook goutte
2 63206efeb0e1e89db6ef0c22 planner gouttement
3 63206efeb0e1e89db6ef0c23    postcard goutte

It looks like the word-boundary regex does not behave the same way with mongolite. What is the proper solution?

denis
    I can't test this so I won't make it an answer, but try doubling all backslashes - so `"\\\\bgoutte\\\\b"` – Ottie Sep 15 '22 at 14:06

1 Answer

Ottie is right (and should post an answer!–I'd be fine with deleting mine then):

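Backslashes are special at each layer the pattern passes through: the R string literal, the JSON that mongolite parses, and the regex itself. In R, "\\b" produces the two characters \b, which JSON reads as a backspace escape rather than a word boundary. You therefore need two additional backslashes per \b (four in total in the R source) so that the JSON contains \\b and MongoDB receives the regex \b; see e.g. this SO question. I just checked: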

con <- mongo(
 "test", 
 url = "mongodb+srv://readwrite:test@cluster0-84vdt.mongodb.net/test"
)

con$insert('{"item": "notebook goutte" }')
con$insert('{"item": "postcard goutte" }')

Now

con$aggregate(pipeline = '[{
      "$match": {
      "item":{"$regex" : "\\\\bgoutte\\\\b", "$options" : "i"}
      }
}]',options = '{"allowDiskUse":true}')

yields

                       _id            item
1 63234ac1435f9b7c2a0787c2 notebook goutte
2 63234ac5435f9b7c2a0787c5 postcard goutte
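
To make the escaping layers visible without a database, here is a minimal sketch in plain JavaScript (Node.js). It is only an illustration: `matchFromJson` is a hypothetical helper, not part of mongolite or MongoDB, and JavaScript's regex engine stands in for the PCRE engine MongoDB uses. JavaScript string literals treat backslashes the same way R string literals do here, so the two patterns below correspond to the two R calls in the question and in this answer.

```javascript
// Toy data from the question.
const items = [
  "journal gouttiere",
  "notebook goutte",
  "paper plouf",
  "planner gouttement",
  "postcard goutte",
];

// Hypothetical helper: parse the JSON the way a driver would before
// building the query, then apply the resulting pattern to each item.
function matchFromJson(json) {
  const { $regex, $options } = JSON.parse(json);
  const re = new RegExp($regex, $options);
  return items.filter((item) => re.test(item));
}

// Two backslashes in the source become one in the string,
// and JSON then reads \b as a backspace character.
const twoBackslashes = '{"$regex": "\\bgoutte\\b", "$options": "i"}';

// Four backslashes in the source become two in the string,
// and JSON then yields \b, which the regex engine treats as a word boundary.
const fourBackslashes = '{"$regex": "\\\\bgoutte\\\\b", "$options": "i"}';

console.log(matchFromJson(twoBackslashes));
console.log(matchFromJson(fourBackslashes));
```

With the two-backslash version the pattern begins with a literal backspace character, so nothing matches; with four backslashes only the two "goutte" items are kept, which mirrors the mongolite results above.
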
Martin C. Arnold
    It’s ok mate, you take it, you put much more effort into this than me. “More backslashes” is just the default answer whenever a regex that *should* work doesn’t. – Ottie Sep 16 '22 at 07:01
    Thank you both. I will remember the "More backslashes" tip; until now I usually stopped at 2 :) – denis Sep 20 '22 at 06:51