1

After digging google and SO for a week I've ended up asking the question here. Suppose there are two collections,

UsersCollection:

[
{...
    name:"James"
    userregex: "a|regex|str|here"
},
{...
    name:"James"
    userregex: "another|regex|string|there"
},
...
]

PostCollection:

[
{...
    title:"a string here ..."
},
{...
    title: "another string here ..."
},
...
]

I need to get all users whose userregex will match any post.title(Need user_id, post_id groups or something similar).

What I've tried so far:
1. Get all users in collection, run regex on all products, works but too dirty! it'll have to execute a query for each user 2. Same as above, but using a foreach in Mongo query, it's the same as above but only Database layer instead of application layer

I searched alot for available methods such as aggregations, upwind etc with no luck.
So is it possible to do this in Mongo? Should i change my database type? if yes what type would be good? performance is my first priority. Thanks

Jafar Akhondali
  • 1,552
  • 1
  • 11
  • 25
  • Since this seems to be something that you don't need to do for a specific user, maybe do the expensive calculation (that you can't get around anyway) and cache the result in memory. It sounds like this might be an AB problem though. – Ingo Bürk Oct 04 '17 at 05:34
  • @IngoBürk I cant cache the result, new data comes in PostCollection and UsersCollection get's updated too. Maybe using some flags to work on new\changed documents but it's too dirty – Jafar Akhondali Oct 04 '17 at 21:00
  • Adding posts will not a change the cached result. Only removing them will, and that you could account for by memorizing which post was matched for a user (if that's the removed one, recalculate for the affected users). And adding users you only have to calculate the information for a single user and add it to the cache. – Ingo Bürk Oct 04 '17 at 21:06
  • You have done something like https://stackoverflow.com/a/22739813/4110233 right? – TheChetan Oct 06 '17 at 03:34
  • Are your regexs all just piped "or" of single keywords like in your example? If so, you might be able to just store them in an array and use the aggregation framework to do a $lookup. – Nic Cottrell Oct 09 '17 at 08:21
  • @NicCottrell They are in an array, actually i combined them to make question simpler, Can you provide an answer with using lookup? – Jafar Akhondali Oct 09 '17 at 10:09
  • @NicCottrell Bounty will end in 2Days, 'userregex' is actually created in sotftware layer with triggering 'on-update' and 'on-create' events of schema to build 'userregex' by joining 'items' array with '|' . I just posted 'userregex' to make question look simpler. If you can provide a working example i'll give you bounty rewards. Thanks – Jafar Akhondali Oct 09 '17 at 14:51

3 Answers3

1

MongoDB is good for your use case but you need to use a approach different from current one. Since you are only concerned about any title matching any post, you can store the last results of such a match. Below is a example code

db.users.find({last_post_id: {$exists: 0}}).forEach(
   function(row) {
       var regex = new RegExp(row['userregex']);
       var found = db.post_collection.findOne({title: regex});
       if (found) {
           post_id = found["post_id"];
           db.users.updateOne({
                 user_id: row["user_id"]
               }, {
                    $set :{ last_post_id:  post_id}
                   }); 
       }
   }
)

What it does is that only filters users which don't have last_post_id set, searches post records for that and sets the last_post_id if a record is found. So after running this, you can return the results like

db.users.find({last_post_id: {$exists: 1}}, {user_id:1, last_post_id:1, _id:0})

The only thing you need to be concerned about is a edit/delete to an existing post. So after every edit/delete, you should just run below, so that all matches for that post id are run again.

post_id_changed = 1
db.users.updateMany({last_post_id: post_id_changed}, {$unset: {last_post_id: 1}})

This will make sure that next time you run the update these users are processed again. The approach does have one drawback that for every user without a matching title, the query for such users would run again and again. Though you can workaround that by using some timestamps or post count check

Also you should make to sure to put index on post_collection.title

Tarun Lalwani
  • 142,312
  • 9
  • 204
  • 265
1

I was thinking that if you pre-tokenized your post titles like this:

{
  "_id": ...
  "title": "Another string there",
  "keywords": [
    "another",
    "string",
    "there"
  ]
}

but unfortunately $lookup requires that foreignField is a single element, so my idea of something like this will not work :( But maybe it will give you another idea?

db.Post.aggregate([
   {$lookup: {
          from: "Users",
          localField: "keywords",
          foreignField: "keywords",
          as: "users"
        }
    },
]))
Nic Cottrell
  • 9,401
  • 7
  • 53
  • 76
1

It is not possible to reference the regex field stored in the document in the regex operator inside match expression.

So it can't be done in mongo side with current structure.

$lookup works well with equality condition. So one alternative ( similar to what Nic suggested ) would be update your post collection to include an extra field called keywords ( array of keyword values it can be searched on ) for each title.

db.users.aggregate([
   {$lookup: {
          from: "posts",
          localField: "userregex",
          foreignField: "keywords",
          as: "posts"
        }
    }
])

The above query will do something like this (works from 3.4).

keywords: { $in: [ userregex.elem1, userregex.elem2, ... ] }.

From the docs

If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. , , etc.)

It looks like earlier versions ( tested on 3.2 ) will only match if array have same order, values and length of arrays is same.

Sample Input:

Users

db.users.insertMany([
  {
    "name": "James",
    "userregex": [
      "another",
      "here"
    ]
  },
  {
    "name": "John",
    "userregex": [
      "another",
      "string"
    ]
  }
])

Posts

db.posts.insertMany([
  {
    "title": "a string here",
    "keyword": [
      "here"
    ]
  },
  {
    "title": "another string here",
    "keywords": [
      "another",
      "here"
    ]
  },
  {
    "title": "one string here",
    "keywords": [
      "string"
    ]
  }
])

Sample Output:

[
  {
    "name": "James",
    "userregex": [
      "another",
      "here"
    ],
    "posts": [
      {
        "title": "another string here",
        "keywords": [
          "another",
          "here"
        ]
      },
      {
        "title": "a string here",
        "keywords": [
          "here"
        ]
      }
    ]
  },
  {
    "name": "John",
    "userregex": [
      "another",
      "string"
    ],
    "posts": [
      {
        "title": "another string here",
        "keywords": [
          "another",
          "here"
        ]
      },
      {
        "title": "one string here",
        "keywords": [
          "string"
        ]
      }
    ]
  }
]
s7vr
  • 73,656
  • 11
  • 106
  • 127