
I have an assignment where I need to retrieve data from some Twitter posts using MongoDB, and I have been stuck on a problem for a few hours now. I need to extract the mentioned user (on Twitter you write @TheirUsername to mention someone) and am having a hard time doing so. I've tried using $substrCP and finding the index where the "@" begins, but I can't figure out how to find where the mention ends, since usernames have different lengths and any character can follow the name, such as "?" or ".".

So I used the regex pattern /@\w+/ to check whether the tweet contains an @ symbol followed by one or more word characters. This works really well for finding out whether the tweet has an @Someone in it, but I still cannot figure out how to "extract" it.

(Btw. I've been using aggregate to do this, so I could pipe it through $match, then $project, and finally $sort)

Looks something like this:

https://hastebin.com/adohogedil.bash

An example of a string from which the username needs to be extracted:
"damnnn! @white_cat22 i missed 11:11"

Where I only want the "@white_cat22" part.

EDIT: After googling a bit, I think a better way to describe it is this: I need to retrieve the part of the tested string that matched the regex pattern.

What can I do to extract the mentioned username? Any help would be greatly appreciated!

ExoMemphiz
  • Have you tried the solutions [from here](https://stackoverflow.com/a/39275122/3832970)? Another [post](https://stackoverflow.com/a/24868492/3832970) also looks helpful. – Wiktor Stribiżew Feb 10 '19 at 11:42

2 Answers


You can use MongoDB's $regex query operator to match the documents you want, for example:

{ username: { $regex: /@white_cat22/i } }

For more details, check out this link
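Note that $regex only filters documents; it does not return the matched text. On MongoDB 4.2 or later, the $regexFind aggregation operator can return it. The following is a minimal sketch, not a tested solution for the asker's data: it assumes a collection named tweets with a string field tweet, and the output field name mention is made up for illustration.

```javascript
// Sketch only: requires MongoDB 4.2+ for $regexFind.
// Assumed names: collection "tweets", field "tweet", output field "mention".
const pipeline = [
  // Keep only tweets that contain at least one @mention.
  { $match: { tweet: /@\w+/ } },
  // $regexFind returns { match, idx, captures } for the first match, or null.
  { $project: { mention: { $regexFind: { input: "$tweet", regex: /@\w+/ } } } },
  // Keep just the matched text, e.g. "@white_cat22".
  { $project: { mention: "$mention.match" } },
  // Sort alphabetically by the extracted mention.
  { $sort: { mention: 1 } }
];

// In the mongo shell this would be run as:
// db.tweets.aggregate(pipeline)
```

If a tweet can mention several users, $regexFindAll (also 4.2+) returns an array of all matches instead of only the first one.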

Olatunde Garuba

It's a little bit tricky: you have to use the $split and $unwind operators and then $match on the @ pattern, as below:

db.tweets.aggregate([ 
    {
        $match: { tweet: /@\w+/ }
    }, 
    {
        $project: {tweet: {$split: ["$tweet", " "]}}
    }, 
    {
        $unwind: "$tweet"
    }, 
    {
        $match: { tweet: /@\w+/  }
    } 
])

It produces the following result, which is close to what you need:

{ "_id" : ObjectId("5c61aee91765cd7b27eb473e"), "tweet" : "@white_cat22" }
{ "_id" : ObjectId("5c61aeee1765cd7b27eb473f"), "tweet" : "@white_cat23" }
{ "_id" : ObjectId("5c61aef61765cd7b27eb4740"), "tweet" : "@cat23" }
{ "_id" : ObjectId("5c61aefd1765cd7b27eb4741"), "tweet" : "@KP" }
{ "_id" : ObjectId("5c61af051765cd7b27eb4742"), "tweet" : "@kpTesting" }
{ "_id" : ObjectId("5c61af091765cd7b27eb4743"), "tweet" : "@kpTesting12" }
{ "_id" : ObjectId("5c61b4791765cd7b27eb4744"), "tweet" : "@kpTesting12" }

For reference, a simple find query on the collection used above returns:

> db.tweets.find()
{ "_id" : ObjectId("5c61aee91765cd7b27eb473e"), "tweet" : "damnnn! @white_cat22 i missed 11:11" }
{ "_id" : ObjectId("5c61aeee1765cd7b27eb473f"), "tweet" : "damnnn! @white_cat23 i missed 11:11" }
{ "_id" : ObjectId("5c61aef61765cd7b27eb4740"), "tweet" : "damnnn! @cat23 i missed 11:11" }
{ "_id" : ObjectId("5c61aefd1765cd7b27eb4741"), "tweet" : "damnnn! @KP i missed 11:11" }
{ "_id" : ObjectId("5c61af051765cd7b27eb4742"), "tweet" : "damnnn! @kpTesting i missed 11:11" }
{ "_id" : ObjectId("5c61af091765cd7b27eb4743"), "tweet" : "damnnn! @kpTesting12 i missed 11:11" }
{ "_id" : ObjectId("5c61b4791765cd7b27eb4744"), "tweet" : "@kpTesting12 i missed 11:11" }
>

The sample data also includes a tweet where the username (the @ word) comes first, and the pipeline will work just as well if the username is the last word of the tweet.

This might be helpful, but you can always optimize the query. I am posting it here only to show the idea, not as an optimized solution to your problem.
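One caveat with splitting on spaces: a token keeps any punctuation attached to it, so a mention followed directly by "?" or "." comes back with that character included. The plain JavaScript sketch below mirrors the split-then-match idea outside MongoDB to show the difference; the function names mentionsBySplit and mentionsByRegex are hypothetical helpers, not part of any MongoDB API.

```javascript
// Mirrors the $split + $unwind + $match idea: split on spaces, keep tokens
// that contain an @mention. Hypothetical helper for illustration.
function mentionsBySplit(tweet) {
  return tweet.split(" ").filter((word) => /@\w+/.test(word));
}

// Alternative: return only the text that matched the pattern itself.
function mentionsByRegex(tweet) {
  return tweet.match(/@\w+/g) || [];
}

console.log(mentionsBySplit("damnnn! @white_cat22 i missed 11:11"));
// → [ '@white_cat22' ]
console.log(mentionsBySplit("did you see it @white_cat22?"));
// → [ '@white_cat22?' ]  (trailing "?" is kept)
console.log(mentionsByRegex("did you see it @white_cat22?"));
// → [ '@white_cat22' ]
```

Inside an aggregation pipeline, the same effect would need an operator that returns the matched text rather than the whole token, such as $regexFind on MongoDB 4.2 or later.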

For more details please check the below reference:

$split (aggregation)

$unwind (aggregation)

krishna Prasad