This is a sample document from my MongoDB collection page_link_titles:
{
    "_id" : ObjectId("553b11f30b81511d64152416"),
    "id" : 36470831,
    "linkTitles" : [
        "Syrian civil war",
        "Damascus",
        "Geographic coordinate system",
        "Bashar al-Assad",
        "Al Jazeera English",
        "Free Syrian Army",
        ...
        "February 2012 Aleppo bombings",
        "2012 Deir ez-Zor bombing",
        "Aleppo University bombings"
    ]
}
I want to find all documents whose linkTitles contain a phrase like '%term1%' or '%term2%' (and so on). term1 and term2 must have a word boundary on both sides. For example, take "Syrian civil war": if term1 = "war", I want this document to be returned by the query; however, if term1 = "yria", which is only part of a word in this document, it shouldn't be returned.
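To make the requirement concrete, here is a minimal sketch using plain java.util.regex on a single title, with no MongoDB involved. The class and method names (WholeWordDemo, containsWholeWord) are hypothetical and only for illustration; Pattern.quote is added on the assumption that terms should be matched literally.

```java
import java.util.regex.Pattern;

class WholeWordDemo {
    // Hypothetical helper: true when term occurs in title as a whole word.
    static boolean containsWholeWord(String title, String term) {
        // Pattern.quote keeps any regex metacharacters in the term literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b");
        // find() searches anywhere in the title, similar to LIKE '%term%'.
        return p.matcher(title).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWholeWord("Syrian civil war", "war"));  // true
        System.out.println(containsWholeWord("Syrian civil war", "yria")); // false
    }
}
```

"war" is surrounded by a space and the end of the string, so both \b anchors match; "yria" is preceded by the word character "S", so the leading \b fails.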
This is my Java code:
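// Assumed declaration; `or` is not defined in the original snippet.
BasicDBList or = new BasicDBList();
for (String term : segment.terms) {
    // \b on both sides restricts the match to whole words.
    DBObject clause1 = new BasicDBObject("linkTitles",
            java.util.regex.Pattern.compile("\\b"
                    + stprocess.singularize(term) + "\\b"));
    or.add(clause1);
}
// A document matches if any one of the term clauses matches.
DBObject mongoQuery = new BasicDBObject("$or", or);
DBCursor cursor = pageLinks.find(mongoQuery);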
In the line java.util.regex.Pattern.compile("\\b" + stprocess.singularize(term) + "\\b"), I have only handled the word boundary. I don't know how to write the regex so that it covers all my conditions: word boundary, case-insensitive, and LIKE-style matching.
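A rough sketch of how the three conditions might be combined in one pattern, shown on plain strings only. The names TermPatterns and wholeWordIgnoreCase are hypothetical. It is an assumption, not verified here, that the MongoDB Java driver carries the Pattern.CASE_INSENSITIVE flag over to the regex "i" option when such a Pattern is used as a query value.

```java
import java.util.regex.Pattern;

class TermPatterns {
    // Hypothetical helper: whole-word, case-insensitive pattern for one term.
    static Pattern wholeWordIgnoreCase(String term) {
        // \b ... \b         -> word boundary on both sides
        // Pattern.quote     -> treat the term as literal text
        // CASE_INSENSITIVE  -> "War", "WAR" and "war" all match
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = wholeWordIgnoreCase("WAR");
        // find() looks for the term anywhere in the string (LIKE-style).
        System.out.println(p.matcher("Syrian civil war").find());
        System.out.println(wholeWordIgnoreCase("YRIA").matcher("Syrian civil war").find());
    }
}
```

If the assumption about the driver holds, the returned Pattern could replace the Pattern.compile(...) call inside the loop, with stprocess.singularize(term) passed in as the term.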
Any ideas?