I have a dataset similar to the one below. It consists of the individual pages of Word documents: each record holds the file name, the page number, and the full text of that page.
{
"_id": "4b36u6vwkZH16H5vmc24sBfuZk0CRqfP",
"_rev": "1-r5WQDAJPPuUP0oLapZrMiMRd6rOaTIz9",
"FILE_NAME": "sample.doc",
"PAGE_NUM": 1,
"PAGE_FULLTEXT": "hello world"
},
{
"_id": "nDIKw5JUWFWVD8m7HEODMa1vNI5gFEXS",
"_rev": "1-nEp7zsuaneJj2AInyPpeBWDNP90ZGpWQ",
"FILE_NAME": "sample.doc",
"PAGE_NUM": 2,
"PAGE_FULLTEXT": "this is john doe"
},
{
"_id": "vCTlNbNk3X893FkWSYnn87L9j371taYZ",
"_rev": "1-oJPspiBHRPeT99m8VPV9qoDTTBoJ9tVK",
"FILE_NAME": "sample-2.doc",
"PAGE_NUM": 1,
"PAGE_FULLTEXT": "this is another document"
},
{
"_id": "2FSDuaEa5bYtP2l7lEgMnqMnqsZpMJUs",
"_rev": "1-ZQRkvfMluu0NQWYH2FUATuXy9uNtOGyk",
"FILE_NAME": "sample-2.doc",
"PAGE_NUM": 2,
"PAGE_FULLTEXT": "page 2 of sample-2.doc"
},
{
"_id": "RET7G6hUU9zSplgW7FIXWKwIVex2NEmI",
"_rev": "1-mlryGv830RNllPwFT7JDDvJoKXuvxAXD",
"FILE_NAME": "sample-3.doc",
"PAGE_NUM": 1,
"PAGE_FULLTEXT": "hello lionel"
},
{
"_id": "VBL6BJBevcvUc6EsJ68bAjHuGRJ6zvMt",
"_rev": "1-fPIJQHKCB2WitR74l1X8I6TOBMhMeCWF",
"FILE_NAME": "sample-3.doc",
"PAGE_NUM": 2,
"PAGE_FULLTEXT": "page hello 2 of sample-3.doc"
}
So far I have been able to run a similar SELECT DISTINCT query with a count by following the post How do I do the SQL equivalent of "DISTINCT" in CouchDB?
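For reference, the usual CouchDB pattern for DISTINCT is a view whose map function emits the field as the key, combined with the built-in `_count` reduce and queried with `group=true`; grouping returns one row per distinct key. The sketch below assumes a hypothetical design document `_design/files` with a view named `distinct_files`. Because the JavaScript map function only runs inside CouchDB, the Python functions re-implement the same map, reduce, and group steps locally on the sample pages so the result can be checked without a server.

```python
from collections import Counter

# Sample pages from the question (_id and _rev omitted for brevity).
PAGES = [
    {"FILE_NAME": "sample.doc", "PAGE_NUM": 1, "PAGE_FULLTEXT": "hello world"},
    {"FILE_NAME": "sample.doc", "PAGE_NUM": 2, "PAGE_FULLTEXT": "this is john doe"},
    {"FILE_NAME": "sample-2.doc", "PAGE_NUM": 1, "PAGE_FULLTEXT": "this is another document"},
    {"FILE_NAME": "sample-2.doc", "PAGE_NUM": 2, "PAGE_FULLTEXT": "page 2 of sample-2.doc"},
    {"FILE_NAME": "sample-3.doc", "PAGE_NUM": 1, "PAGE_FULLTEXT": "hello lionel"},
    {"FILE_NAME": "sample-3.doc", "PAGE_NUM": 2, "PAGE_FULLTEXT": "page hello 2 of sample-3.doc"},
]

# Hypothetical design document (the _id and view name are illustrative).
# The map string is the JavaScript CouchDB would run; _count is a built-in reduce.
DESIGN_DOC = {
    "_id": "_design/files",
    "views": {
        "distinct_files": {
            "map": "function (doc) { if (doc.FILE_NAME) emit(doc.FILE_NAME, null); }",
            "reduce": "_count",
        }
    },
}


def map_file_name(doc):
    """Python mirror of the map function: emit(doc.FILE_NAME, null)."""
    if doc.get("FILE_NAME"):
        yield doc["FILE_NAME"], None


def query_grouped(docs, map_fn):
    """Mimics ?group=true on a _count view: one row per distinct key,
    whose value is the number of rows emitted for that key."""
    counts = Counter(key for doc in docs for key, _ in map_fn(doc))
    # Python's sort stands in for CouchDB's key ordering here.
    return [{"key": k, "value": counts[k]} for k in sorted(counts)]


for row in query_grouped(PAGES, map_file_name):
    print(row)
# {'key': 'sample-2.doc', 'value': 2}
# {'key': 'sample-3.doc', 'value': 2}
# {'key': 'sample.doc', 'value': 2}
```

The `value` of each row is the page count per file, which covers the "count" half of SELECT DISTINCT with a count; the keys alone are the distinct file names.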
My problem now is: how can I search through the dataset and then group the matching pages by FILE_NAME? I want output similar to what this SQL would return: SELECT DISTINCT FILE_NAME WHERE PAGE_FULLTEXT LIKE '%hello%'
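One common way to get this in CouchDB is to index words up front instead of scanning text at query time: a view emits a `[word, FILE_NAME]` key for every word on a page, uses `_count` as its reduce, and is queried with `group=true`, `startkey=["hello"]` and `endkey=["hello", {}]`. Grouping collapses every page of the same file into a single row, so the second element of each returned key is a distinct file name. This is a minimal sketch, assuming a hypothetical `_design/search` design document with a `by_word` view, and assuming lowercase, alphanumeric-only tokenization; the Python functions simulate the view locally rather than calling a CouchDB server.

```python
import re
from collections import Counter

# Sample pages from the question (_id and _rev omitted for brevity).
PAGES = [
    {"FILE_NAME": "sample.doc", "PAGE_NUM": 1, "PAGE_FULLTEXT": "hello world"},
    {"FILE_NAME": "sample.doc", "PAGE_NUM": 2, "PAGE_FULLTEXT": "this is john doe"},
    {"FILE_NAME": "sample-2.doc", "PAGE_NUM": 1, "PAGE_FULLTEXT": "this is another document"},
    {"FILE_NAME": "sample-2.doc", "PAGE_NUM": 2, "PAGE_FULLTEXT": "page 2 of sample-2.doc"},
    {"FILE_NAME": "sample-3.doc", "PAGE_NUM": 1, "PAGE_FULLTEXT": "hello lionel"},
    {"FILE_NAME": "sample-3.doc", "PAGE_NUM": 2, "PAGE_FULLTEXT": "page hello 2 of sample-3.doc"},
]

# Hypothetical design document (the _id and view name are illustrative).
# The map string is the JavaScript CouchDB would run for each document.
DESIGN_DOC = {
    "_id": "_design/search",
    "views": {
        "by_word": {
            "map": """function (doc) {
  if (!doc.FILE_NAME || !doc.PAGE_FULLTEXT) return;
  var words = doc.PAGE_FULLTEXT.toLowerCase().match(/[a-z0-9]+/g) || [];
  words.forEach(function (w) { emit([w, doc.FILE_NAME], null); });
}""",
            "reduce": "_count",
        }
    },
}


def map_words(doc):
    """Python mirror of the map function above: one (word, FILE_NAME) key per
    word occurrence. Lowercasing and splitting on non-alphanumerics are
    assumptions about what "contains hello" should mean."""
    if not doc.get("FILE_NAME") or not doc.get("PAGE_FULLTEXT"):
        return
    for word in re.findall(r"[a-z0-9]+", doc["PAGE_FULLTEXT"].lower()):
        yield (word, doc["FILE_NAME"]), None


def distinct_files_containing(docs, term):
    """Mimics ?group=true&startkey=[term]&endkey=[term,{}] on the _count view:
    grouping collapses duplicate (term, FILE_NAME) keys into one row, so each
    matching file appears exactly once, like SELECT DISTINCT."""
    counts = Counter(key for doc in docs for key, _ in map_words(doc))
    term = term.lower()
    # Sorting stands in for CouchDB's key ordering; only this term's rows are kept.
    return [name for word, name in sorted(counts) if word == term]


print(distinct_files_containing(PAGES, "hello"))
# ['sample-3.doc', 'sample.doc']
```

Because the view indexes whole words, it only approximates `LIKE '%hello%'`: a page containing "othello" would match the SQL pattern but not this view. Against a real server the query would look roughly like `GET /<db>/_design/search/_view/by_word?group=true&startkey=["hello"]&endkey=["hello",{}]`, with the JSON parameters URL-encoded.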