I want to be able to search in my documents by subsequences.

For example, given these documents:

{
  "name": "doc1",
  "sequence": ["a", "b", "d", "g", "k"]
}
{
  "name": "doc2",
  "sequence": ["c", "a", "b", "m", "d"]
}

I want to match multiple items in order. Example queries:

  1. Return all documents that contain the sequence ["a","b"] (returns doc1 and doc2).
  2. Return all documents that contain an "a" followed, 3 positions later, by a "d" (returns doc2).
  3. Return all documents that contain the sequence ["b","d","(whatever)","k"] (returns doc1).
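To pin down the intended semantics, here is a minimal in-memory sketch in plain JavaScript (no database involved). The helper names `matchesPattern` and `search` are hypothetical, and using `null` for "(whatever)" is an assumption: it stands for exactly one arbitrary item.

```javascript
// Hypothetical helper: true if `pattern` occurs as a contiguous run in `sequence`.
// A `null` entry in the pattern matches any single item (assumed meaning of "(whatever)").
function matchesPattern(sequence, pattern) {
  for (let start = 0; start + pattern.length <= sequence.length; start++) {
    const hit = pattern.every(
      (item, offset) => item === null || sequence[start + offset] === item
    );
    if (hit) return true;
  }
  return false;
}

// The two example documents from the question.
const docs = [
  { name: "doc1", sequence: ["a", "b", "d", "g", "k"] },
  { name: "doc2", sequence: ["c", "a", "b", "m", "d"] },
];

// Return the names of all documents whose sequence contains the pattern.
function search(pattern) {
  return docs
    .filter((doc) => matchesPattern(doc.sequence, pattern))
    .map((doc) => doc.name);
}

console.log(search(["a", "b"]));             // query 1
console.log(search(["a", null, null, "d"])); // query 2
console.log(search(["b", "d", null, "k"]));  // query 3
```

This only spells out what a correct answer should return; it scans every document and says nothing about how a database would index such a search.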

I am not sure I can do this with MongoDB. Another solution would be to save the sequences as strings instead of arrays and use regular expressions (but I don't much like that solution).

If I can't do it in MongoDB, is there another NoSQL engine, or any other engine, that supports this?

de3

2 Answers

As the other answer says, MongoDB cannot currently search an array by the order of its elements.

Materialised paths are quite good for seeking out sequences, though (http://docs.mongodb.org/manual/tutorial/model-tree-structures/#model-tree-structures-with-materialized-paths), and could work here.

So you would have a second field that has a "path" of your sequence field:

{
  "name": "doc2",
  "seq_path": "c,a,b,m,d",
  "sequence": ["c", "a", "b", "m", "d"]
}
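The path field can be derived from the array before the document is saved. A minimal sketch in plain JavaScript follows; the helper name `withSeqPath` is hypothetical, and it assumes that no item contains the "," separator.

```javascript
// Hypothetical helper: return a copy of the document with a derived "seq_path" field.
// Assumption: items never contain the separator, otherwise the path would be ambiguous.
function withSeqPath(doc, separator = ",") {
  for (const item of doc.sequence) {
    if (item.includes(separator)) {
      throw new Error("item contains the separator: " + item);
    }
  }
  return { ...doc, seq_path: doc.sequence.join(separator) };
}

const doc2 = withSeqPath({ name: "doc2", sequence: ["c", "a", "b", "m", "d"] });
console.log(doc2.seq_path);
```

Both fields then have to be kept in sync whenever the sequence changes.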

You could then search with a prefix regex, i.e. one anchored at the start with ^ (which can use an index). For an exact match:

db.col.find({seq_path:/^c,a,b,m,d$/})

Or, to find documents whose sequence starts with c,a,b:

db.col.find({seq_path:/^c,a,b/})

This could be one workaround.
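The same field can also serve the three queries from the question, which look for a run anywhere in the sequence rather than at its start. The sketch below is plain JavaScript and runs against in-memory strings; `patternToRegex` and `escapeRegex` are hypothetical helpers, `null` is assumed to mean "any single item", and items are assumed not to contain a comma. Note that a regex that is not a plain `^` prefix cannot use the index efficiently, so these queries would inspect every path value.

```javascript
// Escape regex metacharacters so that an item is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex that finds `pattern` as a contiguous run
// in a comma-joined path. `null` matches exactly one item.
function patternToRegex(pattern) {
  const body = pattern
    .map((item) => (item === null ? "[^,]+" : escapeRegex(item)))
    .join(",");
  // The run must begin at the start or after a comma, and end at a comma or the end,
  // so that "a,b" does not match inside "xa,bc".
  return new RegExp("(?:^|,)" + body + "(?:,|$)");
}

// In-memory stand-ins for the seq_path values of the two documents.
const paths = { doc1: "a,b,d,g,k", doc2: "c,a,b,m,d" };

function search(pattern) {
  const regex = patternToRegex(pattern);
  return Object.keys(paths).filter((name) => regex.test(paths[name]));
}

console.log(search(["a", "b"]));             // query 1
console.log(search(["a", null, null, "d"])); // query 2
console.log(search(["b", "d", null, "k"]));  // query 3
```

In the mongo shell the resulting RegExp could be passed directly as the filter value, for example db.col.find({seq_path: patternToRegex(["a", "b"])}), provided the helpers are defined in that shell session.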

Sammaye
I think this is not possible in MongoDB.

First of all, it's not easy to get a particular element. You can with $elemMatch, but as far as I can see there is no way to get the values of its neighbors.

I would suggest using strings. I tried that with a short example and it works well.

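#!/usr/local/bin/perl

use strict;
use warnings;
use MongoDB;
use Data::Dumper;

# Connect with the current driver API. Older driver releases used
# MongoDB::Connection->new, remove() and insert() for the calls below.
my $client    = MongoDB->connect('mongodb://localhost:27017');
my $database  = $client->get_database('oho');
my $documents = $database->get_collection('documents');

# Start from an empty collection.
$documents->delete_many({});

# Each sequence is stored as a single string, one character per item.
my $doc1 = {  "name"     => "doc1",
              "sequence" => ["abdgk"]
           };

my $doc2 = {
             "name"      => "doc2",
             "sequence"  => ["cabmd"]
           };

$documents->insert_one($doc1);
$documents->insert_one($doc2);

# Case 1: "a" directly followed by "b".
my @case1 = $documents->find( { "sequence" => qr/ab/i } )->all();
print "case 1:" . Dumper \@case1;

# Case 2: "a", then two arbitrary items, then "d".
my @case2 = $documents->find( { "sequence" => qr/a..d/i } )->all();
print "case 2:" . Dumper \@case2;

# Case 3: "b", "d", exactly one arbitrary item, then "k".
# Use .* instead of . to allow a gap of any length.
my @case3 = $documents->find( { "sequence" => qr/bd.k/i } )->all();
print "case 3:" . Dumper \@case3;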
smartmeta