4

Say, I have vocabulary (around 100000 words) and word ("inputstring"). So:

I need generate all words from "inputstring" like "input", "string", "put", "strinpg" etc. And then I need check out them in the my vocabulary. Could you say any good algorithm for that? Because I have only idea with:

  1. Recursive search all possible combinations at the step 1
  2. Using NSPredicates for filter them in the my vocabulary.
Jakub
  • 13,712
  • 17
  • 82
  • 139
Viktorianec
  • 361
  • 6
  • 22
  • 1
    This question needs to be articulated more clearly, it is not very obvious what you are saying. Also, you should say what you have tried already. – Max von Hippel Jul 22 '15 at 22:23
  • Ok. I have a big word. Then I am getting all words from it recursively, but it's too slow. So I need any good decision for getting list of words with any other algorithm. – Viktorianec Jul 22 '15 at 23:38
  • Does the word have to have alphabets in same order as input string? or order can change? e.g. the one you mentioned strinpg, i think that is a mistake right? – Omar Jul 28 '15 at 10:35
  • Order can change, of course. "stringpg" is sample of possible word... – Viktorianec Jul 28 '15 at 21:42
  • On c# I do this frequently with Regex, but whatever you end up doing, you may just be facing the intrinsic frustration of PN problems. You will end up needing a lot of iterations no matter what you do. – Max von Hippel Jul 29 '15 at 18:10

5 Answers5

5

I tried with NSRegularExpression since CoreData & NSPredicate seems to manage them, but I couldn't have a working solution (maybe linked to my no expertise in Regex, but could be a lead). I tried also with NSCharacterSet, but it couldn't say it the number of occurrences was correct..

This maybe not the more sexy way to do it, but, here what you could do:

NSString *searchedWord = @"inputString";

NSPredicate *predicate = [NSPredicate predicateWithBlock:^BOOL(NSString *evaluatedObject, NSDictionary *bindings) {
    for (NSUInteger index = 0; index < [evaluatedObject length]; index++)
    {
        NSString *subString = [evaluatedObject substringWithRange:NSMakeRange(index, 1)];

        NSUInteger numberOfOccurrencesInSearchWord = [self occurrencesOfSubString:subString inString:searchedWord];
        NSUInteger numberOfOccurrencesInCurrentWord = [self occurrencesOfSubString:subString inString:evaluatedObject];
        if (numberOfOccurrencesInCurrentWord > numberOfOccurrencesInSearchWord)
            return FALSE;
    }
    return TRUE;
}];

//Apply this predicate to your fetch

I put occurrencesOfSubString:inString: in the class, but it could be a Category on NSString for example. You could also loop with rangeOfString:option:range if you prefer them to NSRegularExpression. Source of the code (slightly modified)

-(NSUInteger)occurrencesOfSubString:(NSString *)subString inString:(NSString *)string
{
    NSUInteger numberOfMatches = 0;
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:subString
                                                                           options:NSRegularExpressionCaseInsensitive error:&error];


    if (!error)
        numberOfMatches = [regex numberOfMatchesInString:string options:0 range:NSMakeRange(0, [string length])];

    return numberOfMatches;
}

Note: To avoid too much loops, you may want to strip the evaluatedObject in order to not check repeated value. For example, if evaluatedObject = @"aaa", it will look 3 times for "a". So removing duplicated values in it may improve the speed. Here is a solution. So the code would be in the predicate block:

NSString *evaluatedWithoutRepeat = [evaluatedObject removeDuplicatedCharacters];
for (NSUInteger index = 0; index <= [evaluatedWithoutRepeat length]; index ++)
{
    NSString *subString = [evaluatedWithoutRepeat substringWithRange:NSMakeRange:(index,1)];
    //The rest would be the same.
}

WorkingTest:

NSArray *testValues = @[@"inputString",
                        @"input",
                        @"string",
                        @"put",
                        @"strinpg",
                        @"Stringpg",
                        @"stringNOTWANTED"];
NSLog(@"AllValues: %@", testValues);

NSLog(@"Test: %@", [testValues filteredArrayUsingPredicate:predicate]);

Output:

> AllValues: (
    inputString,
    input,
    string,
    put,
    strinpg,
    Stringpg,
    stringNOTWANTED
)
> Test: (
    inputString,
    input,
    string,
    put,
    strinpg
)
Community
  • 1
  • 1
Larme
  • 24,190
  • 6
  • 51
  • 81
3

It sounds like you want to insert your vocabulary list into a trie. This will give you a datastructure that you can then quickly check to find all substrings in your input that are present in your vocabulary.

Assuming that you are building the trie once and checking lots of different input strings, this will be a lot quicker than starting by combinatorially finding all substrings of your input. (This speed comes at the cost of memory for the trie.)

Ewan Mellor
  • 6,747
  • 1
  • 24
  • 39
3

You're on the right track with NSPredicate. And the phase you're looking for is fault tolerant search and it solved by Levenshtein distance. What you basically need to do is make an || combination with queries in single queries.

Let's assume you have all your words in NSArray. You need to call a method filteredArrayUsingPredicate: on it, BUT it's not be super easy to build a predicate like that.

So your requirements are:

  1. Search word can be a part of larger word
  2. User can misspelled word

First part are pretty easy, all you need to do is put CONTAINS to your predicate. The second part should be like ?tring or s?ring or st?ing... and can easily be build with simple for. You can experiment with various number of ? signs and see what matches your criteria.

Jakub
  • 13,712
  • 17
  • 82
  • 139
0

I am not sure if there is a special algorithm to solve your problem. But the possibilities are limited if you have to solve it with Core Data Fetch Requests.. I would do it like this :

- (NSArray *)getWordsFromString:(NSString *)input{

   NSMutableArray *result = [NSMutableArray new];
   NSUInteger *startIndex = 0;

   for (NSUInteger i = 0; i < input.length ; i++){
       NSString *substring = [input substringWithRange:NSMakeRange(*startIndex, i)];

      NSPredicate *predicate = [NSPredicate predicateWithFormat:@"word == %@", substring]; 
      NSFetchRequest *fetchRequest = [NSFetchRequest fetchRequestWithEntityName:@"Word"];
      fetchRequest.predicate = predicate
      [fetchRequest setIncludesPropertyValues:NO];
      [fetchRequest setIncludesSubentities:NO];
      NSArray *fetchResult = fetch result with predicate

       if (fetchResult.count > 0){
           [result addObject:substring];
           startIndex = i;
       }
   } 

   return result;
}
Lennet
  • 413
  • 3
  • 10
0
NSMutableArray *foundWords = [NSMutableArray new];
for (NSString *knownWord in vocabulary)
{
    if ([input rangeOfString:knownWord].location != NSNotFound)
    {
        [foundWords addObject:knownWord];
    }
}

You can should it by preparing vocabulary. You should include only words that start with letters that are contained by input word.

Roman Temchenko
  • 1,796
  • 1
  • 13
  • 17