
I am using NSLinguisticTagger for word stemming. I can get the stems of the words in a sentence, but I cannot get the stem of a single word on its own.

This is the code I am using:

    NSString *stmnt = @"i waited";
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;

    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLemma] options:options];
    tagger.string = stmnt;
    [tagger enumerateTagsInRange:NSMakeRange(0, [stmnt length]) scheme:NSLinguisticTagSchemeLemma options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSString *token = [stmnt substringWithRange:tokenRange];
        NSLog(@"%@: %@", token, tag);
    }];

For this input I get the correct output:

    i: i
    waited: wait

However, the same code fails to find the stem when stmnt = @"waited".

Any help is greatly appreciated.

Ab'initio

3 Answers


The following code worked for me:

    NSString *stmt = @"waited";
    NSRange stringRange = NSMakeRange(0, stmt.length);
    // Tell the tagger up front that the text is English in Latin script.
    NSDictionary *languageMap = @{@"Latn" : @[@"en"]};
    NSOrthography *orthography = [NSOrthography orthographyWithDominantScript:@"Latn" languageMap:languageMap];
    [stmt enumerateLinguisticTagsInRange:stringRange
                                  scheme:NSLinguisticTagSchemeLemma
                                 options:NSLinguisticTaggerOmitWhitespace
                             orthography:orthography
                              usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        // Log info to console for debugging purposes
        NSString *currentEntity = [stmt substringWithRange:tokenRange];
        // NSRange fields are NSUInteger, so cast and print with %lu rather than %d
        NSLog(@"%@ is a %@, tokenRange (%lu,%lu)", currentEntity, tag,
              (unsigned long)tokenRange.location, (unsigned long)tokenRange.length);
    }];

Ab'initio

The accepted answer converted to Swift for those who need it:

    let stmt = "waited"
    let options: NSLinguisticTaggerOptions = .OmitWhitespace
    let stringRange = NSMakeRange(0, stmt.length)
    let languageMap = ["Latn":["en"]]
    let orthography = NSOrthography(dominantScript: "Latn", languageMap: languageMap)

    stmt.enumerateLinguisticTagsInRange(
        stringRange,
        scheme: NSLinguisticTagSchemeLemma,
        options: options,
        orthography: orthography)
        { (tag, tokenRange, sentenceRange, _) -> () in
            let currentEntity = stmt.substringWithRange(tokenRange)
            println(">\(currentEntity):\(tag)")
    }
Craig Grummitt
  • I got some NSRange not convertible to Range errors, so I just converted the string to NSString first ("let nsstmt : NSString = stmt as NSString") and ran everything using nsstmt. Not sure if there is a better way. – Soferio Jan 31 '16 at 04:36
  • I can confirm that stemming fails for a single word when using the `String` method, but works as expected (at least on the plurals I've tried) using the `NSString` equivalent. Bizarre! I'm also getting errors with the block-based `enumerateTags(in:scheme:options:using:)`, but the `linguisticTags(in:)` alternative is working as expected. – MathewS Jan 19 '17 at 22:27
  • https://stackoverflow.com/questions/48768919/device-vs-simulator-linguistic-schemes Any chance you can help here? Physical devices not working the same :\ – Will Von Ullrich Feb 18 '18 at 19:02

It doesn't work for a single word because there isn't enough information to determine the word's role in a sentence.

In our case, when a user enters a single word into our natural language parser, we assume it's the name of a thing, and thus a noun.

So we construct a sentence in which the entered word can only be read as a noun, like so:

    let str = "please show me \(word)"

Then just run it through NSLinguisticTagger as usual.
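
For illustration, here is a minimal sketch of how that could look end to end, assuming a recent Swift toolchain on an Apple platform where `NSLinguisticTagger` and its `enumerateTags(in:unit:scheme:options:using:)` method are available. The helper names `carrierSentence(for:)` and `lemma(ofNoun:)` are hypothetical, not part of our parser or of Foundation, and whether the tagger actually returns a lemma for a given word may depend on the OS version and the language data installed.

```swift
import Foundation

/// Hypothetical helper: builds the carrier sentence described above.
func carrierSentence(for word: String) -> String {
    return "please show me \(word)"
}

/// Hypothetical helper: tags the carrier sentence and returns the lemma
/// reported for its last token, or nil if the tagger has none for it.
func lemma(ofNoun word: String) -> String? {
    let sentence = carrierSentence(for: word)

    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = sentence

    // NSLinguisticTagger works with NSRange, which counts UTF-16 units.
    let range = NSRange(location: 0, length: sentence.utf16.count)
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]

    var lastLemma: String?
    tagger.enumerateTags(in: range, unit: .word, scheme: .lemma, options: options) { tag, _, _ in
        // Overwritten on every token, so the value left at the end
        // belongs to the last token, i.e. the entered word.
        lastLemma = tag?.rawValue
    }
    return lastLemma
}

if let stem = lemma(ofNoun: "cars") {
    print("lemma: \(stem)")
} else {
    print("no lemma found")
}
```

Taking the last token keeps the sketch short, but it assumes the entered word is a single token. Input that the tagger splits into several tokens would need the token ranges compared against the range of the entered word instead.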

Vojto