TL;DR

How do I return strings that contain a substring, but only if it is not surrounded by letters?

CONTEXT

I'm creating a translation tool to a fictional language. I have a lexicon stored in a JSON object and am using the following code to translate each word from the input to another word for the output:

    //loop through lexicon and if inputted word is in values, return corresponding key -> NEED A WAY TO GET 
    for(var i = 0; i < inputArray.length; i++){
        var term = inputArray[i];
        for(var key in lexicon){
            var value = lexicon[key];
            var match;

            // check if the term appears as a value in the selected lexicon
            if(value.indexOf(term) !== -1){
                // if key contains commas, then take only what's before the first comma
                if(key.indexOf(',') !== -1){
                    match = key.substring(0, key.indexOf(','));
                // if there are no commas, return the whole key
                } else{
                    match = key;
                }
                outputArray.push(match);
                break;
            }
        }
    }

PROBLEM

Although this method works for longer words, shorter words such as "hi" will be matched to the first value found in the lexicon, which could be another word, such as "thing". Since some of the object values contain comma-separated words, I need a way to pull only words which are NOT surrounded by letters.

DESIRED RESULT

The translation of "thing" will not be returned if I type "hi", but the translation of "hi,hello,goodday" or "hello,hi" will be returned.

Sekoul
  • Use `\b` - word boundaries - in a regex pattern. If you just search for whole words, it is enough. Your desired result is unclear: there is no `hi` in `hello`. – Wiktor Stribiżew Jan 19 '16 at 22:15
  • sounds like you want `if(match.endsWith(key)){ ...` (after the left-side-of-comma chop) – dandavis Jan 19 '16 at 22:24
  • @WiktorStribiżew - sorry if it was unclear, but the keys/values contain more than one word. The translation (i.e. value) of the "hello,hi" key would be returned because it is a single key composed of two words separated by a comma. I hope that makes more sense! – Sekoul Jan 20 '16 at 13:31
  • Could you provide an [MVCE (minimal complete verifiable example)](http://stackoverflow.com/help/mcve)? I just need to know which variable contains what text. – Wiktor Stribiżew Jan 20 '16 at 14:22
  • Just got it to work while creating an MVCE for you :) - thanks a lot for your help, much appreciated! I ended up using the solution below to define the term to search for as a RegEx (`var term = new RegExp("\\b" + inputArray[word] + "\\b")`), and then my if statement is `if(term.test(value)){}` – Sekoul Jan 20 '16 at 14:48

1 Answer

Do you want to match the whole word in a string? You may use regular expressions for this purpose:

    new RegExp("\\b" + lookup + "\\b").test(text)

See here for more details: whole word match in javascript
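
As a minimal sketch of how the whole-word test could slot into the lookup loop from the question: the helper names `escapeRegExp`, `containsWholeWord` and `translate`, and the sample lexicon, are invented here for illustration and are not from the original post. Escaping the term is an added precaution in case an input word contains regex metacharacters.

```javascript
// Hypothetical helper: escape regex metacharacters so the term is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `term` occurs in `text` as a whole word.
function containsWholeWord(text, term) {
    return new RegExp("\\b" + escapeRegExp(term) + "\\b").test(text);
}

// Sample lexicon made up for this example:
// keys are translations, values are comma-separated source words.
var lexicon = {
    "zork": "thing,object",
    "blip": "hello,hi"
};

// Return the key (up to its first comma) whose value contains `term` as a whole word.
function translate(term) {
    for (var key in lexicon) {
        if (containsWholeWord(lexicon[key], term)) {
            var comma = key.indexOf(",");
            return comma !== -1 ? key.substring(0, comma) : key;
        }
    }
    return null; // no entry matched
}

console.log(translate("hi"));    // "blip" (the "hi" inside "thing" is skipped)
console.log(translate("thing")); // "zork"
```

Note that `\b` is defined in terms of `\w`, which covers only `A-Z`, `a-z`, `0-9` and `_`, so words containing other letters may need a different boundary check.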

advncd
  • If you need to use `RegExp#test()`, you cannot declare the regex with `/g` modifier. You should replace `new RegExp("\\b" + lookup + "\\b", "g").test(text)` with `new RegExp("\\b" + lookup + "\\b").test(text)`. – Wiktor Stribiżew Jan 20 '16 at 13:33
  • @WiktorStribiżew - I tried using `var term = new RegExp("\\b" + inputArray[word] + "\\b")` and then still the same `if(value.indexOf(term) !== -1){` as above, but that does not seem to work - any ideas? – Sekoul Jan 20 '16 at 14:21
  • Edited the answer to reflect the comments – advncd Jan 20 '16 at 18:23
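
The two pitfalls raised in these comments can be reproduced with a short sketch. The sample string and variable names are made up for illustration. It shows that a regex carrying the `g` flag keeps state in `lastIndex` between `test()` calls on the same object, and that `indexOf` converts a regex argument to a string rather than matching with it.

```javascript
var text = "hello,hi";

// With the g flag, the regex remembers where the last match ended (lastIndex),
// so a second test() on the same object resumes from there.
var globalRe = new RegExp("\\bhi\\b", "g");
var first = globalRe.test(text);   // true, lastIndex is now 8
var second = globalRe.test(text);  // false, nothing left after index 8; lastIndex resets to 0

// Without g, test() always starts from the beginning of the string.
var plainRe = new RegExp("\\bhi\\b");
var third = plainRe.test(text);    // true
var fourth = plainRe.test(text);   // true

// indexOf() converts the regex to the string "/\bhi\b/" and searches for that literal text.
var viaIndexOf = text.indexOf(plainRe); // -1

console.log(first, second, third, fourth, viaIndexOf);
```

This is why the working version in the question's comments calls `term.test(value)` instead of `value.indexOf(term)`.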