
Please check this fiddle: https://jsfiddle.net/dp0y4hrw/16/

This is JavaScript that finds the longest compound words in an array of strings.

Instead of an array, I want this program to read a local txt file of over 100,000 lines, with one word per line, and then find the longest compound words.

I've tried using FileReader to get the data. The data was passed correctly, but I ran into trouble with some variables shared between `addPrefix` and `findPrefixes`.

I also tried using a promise, to account for the asynchronous behavior:

function readFile(event) {
  var file = event.target.files[0];

  if (file) {
    new Promise(function (resolve, reject) {
      var reader = new FileReader();
      reader.onload = function (evt) {
        resolve(evt.target.result);
      };
      reader.onerror = reject;
      reader.readAsText(file);
    })
      .then(findLongestWord)
      .catch(function (err) {
        console.log(err);
      });
  }
}

document.getElementById('file').addEventListener('change', readFile, false);

function findLongestWord(data) {
...

This still gives me an issue. What would be the best way to read the file so I can process the contents correctly in this situation?

EDIT:

// adds word as a prefix
var addPrefix = function (word) {
  var i  = 0;
  var current = prefixes;
  var char;

  while (char = word[i++]) {
    if (!current[char]) {
      current[char] = {};
    }
    current = current[char];
  }
  current.word = true;
  return current.word; //RETURNING CURRENT WORD HERE
};

// Finds the longest prefix we can make using the word.
var findPrefixes = function (word) {
  var prefix = '';
  var current = prefixes;
  var found  = [];
  var i  = 0;
  var char;

  while (char = word[i++]) {
    if (!current[char]) { 
      break; 
    }
    // Move to the next character and add to the prefix.
    current = current[char];
    prefix += char;

    if(current.word)
    {
      found.push(prefix);
    }
  }
  return found;
};

// For each word in the list, add it to the prefix tree.
list.forEach(function (word) {
  var prefix;

  // If we can find a closest possible word, it may be possible to create a
  // compound word - but we won't be able to check until we reach the end.
  if ((prefix = findPrefixes(addPrefix())) && prefix.length) { //FINDPREFIXES USING ADDPREFIX HERE
    prefixMatch.push([ word, prefix ]);
  }

  // Insert the word into the prefix tree.
  addPrefix(word);
});

EDIT 2: This is an example of the input text file:

cat
cats
catsdogcats
dog
dogcatsdog
hippopotamuses
rat
ratcatdogcat
catratdograt
dogcatscats

Expected result: longest: ratcatdogcat, catratdograt; 2nd longest: catsdogcats, dogcatscats; number of compound words: 5
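For reference, the expected result can be checked against the sample list with a minimal sketch like the one below. It is not the prefix-tree code from the fiddle: it swaps in a plain `Set` lookup with a small dynamic-programming check, and the names `isCompound` and `summarize` are hypothetical.

```javascript
// Returns true if `word` can be built by concatenating two or more words from `dict`.
function isCompound(word, dict) {
  if (word.length === 0) return false;
  // canBuild[i] is true when word.slice(0, i) is a concatenation of dictionary words.
  var canBuild = [true];
  for (var end = 1; end <= word.length; end++) {
    canBuild[end] = false;
    for (var start = 0; start < end; start++) {
      // Skip the whole word itself, so a word is never "built" from itself alone.
      if (start === 0 && end === word.length) continue;
      if (canBuild[start] && dict.has(word.slice(start, end))) {
        canBuild[end] = true;
        break;
      }
    }
  }
  return canBuild[word.length];
}

// Groups the compound words by length and reports the two longest groups.
function summarize(words) {
  var dict = new Set(words);
  var compounds = Array.from(dict).filter(function (w) {
    return isCompound(w, dict);
  });
  // Distinct compound lengths, longest first.
  var lengths = Array.from(new Set(compounds.map(function (w) {
    return w.length;
  }))).sort(function (a, b) { return b - a; });
  function ofLength(n) {
    return compounds.filter(function (w) { return w.length === n; });
  }
  return {
    longest: lengths.length > 0 ? ofLength(lengths[0]) : [],
    secondLongest: lengths.length > 1 ? ofLength(lengths[1]) : [],
    count: compounds.length
  };
}

var sampleWords = [
  "cat", "cats", "catsdogcats", "dog", "dogcatsdog", "hippopotamuses",
  "rat", "ratcatdogcat", "catratdograt", "dogcatscats"
];
var result = summarize(sampleWords);
console.log(result.longest.join(","));       // ratcatdogcat,catratdograt
console.log(result.secondLongest.join(","));  // catsdogcats,dogcatscats
console.log(result.count);                    // 5
```

The inner loop is quadratic in the length of each word, which should be acceptable for a 100,000-line word list, although a prefix tree avoids the repeated `slice` calls.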

RJK
  • What issue are you having with `javascript` at Question and linked jsfiddle? – guest271314 Sep 19 '16 at 03:43
  • @guest271314 the issue is I want to be able to read external text file data through `readFile(event)` but when I implement it like that, 'current.word' object.key does not get shared between `addPrefix` and `findPrefixes`. So function doesn't work. – RJK Sep 19 '16 at 03:46
  • _" 'current.word' object.key does not get shared between addPrefix and findPrefixes"_ Which function should be called first? What is expected result value of first function which should be called and expected parameter passed to second function? What is expected return value of second function? – guest271314 Sep 19 '16 at 03:49
  • @guest271314 findPrefix is called first, then addPrefix. How it is set up currently, say "word" is passed. The function is stacking objects inside of other objects, and only the last one has the property 'word'. Ex: if your text file only has "word" in it, your `current` variable will be like this: '{"w":{"o":{"r":{"d":{"word":true}}}}}' . That means `current.word` will only be true when at end of the word. That way we can push 'word' into `found` array. – RJK Sep 19 '16 at 03:55
  • What do you pass to `addPrefix`? What is purpose of `var current = prefixes;`? At `addPrefix`? Should `current` be passed to `addPrefix`? – guest271314 Sep 19 '16 at 03:59
  • @guest271314 we pass `addPrefix` each word in text file. Purpose of `var current = prefixes` is to get the object of previously populated prefixes and to create active state of current word. The word should be passed to `addPrefix` and set `current.word = true` at end of word for all functions using `current.word` – RJK Sep 19 '16 at 04:04
  • Which portion of `javascript` is not returning expected result? – guest271314 Sep 19 '16 at 04:07
  • @guest271314 the `javascript` is fine when it comes to reading the text file data. the issue is more about the fact that `current.word` no longer becomes shared between `addPrefix` and `findPrefixes` functions so `findLongestWord` doesn't work properly. – RJK Sep 19 '16 at 04:09
  • Have you tried passing `current.word` from `addPrefix` to `findPrefixes`? – guest271314 Sep 19 '16 at 04:11
  • @guest271314 could you provide code sample of exactly what you mean? i've attempted to manipulate `current.word` in many different ways already so i think i've tried. maybe not correctly. should i `return current.word;`? Thank you so much for your help. – RJK Sep 19 '16 at 04:15
  • Yes. If `findPrefixes` depends on `current.word` which is set in `addPrefix`, you should be able to call `findPrefixes` with `return current.word` added at conclusion of `addPrefix`; e.g, `findPrefixes(addPrefix())`. Or `var p = addPrefix();findPrefixes(p)` – guest271314 Sep 19 '16 at 04:18
  • I am using `findPrefixes(word)` and also `findPrefixes(suffix)` later in code. Is it okay if I replace both with `findPrefixes(addPrefix())`? Also, is it okay to just `return current.word;` after `current.word = true` in `addPrefix()`? – RJK Sep 19 '16 at 04:23
  • _"is it okay to just `return current.word; after current.word = true` in `addPrefix()`?"_ It should be. _"Is it okay if I replace both with `findPrefixes(addPrefix())`"_ Not sure? Try and review result? – guest271314 Sep 19 '16 at 04:25
  • @guest271314 Gives me a `Cannot read property '0' of undefined` at `while (char = word[i++]) {` in `addPrefix()` – RJK Sep 19 '16 at 04:26
  • If `addPrefix` is called first, why would there be an error? `word` is an object at `findPrefixes`? Should you be returning `word` from `addPrefix`?, where `current.word` is a boolean, not a plain object. – guest271314 Sep 19 '16 at 04:27
  • `addPrefix` isn't called first. `findPrefixes` is. – RJK Sep 19 '16 at 04:30
  • @guest271314 `word` is an object key of object `current`. yes, I want to return `current.word` from `addPrefix` and use it in `findPrefixes`. Yes, `word` is a boolean. So like this: ex: '{"w":{"o":{"r":{"d":{"word":true}}}}}' – RJK Sep 19 '16 at 04:32
  • _"`addPrefix` isn't called first. `findPrefixes` is."_ , _"I want to return `current.word` from `addPrefix` and use it in `findPrefixes`"_ ? – guest271314 Sep 19 '16 at 04:34
  • I am talking about in `list.forEach(function (word) {`, `findPrefixes` is checked first in `if` statement, and then `addPrefix` is called. – RJK Sep 19 '16 at 04:35
  • @guest271314 `if ((prefix = findPrefixes(addPrefix())) && prefix.length) {`, then `addPrefix(word);` – RJK Sep 19 '16 at 04:38
  • Is that pattern with `return current.word` at `addPrefix()`? Where `current.word` is a boolean? Then at `findPrefixes` `while (char = word[i++])` ? Should `current` be returned from `addPrefix()` ? If you need to access `current.word` at `findPrefixes` you can create a local variable in `findPrefixes` to reference `current.word`. Also if `word` is `{"w":{"o":{"r":{"d":{"word":true}}}}}` why do you loop with `var i = 0;`, `char = word[i++]`? – guest271314 Sep 19 '16 at 04:44
  • @guest271314 hello. I have added the updated code using your instruction in the original post. please take a look. it is still giving me issue. by the way, thank you so much for helping me :) yes, `current.word` should be returned from `addPrefix()` so i can use it in `findPrefixes`. how can i create local variable in `findPrefixes` to reference `current.word`? `var currword = current.word`? why is that different from what using `current.word`? is my loop incorrect? – RJK Sep 19 '16 at 04:49
  • if my loop is incorrect, how does it work when i pass just array of strings? – RJK Sep 19 '16 at 04:56
  • Still not fully gathering what you are trying to achieve, or expected result of each portion of each function. Does your original jsfiddle not return expected result? – guest271314 Sep 19 '16 at 04:56
  • @guest271314 it does when i pass array of strings. but it does not when i try to pass external text file. how do you suggest i modify my original jsfiddle so it can read text file and return correct results from `findLongestWord` function? the way i use `FileReader` above reads the file correctly and passes data correctly, but `current.word` does not behave the same way. to keep things as simple as possible, could you help me change my jsfiddle so it reads external text file and return correct results? – RJK Sep 19 '16 at 04:59
  • _"it does when i pass array of strings"_ Pass an array of strings comprising text file to initial function which handles array of strings. For example, read n chunks of text file, as an array, at a time, see http://stackoverflow.com/a/39554772/ – guest271314 Sep 19 '16 at 05:01
  • @guest271314 do you mean reading file, saving file content as a var array? then passing var array to function? i have tried that but `current.word` does not behave the same and incorrectly. – RJK Sep 19 '16 at 05:03
  • _"i have tried that but current.word does not behave the same and incorrectly"_ Not certain how it would be different. Note, `data.split(" ")` could return empty string as elements of array. You can chain `.filter(Boolean)` to `.split` call to remove empty strings. – guest271314 Sep 19 '16 at 05:05
  • @guest271314 i have tried debugging and walking through code so i know for sure it is not returning empty strings. the file data is being passed correctly, as i do see strings being passed. i have no idea why it is different either. this is my solution when i have implemented readFile: https://jsfiddle.net/dp0y4hrw/17/ If you want, try comparing these two solutions (https://jsfiddle.net/dp0y4hrw/16/) to see how I may be incorrectly implementing? the second one works with array of strings. the first one does not work with readfile. – RJK Sep 19 '16 at 05:09
  • https://jsfiddle.net/dp0y4hrw/18/ – guest271314 Sep 19 '16 at 05:13
  • https://jsfiddle.net/dp0y4hrw/19/ appears to return expected result? – guest271314 Sep 19 '16 at 05:15
  • @guest271314 version 19 seems to not work if text file text is delimited by \n. my text file has over 100,000 lines of text. but only one word each line. how could we implement this. thank you so much for your help by the way. means a lot – RJK Sep 19 '16 at 05:23
  • The number of lines in the file should not matter. But if browser freezes, can read file in chunks, incrementally. Can you provide an example of file, with 25 to 50 lines, and describe expected result? – guest271314 Sep 19 '16 at 05:25
  • @guest271314 hi guest. I have updated original post with the exact example of the text file i am using to test, and the expected result. it is the text file equivalent of the array i am passing in my initial JSFiddle. – RJK Sep 19 '16 at 05:31
  • `ratcatdogcat,catratdograt catsdogcats,dogcatscats 5` is returned at https://jsfiddle.net/dp0y4hrw/20/ using `var list = data.match(/\w+/g);` – guest271314 Sep 19 '16 at 05:39
  • @guest271314 how man. I'm at a loss for words. i can't thank you enough. how can i give you more than just +1 reputation? thank you for sticking with me for over an hour. it works. not sure what the regex is doing here tho lol – RJK Sep 19 '16 at 05:41

1 Answer

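Use the RegExp /\w+/g with `String.prototype.match()` to turn the file contents into an array of words. Unlike `data.split(" ")`, this does not depend on whether the words are separated by spaces or by newlines, and it never produces empty strings.

\w matches any alphanumeric character from the basic Latin alphabet, including the underscore.

x+ matches the preceding item x one or more times.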

var list = data.match(/\w+/g);
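To see why the match-based line behaves differently from splitting on a space, here is a small sketch using a newline-delimited sample like the one in the question. The helper name `extractWords` and the inline sample string are assumptions made for illustration; the browser wiring at the end only runs where `document` and `FileReader` exist, and assumes an input element with id `file` as in the question.

```javascript
// Hypothetical helper: turn raw file text into an array of words.
function extractWords(data) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return data.match(/\w+/g) || [];
}

// Newline-delimited text, including a Windows line ending and a blank line.
var sample = "cat\ncats\r\ncatsdogcats\n\ndog\n";

var bySpace = sample.split(" ");   // no spaces in the text, so this is one big string
var words = extractWords(sample);  // one entry per word

console.log(bySpace.length); // 1
console.log(words);          // [ 'cat', 'cats', 'catsdogcats', 'dog' ]

// Browser-only wiring (skipped outside a browser).
if (typeof document !== "undefined" && typeof FileReader !== "undefined") {
  var input = document.getElementById("file");
  if (input) {
    input.addEventListener("change", function (event) {
      var file = event.target.files[0];
      if (!file) return;
      var reader = new FileReader();
      reader.onload = function (evt) {
        // Pass this array to the existing array-based function instead of logging it.
        console.log(extractWords(evt.target.result).length + " words read");
      };
      reader.onerror = function () {
        console.log(reader.error);
      };
      reader.readAsText(file);
    });
  }
}
```

Note that `\w` only covers `A-Z`, `a-z`, `0-9`, and `_`, so words containing accented letters, hyphens, or apostrophes would be broken apart; for such files, splitting on `/\r?\n/` and filtering out empty strings may be the safer choice.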
guest271314