
I'm converting a .docx file into HTML using mammoth and then trying to read the resulting HTML file line by line.

As explained in this question, I've tried Node.js's built-in readline module, but I'm getting the whole page as a single line.

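const fs = require("fs");
const path = require("path");
const readline = require("readline");
const mammoth = require("mammoth");

const articlesDir = path.resolve("articles");

// `options` is the mammoth options object, defined elsewhere
mammoth.convertToHtml({path: path.join(articlesDir, "a_doc_file.docx")}, options)
.then(async function(result) {
    var html = result.value;        // the generated HTML
    var messages = result.messages; // any messages raised during conversion

    // Write the file first, then open it for reading
    fs.writeFileSync(path.join(articlesDir, "article.html"), html);
    const fileStream = fs.createReadStream(path.join(articlesDir, "article.html"));
    const rl = readline.createInterface({
        input: fileStream,
        crlfDelay: Infinity
    });

    for await (const line of rl) {
        console.log(`Line from file: ${line}`);
    }
})
.done();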

I need to get the HTML output line by line so that I can save the data in the schema accordingly.


Thanks in advance for any help.

Aman Kumar
  • Just a wild guess... Is it possible that specifying `crlfDelay` makes it so that the input stream *must* use CR/LF line endings? And if the line endings are just LF or just CR alone, those aren't being treated as line breaks at all, causing you to get the whole file at once? – kshetline Aug 26 '19 at 09:21
  • I guess not... I just tried this on a sample file myself. I'm successfully reading one line at a time no matter what type of line endings I use. I can't recreate the problem you're having. – kshetline Aug 26 '19 at 09:35
  • Perhaps the original files don't have any line breaks at all? I've definitely seen many HTML files served like that. – kshetline Aug 26 '19 at 10:00
  • @kshetline yep, that was the case. Anyway, thanks a lot for your time – Aman Kumar Aug 26 '19 at 15:26
  • @kshetline, do you know how we can go to line 2 if we're currently on line 1? Thanks – Aman Kumar Aug 27 '19 at 11:39
  • I'm not sure I understand your question. Lines (if there are any line breaks at all!) are read sequentially, so after reading line 1, line 2 will come next automatically. That `rl` value is an asynchronous iterator - which is entirely something new to me, but I think I understand what it's doing. I'm glad I saw your post, because I'd never seen `for await` before either, which is good to know about. – kshetline Aug 27 '19 at 13:30
  • @kshetline, while looping, if I'm at line 1, is there any option to read line 2 without continuing the loop? Consider the index in a for loop: if i (the current index) is at 0, we can access the next element with i+1. Is something similar possible with readline? – Aman Kumar Aug 27 '19 at 20:41
  • btw, I also solved my requirement by using `for await` and pushing all the lines into a new array – Aman Kumar Aug 27 '19 at 20:42
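
The comments above settle on two points: the generated HTML contained no line breaks, so readline had nothing to split on, and collecting the lines into an array makes the next line reachable by index. A minimal sketch of both, assuming it is acceptable to treat each closing block-level tag as the end of a "line". The helper name `splitHtmlIntoLines`, the tag list, and the sample `html` string are hypothetical and not part of mammoth or readline:

```javascript
// Hypothetical helper (not a mammoth or readline API): put a newline after
// each closing block-level tag, then split on newlines.
// The tag list is an assumption; adjust it to the markup you actually get.
function splitHtmlIntoLines(html) {
    return html
        .replace(/<\/(p|h[1-6]|li|ul|ol|table|tr)>/g, "$&\n")
        .split("\n")
        .filter((line) => line.trim() !== "");
}

// Sample input standing in for `result.value` from mammoth.
const html = "<h1>Title</h1><p>First paragraph.</p><p>Second paragraph.</p>";
const lines = splitHtmlIntoLines(html);

// With the lines in an array, the next line is just lines[i + 1].
for (let i = 0; i < lines.length; i++) {
    const current = lines[i];
    const next = lines[i + 1]; // undefined when `current` is the last line
    console.log(`Line ${i + 1}: ${current} | next: ${next}`);
}
```

This works on the HTML string directly, so the round trip through `article.html` is not needed just to get lines. A regex split like this is only reasonable for simple, flat markup; for nested or irregular HTML, a real HTML parser would be the safer choice.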

0 Answers