
I want to get all the content in a text file before the first empty line.

I've found a working regex, but when I try to do the same in JavaScript it doesn't work.

(loading the file's contents is working)

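const fs = require('fs');
const path = require('path');

async function readDir() {
    return new Promise((resolve, reject) => {
        fs.readdir('./content', (err, files) => {
            // Return after rejecting so resolve() is not also called on error
            if (err) { return reject(err) }
            resolve(files)
        });
    });
}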

readDir().then((files) => {
    files.forEach(file => {
        var filepath = path.resolve('./content/'+file)
        if(filepath.endsWith('.txt')) {
            if(fs.statSync(filepath)["size"] > 0) {
                let data = fs.readFileSync(filepath).toString();
                let reg = /^[\s\S]*?(?=\n{2,})/;
                console.log(data.match(reg)) //returns null
            }
        }
    });
})

EDIT:

As O. Jones pointed out, the problem lies with the line endings. My regex was not picking up on \r\n line endings present in my file.

For now, this one seems to do the job: /^[\s\S]*?(?=(\r\n\r\n?|\n\n))/m

Erik
  • What exactly are the contents of your file? Are you sure there are two linebreaks right after each other (i.e. no other whitespace, including `\r`, in the empty line)? – Bergi Oct 10 '20 at 17:37
  • The link to the regex includes the exact content of the txt-file – Erik Oct 10 '20 at 17:41
  • Then no, that works in javascript as well. And notice that copy-pasting a file into a textarea can change the line endings. – Bergi Oct 10 '20 at 17:45
  • I tried this and it worked for me. You are doing something wrong with the file content – tbking Oct 10 '20 at 17:51
  • I created the textfile via the Visual Studio Code file explorer, and copy-pasted the content from here: http://www.gutenberg.org/files/36/36-0.txt When I JSON.stringify the file content, I get: _"The War of the Worlds\r\n10-10-2020\r\n\r\nI.\r\nTHE EVE OF THE WAR.\r\n\r\nNo one would have believed in the last years of the nineteenth century\r\nthat this world was being watched keenly and closely by intelligences\r\ngreater than man’s and yet as mortal as his own; that as men busied\r\nthemselves..._ I'm sorry guys, I have no idea what it could be.. – Erik Oct 10 '20 at 17:59

2 Answers


It looks like you want to match your regex against the whole, multiline contents of your file. The multiline flag is worth a try, though it only changes where ^ and $ can match; [\s\S] already spans line breaks without it.

Try this

let reg = /^[\s\S]*?(?=\n{2,})/m;

Notice the m after the re's closing /. For more explanation see the section called Advanced Searching With Flags here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
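A quick way to check what the m flag changes is to run the same pattern with and without it. This is a minimal sketch; the `sample` string is made up for illustration and uses plain \n line endings.

```javascript
// Made-up sample text with Unix (\n) line endings.
const sample = 'title\ndate\n\nbody';

// Same pattern as in the question, without and with the m flag.
const plain = sample.match(/^[\s\S]*?(?=\n{2,})/);
const multi = sample.match(/^[\s\S]*?(?=\n{2,})/m);

console.log(JSON.stringify(plain[0])); // "title\ndate"
console.log(JSON.stringify(multi[0])); // "title\ndate"

// Where the flag does matter: ^ anchoring at the start of every line.
const lineStartsPlain = sample.match(/^\w+/g);  // [ 'title' ]
const lineStartsMulti = sample.match(/^\w+/gm); // [ 'title', 'date', 'body' ]
console.log(lineStartsPlain, lineStartsMulti);
```

Both variants return the same text here, which suggests that for this particular pattern the flag is not what decides between a match and null.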

Also, it's possible you have line-ending trouble. Linux, FreeBSD, and other UNIX systems use \n (newline) to mark the end of each line. Classic Mac OS (before OS X) used \r (carriage return) for that; modern macOS uses \n. And Windows uses \r\n, two characters at the end of each line. Yeah, we all know what a pain in the neck this is.

So your blank-line detector is probably too simple. See "Regular Expression to match cross platform newline characters". Try using this to match cross-OS ends of lines:

\r\n?|\n

meaning either a return followed by an optional newline, or just a newline.

It might look something like this.

let reg = /^[\s\S]*?(?=(\r\n?|\n)(\r\n?|\n))/m;

That looks for two of those end of line patterns in a row (not tested by me, sorry).
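One thing to watch with two optional-newline groups in a row: on \r\n input, the first group can match just the \r and the second the \n, so a single Windows line ending already counts as "two" line endings. The sketch below shows that effect and one possible stricter variant. The `strict` pattern and the `firstBlock` helper are illustrative assumptions, and `crlf` is a shortened copy of the stringified content quoted in the comments.

```javascript
// Shortened sample with Windows (\r\n) line endings.
const crlf = 'The War of the Worlds\r\n10-10-2020\r\n\r\nI.\r\nTHE EVE OF THE WAR.';

// Pattern from the answer: a lone \r\n can satisfy both groups (\r, then \n).
const loose = /^[\s\S]*?(?=(\r\n?|\n)(\r\n?|\n))/;

// Stricter variant (an assumption, not from the answer): a bare \r only
// counts as a line ending when it is not followed by \n.
const strict = /^[\s\S]*?(?=(?:\r\n|\r(?!\n)|\n){2})/;

// Hypothetical helper: text before the first blank line, or the whole text.
function firstBlock(text, re) {
  const m = text.match(re);
  return m ? m[0] : text;
}

console.log(JSON.stringify(firstBlock(crlf, loose)));
// "The War of the Worlds"
console.log(JSON.stringify(firstBlock(crlf, strict)));
// "The War of the Worlds\r\n10-10-2020"
```

If old \r-only files are not a concern, a shorter lookahead such as `(?=\r?\n\r?\n)` should behave the same way on \n and \r\n input.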

O. Jones
  • Doesn't make a difference in my case.. it's probably the file – Erik Oct 10 '20 at 17:56
  • Please see my edit. Maybe line endings are the problem. Definitely, I just saw your comment. – O. Jones Oct 10 '20 at 18:25
  • Yes, this is it, although your version of the regex returned only the first line. With some tinkering I found this to be working: `/^[\s\S]*?(?=(\r\n\r\n?|\n\n))/m` – Erik Oct 10 '20 at 19:18

You may want to try:

const EOL = require('os').EOL; // system newline.
const regex = new RegExp('^.*?(?=' + EOL + EOL + ')', 's'); // everything before the first two consecutive newlines.
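Note that os.EOL describes the platform Node is running on, not the file being read, so a \r\n file opened on Linux would still not match. A minimal sketch of one alternative is to take the line ending from the text itself and fall back to os.EOL only when the text has none. The helper names `detectEol` and `firstParagraph` are hypothetical.

```javascript
const os = require('os');

// Hypothetical helper: return the first line ending found in the text,
// falling back to the platform default when the text has no line break.
function detectEol(text) {
  const m = text.match(/\r\n|\r|\n/);
  return m ? m[0] : os.EOL;
}

// Hypothetical helper: everything before the first blank line,
// or the whole text when there is no blank line.
function firstParagraph(text) {
  const eol = detectEol(text);
  const end = text.indexOf(eol + eol);
  return end === -1 ? text : text.slice(0, end);
}

console.log(JSON.stringify(firstParagraph('a\r\nb\r\n\r\nc'))); // "a\r\nb"
console.log(JSON.stringify(firstParagraph('a\nb\n\nc')));       // "a\nb"
```

This assumes the file uses one line-ending style throughout; mixed endings would need a regex-based approach instead.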
Alexander Mashin