Regex anything (including newlines) up to a certain sequence - multiple substrings JS

Question

The file I am trying to process looks like this:

...
...
15 Apr 2014 22:05 - id: content
15 Apr 2014 22:09 - id: content
15 Apr 2014 22:09 - id: content
with new line
16 Apr 2014 06:56 - id: content
with new line
with new line
16 Apr 2014 06:57 - id: content

16 Apr 2014 06:58 - id: content
...
...

The regex I have come up with is this: `\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*`

This matches each date line, but every match stops at the end of its line because `.` does not match newline characters.
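Here is a small sketch that runs the question's regex over the sample text. The question does not say which flags were used, so the `g` flag is an assumption. The sample string is the same one used in the accepted answer's REPL session.

```javascript
// Sample input from the question (same text as the accepted answer's REPL session).
const str = [
  "...", "...",
  "15 Apr 2014 22:05 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "with new line",
  "16 Apr 2014 06:56 - id: content",
  "with new line",
  "with new line",
  "16 Apr 2014 06:57 - id: content",
  "",
  "16 Apr 2014 06:58 - id: content",
  "...", "...",
].join("\n");

// The question's regex; the g flag is assumed so that every match is returned.
const lineOnly = /\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*/g;

const matches = str.match(lineOnly);
console.log(matches.length); // 6, one per date line
// "." stops at "\n", so the continuation line "with new line" is not included:
console.log(matches[2]); // 15 Apr 2014 22:09 - id: content
```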

This is almost right; I just need to include newline characters. But if I use `[\s\S]*` instead of `.*`, only one match is returned.
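The sketch below shows why only one match comes back. It uses the question's sample text and again assumes the `g` flag. `[\s\S]*` is greedy, so the first match runs from the first date to the very end of the input. That leaves nothing for any further matches.

```javascript
// Sample input from the question.
const str = [
  "...", "...",
  "15 Apr 2014 22:05 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "with new line",
  "16 Apr 2014 06:56 - id: content",
  "with new line",
  "with new line",
  "16 Apr 2014 06:57 - id: content",
  "",
  "16 Apr 2014 06:58 - id: content",
  "...", "...",
].join("\n");

// The question's date prefix followed by a greedy "anything, including newlines".
const greedy = /\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*/g;

const matches = str.match(greedy);
console.log(matches.length); // 1
// The single match is everything from the first date to the end of the input.
console.log(matches[0] === str.slice(str.indexOf("15 Apr"))); // true
```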

What I would like to extract is a set of substrings, each starting at a date sequence and ending just before the next date sequence, like so:

...
...
15 Apr 2014 22:05 - id: content //substring 1
15 Apr 2014 22:09 - id: content //substring 2
15 Apr 2014 22:09 - id: content //substring 3
with new line                   //substring 3
16 Apr 2014 06:56 - id: content //substring 4
with new line                   //substring 4
with new line                   //substring 4
16 Apr 2014 06:57 - id: content //substring 5

16 Apr 2014 06:58 - id: content //substring 6
...
...

Any help with what I'm missing?

If you're trying to get groups of dates and content, why use such a complicated regex? Just splitting on two newlines to get groups, then on single newlines to get each line, seems a lot easier. — adeneo, Feb 22 '15 at 15:47
Yes, but there could still be content on the next line that belongs to the previous entry. — Ivan Bacher, Feb 22 '15 at 15:51
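Here is one way to reconcile both comments. It is a sketch, not something either commenter posted. The idea is to split only at newlines that are immediately followed by a date. The lookahead `(?=...)` checks for the date without consuming it, so continuation lines stay attached to the entry above them. The date pattern assumes the `15 Apr 2014 22:05` format seen in the sample.

```javascript
// Sample input from the question.
const str = [
  "...", "...",
  "15 Apr 2014 22:05 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "with new line",
  "16 Apr 2014 06:56 - id: content",
  "with new line",
  "with new line",
  "16 Apr 2014 06:57 - id: content",
  "",
  "16 Apr 2014 06:58 - id: content",
  "...", "...",
].join("\n");

// Split at each newline that is directly followed by a "15 Apr 2014 22:05"-style date.
// The lookahead only checks for the date; it does not consume it.
const chunks = str.split(/\n(?=\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2})/);

// Drop anything before the first date (here, the leading "..." lines).
const entries = chunks.filter((chunk) => /^\d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}/.test(chunk));

console.log(entries.length); // 6
console.log(JSON.stringify(entries[3])); // "16 Apr 2014 06:56 - id: content\nwith new line\nwith new line"
```

The last entry keeps everything up to the end of the input, including any non-date lines that follow it.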

Avinash Raj · Accepted Answer · 2015-02-22T16:02:34.303

You need to use a positive lookahead assertion.

\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!\n\n)[\s\S])*?(?=\n\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*

DEMO

> var str = '...\n...\n15 Apr 2014 22:05 - id: content\n15 Apr 2014 22:09 - id: content\n15 Apr 2014 22:09 - id: content\nwith new line\n16 Apr 2014 06:56 - id: content\nwith new line\nwith new line\n16 Apr 2014 06:57 - id: content\n\n16 Apr 2014 06:58 - id: content\n...\n...';
undefined
> var re = /\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!\n\n)[\s\S])*?(?=\n\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2}).*/gm;
undefined
> str.match(re)
[ '15 Apr 2014 22:05 - id: content',
  '15 Apr 2014 22:09 - id: content',
  '15 Apr 2014 22:09 - id: content\nwith new line',
  '16 Apr 2014 06:56 - id: content\nwith new line\nwith new line',
  '16 Apr 2014 06:57 - id: content\n',
  '16 Apr 2014 06:58 - id: content' ]
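Here is a possibly simpler variant of the same lookahead idea. It is not the answer's exact regex. Both alternatives are folded into one: match lazily until the lookahead sees either a newline followed by another date, or the end of the input. Without the `m` flag, `$` matches only at the end of the whole string. The answer's second alternative stops the last entry at the end of its first line; this variant instead keeps any trailing lines.

```javascript
// Sample input from the question.
const str = [
  "...", "...",
  "15 Apr 2014 22:05 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "with new line",
  "16 Apr 2014 06:56 - id: content",
  "with new line",
  "with new line",
  "16 Apr 2014 06:57 - id: content",
  "",
  "16 Apr 2014 06:58 - id: content",
  "...", "...",
].join("\n");

// Date prefix copied from the question's regex.
const date = String.raw`\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})`;

// Lazily take anything (newlines included) until the lookahead sees either
// "\n" followed by another date, or the end of the input ($ without the m flag).
const entryRe = new RegExp(String.raw`${date}[\s\S]*?(?=\n${date}|$)`, "g");

const entries = str.match(entryRe);
console.log(entries.length); // 6
console.log(JSON.stringify(entries[4])); // "16 Apr 2014 06:57 - id: content\n"
console.log(JSON.stringify(entries[5])); // "16 Apr 2014 06:58 - id: content\n...\n..."
```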

Thx, just a slight change: `(\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?(?:(?!)[\s\S])*?(?=\d{1,}[ ])|\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*)` Demo: https://regex101.com/r/cS7sB7/1 — Ivan Bacher, Feb 22 '15 at 16:05

score -1 · Answer 2 · edited May 23 '17 at 11:43

See the second answer here: How to use JavaScript regex over multiple lines?

Try using the non-greedy quantifier `[\s\S]*?` and see what it returns. Alternatively, get back a single match and split the whole string on newlines afterwards.
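A lazy quantifier needs something after it to stop at. The minimal sketch below uses the question's sample text and the question's pattern, with `.*` swapped for `[\s\S]*?`. Because nothing follows it in the pattern, `[\s\S]*?` matches zero characters, and every match is just the date. This is why the accepted answer pairs the lazy part with a lookahead.

```javascript
// Sample input from the question.
const str = [
  "...", "...",
  "15 Apr 2014 22:05 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "15 Apr 2014 22:09 - id: content",
  "with new line",
  "16 Apr 2014 06:56 - id: content",
  "with new line",
  "with new line",
  "16 Apr 2014 06:57 - id: content",
  "",
  "16 Apr 2014 06:58 - id: content",
  "...", "...",
].join("\n");

// Lazy "anything" with nothing after it in the pattern.
const lazyOnly = /\d{1,}[ ][A-Z][a-z]{2}[ ](?:\d{4}[ ]\d{2}[:]\d{2}|\d{2}[:]\d{2})[\s\S]*?/g;

const matches = str.match(lazyOnly);
// Each match is only the date (e.g. "15 Apr 2014 22:05"): the lazy part matched nothing.
console.log(matches);
```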

answered Feb 22 '15 at 15:46

Boris Sitsker

It's better to post the answer itself and add the link as a reference; linked content may be removed. – Razib, Feb 22 '15 at 15:59
