
I have a string like this:

&topic1
Lorem ipsum dolor sit amet, consectetur adipiscing elit, 
sed do 
eiusmod tempor incididunt ut 

&topic2
labore et dolore magna aliqua. Ut enim ad minim 
www.example.com/test?id=1&name=abc
veniam, quis nostrud exercitation ullamco lab

&topic3
hello

Each time a line starts with & followed by a name (beginning of line + & + name + \n), a new item should begin.

What is the most natural way in JavaScript to parse it into this structure?

[['topic1', 'Lorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\neiusmod tempor incididunt ut'],
 ['topic2', 'labore et dolore magna aliqua. Ut enim ad minim\nwww.example.com/test?id=1&name=abc\nveniam, quis nostrud exercitation ullamco lab'],
 ['topic3', 'hello']]

I tried the following method, but it has several problems:

var s = "&topic1\nLorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\neiusmod tempor incididunt ut\n\n&topic2\nlabore et dolore magna aliqua. Ut enim ad minim\nwww.example.com/test?id=1&name=abc\nveniam, quis nostrud exercitation ullamco lab\n\n&topic3\nhello";

s.split('&').forEach(function(elt) {
  console.log(elt.split('\n')[0], elt.split('\n').slice(1));
});
  • the first item is empty (it can be removed afterwards, but maybe there's a cleaner way?)

  • if & appears in the middle of a line rather than at its beginning (as in the URL www.example.com/test?id=1&name=abc), the string is split there too, which is wrong

  • I'd like the text after the header to be a single string (e.g. Lorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\neiusmod tempor incididunt ut), not split at each \n
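The first two problems can be reproduced directly on the sample string. This is a small sketch; the variable name parts is only for illustration:

```javascript
// Reproduces the problems listed above with the split("&") approach.
const s = "&topic1\nLorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\neiusmod tempor incididunt ut\n\n&topic2\nlabore et dolore magna aliqua. Ut enim ad minim\nwww.example.com/test?id=1&name=abc\nveniam, quis nostrud exercitation ullamco lab\n\n&topic3\nhello";

const parts = s.split("&");

// 1. The string starts with "&", so the first piece is empty.
console.log(JSON.stringify(parts[0])); // ""

// 2. The "&" inside the URL also splits, giving 5 pieces instead of 3 topics (+ the empty one).
console.log(parts.length); // 5
console.log(parts[3].split("\n")[0]); // name=abc

// 3. The body comes back as an array of lines (including empty strings
//    from the blank separator lines) rather than one string.
console.log(parts[1].split("\n").slice(1));
```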

How can I parse this more cleanly?

Basj

1 Answer


Split by line breaks first, then join the lines back together until you reach a new topic:

var s = "&topic1\nLorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\neiusmod tempor incididunt ut\n\n&topic2\nlabore et dolore magna aliqua. Ut enim ad minim\nwww.example.com/test?id=1&name=abc\nveniam, quis nostrud exercitation ullamco lab\n\n&topic3\nhello";

const result = [];
let acc = null; // the current [topic, text] pair; null until the first header is seen

for (const line of s.split("\n")) {
  if (line[0] === "&") {
    // New topic found: store the previous one (if any) and start a new one
    if (acc) result.push(acc);
    acc = [line.slice(1), ""];
  } else if (acc) {
    // Append the line, with a "\n" separator unless the text is still empty
    acc[1] += (acc[1] && "\n") + line;
  }
}
if (acc) result.push(acc);

// Remove the trailing "\n"s left by the blank lines that separate topics
for (const item of result) item[1] = item[1].replace(/\n+$/, "");

console.log(result);
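The same result can also be obtained without accumulating line by line. The following is a minimal sketch under stated assumptions ("\n" line endings, and every line that starts with & is a header; the function name parseTopics is hypothetical): a multiline regex finds the header lines, and the text between two consecutive headers becomes the body. Because ^ with the m flag only matches at the start of a line, the & inside the URL is ignored.

```javascript
// Sketch: locate header lines with a multiline regex, then slice out the
// text between consecutive headers. Assumes "\n" line endings.
function parseTopics(text) {
  const headerRe = /^&(.*)$/gm; // a line starting with "&"; group 1 is the topic name
  const headers = [...text.matchAll(headerRe)];

  return headers.map((match, i) => {
    const bodyStart = match.index + match[0].length + 1; // skip the header's "\n"
    const bodyEnd = i + 1 < headers.length ? headers[i + 1].index : text.length;
    // Drop the blank separator lines (and any trailing whitespace) before the next header
    const body = text.slice(bodyStart, bodyEnd).replace(/\s+$/, "");
    return [match[1], body];
  });
}

const s = "&topic1\nLorem ipsum dolor sit amet, consectetur adipiscing elit,\nsed do\neiusmod tempor incididunt ut\n\n&topic2\nlabore et dolore magna aliqua. Ut enim ad minim\nwww.example.com/test?id=1&name=abc\nveniam, quis nostrud exercitation ullamco lab\n\n&topic3\nhello";

console.log(parseTopics(s));
console.log(parseTopics("&topic1\nLo\nrem\n&topic2\nipsum"));
```

Any text before the first header is simply skipped, so no empty first item has to be removed afterwards. Note that the trailing-whitespace cleanup also removes trailing spaces on a topic's last line; use /\n+$/ instead if those must be kept.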
Jonas Wilms
  • @basj I can't, sorry (it doesn't work on my phone), but feel free to edit :) – Jonas Wilms Jun 11 '18 at 15:34
  • Wow, you write code on a phone, congratulations :) (I find it so hard to format code on such a small screen / small keyboard!). I edited it into a code snippet. Thank you very much! – Basj Jun 11 '18 at 15:37
  • @basj it's all about habits, thanks for the edit and I'm glad to help :) – Jonas Wilms Jun 11 '18 at 15:57
  • A small corner case @JonasW.: if `var s = "&topic1\nLo\nrem\n&topic2\nipsum"`, then I would like `[["topic1", "Lo\nrem"], ["topic2", "ipsum"]]` and not `Lo\nrem\n` and not `ipsum\n`. Any idea? I should maybe remove the final `\n` but what would be the nicer way? – Basj Jun 11 '18 at 17:08
  • Looks great like this! What does `s && "\n"` do? I know it for boolean but not for strings – Basj Jun 11 '18 at 18:06
  • @basj if `acc[1]` is an empty string (""), the expression evaluates to that empty string, so no newline is added at the beginning; once `acc[1]` is no longer empty, it evaluates to `\n`, so a newline is appended. This behaviour relies on short-circuiting and truthiness. – Jonas Wilms Jun 11 '18 at 18:16
  • Thanks @JonasW. for showing me && for strings! [Here is a good answer](https://stackoverflow.com/a/32158899/1422096) about it. – Basj Jun 12 '18 at 11:11
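The short-circuit behaviour of && on strings, explained in the comments above, can be checked directly. This small sketch uses a hypothetical variable text in place of the answer's acc[1]:

```javascript
// && returns its left operand if that operand is falsy, otherwise its right operand.
// The empty string "" is falsy; any non-empty string is truthy.
console.log(JSON.stringify("" && "\n"));   // ""
console.log(JSON.stringify("Lo" && "\n")); // "\n"

// This is why the accumulator only inserts "\n" between lines, never before the first one.
let text = "";
for (const line of ["Lo", "rem"]) {
  text += (text && "\n") + line;
}
console.log(JSON.stringify(text)); // "Lo\nrem"
```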