
I have a text file with the following content:

Test, [636,13,"be738jsk","some, text",js]

I want to read this content into an array. I currently use JavaScript with a regex to split each line into substrings, directly into an array. The regex I have is: `line.split(/,\s\[|",|,"|","/);`

The problem is that some fields, like the one in the example, contain a "," and I don't want to split there. So I tried to express in the regex: "split on a comma followed by anything except whitespace". The problem is that this also cuts out the character that follows the comma.

Example:

Test, [63737,33,"bla,blablba",737]

When i use this regex:

`line.split(/,"|,\s\[|",|,[^\s]/);` then it cuts out the first 3 from 33 :(
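
A minimal sketch of why the digit disappears, assuming only the standard `String.prototype.split` behaviour: `split` discards everything the separator pattern matches, and `,[^\s]` matches the comma plus the character after it. A lookahead such as `,(?=\S)` checks the next character without consuming it. The variable names `consumed` and `kept` are illustrative, and the lookahead only shows the difference; it still splits inside quoted fields, so it does not solve the original problem.

```javascript
const line = 'Test, [63737,33,"bla,blablba",737]';

// ",[^\s]" consumes the comma AND the following character,
// so that character is dropped from the result.
const consumed = line.split(/,[^\s]/);
console.log(consumed);
// [ 'Test, [63737', '3', 'bla', 'lablba"', '37]' ]

// ",(?=\S)" only asserts that a non-whitespace character follows;
// the character itself stays in the next piece.
const kept = line.split(/,(?=\S)/);
console.log(kept);
// [ 'Test, [63737', '33', '"bla', 'blablba"', '737]' ]
```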

Henning
  • Try just `string.match(/\w+|"[^"]*"/g)` or `Array.from('Test, [636,13,"be738jsk","some, text",js]'.matchAll(/\w+|"([^"]*)"/g), x => x[1] ?? x[0])` – Wiktor Stribiżew Sep 13 '21 at 12:00
  • Your second solution works perfectly, thanks. But now I noticed that there are also some strings like `"[Blablba] Blablba"`. With your solution this sadly also gets split :/ Do you have a solution for this? – Henning Sep 13 '21 at 12:22
  • `Array.from("[Blablba] Blablba".matchAll(/\w+|"([^"]*)"/g), x => x[1] ?? x[0])` shows `['Blablba', 'Blablba']`, is it not expected? – Wiktor Stribiżew Sep 13 '21 at 12:23
  • No, because the `[Blablba] Blablba` is between two quotation marks and should stay together. – Henning Sep 13 '21 at 12:29
  • Then my solution still works, see the answer below with a code demo. – Wiktor Stribiżew Sep 13 '21 at 12:30
  • Oh yes, thanks. Last question: what if the string contains a quoted substring, like `Test, [1726,12,"[Header] nice one "Test" bla",15]`? Currently the whole string gets split because of the quotation marks inside it. Is there a solution for this too, combined with the one from before? – Henning Sep 13 '21 at 12:41
  • What do you expect to get as a result? Do you mean there is a missing `"`? – Wiktor Stribiżew Sep 13 '21 at 12:46
  • When I have `Test, [2637,12,"[Blablba] nice "Test" one",637]`, I want an array with my string in one field, `[ [Blablba] nice "Test" one]`, without splitting. With your latest solution this sadly gets split into `[ [Blablba] nice, test, one]`. You know what I mean? :) – Henning Sep 13 '21 at 13:34
  • I only know what you write, and what you have supplied is `Test, [2637,12,"[Blablba] nice "Test" one",637]`. And this means you have double quotes inside double quotes. And there is no solution for this. It looks like some buggy output from some data provider, and it should be fixed on their side. Or, do you mean the double quotes inside double quotes are escaped? Please use backticks when adding examples or code inside comments. – Wiktor Stribiżew Sep 13 '21 at 13:39
  • Hey can you maybe tell me how to do the same in normal Java? :) – Henning Sep 17 '21 at 07:20
  • [Here is the sample code](https://stackoverflow.com/a/41968628/3832970), where `\S` is used instead of `\w`, so just replace `\S` with `\w`, and it will work. – Wiktor Stribiżew Sep 17 '21 at 08:14
  • It works, thank you. But currently, if I have numbers in my string, like `Test, ["Blabla",3.0, -1.0`, it sadly outputs `..."3", "0", "1", "0"`. Do you have a solution for this that I can put in my regex? :/ – Henning Sep 17 '21 at 10:18
  • Yes, `/[-+]?\d+(?:\.\d+)?|\w+|"([^"]*)"/g` or `/[-+]?\d*\.?\d+|\w+|"([^"]*)"/g` – Wiktor Stribiżew Sep 17 '21 at 10:19
  • Thank you so so much – Henning Sep 17 '21 at 10:23

1 Answer


You can use

const text = 'Test, [636,13,"be738jsk","some, text",js], "[Blablba] Blablba"';
console.log(
  Array.from(text.matchAll(/\w+|"([^"]*)"/g), x => x[1] ?? x[0])
)

Output:

[
  "Test",
  "636",
  "13",
  "be738jsk",
  "some, text",
  "js",
  "[Blablba] Blablba"
]

Here, `\w+|"([^"]*)"` matches either one or more word characters, or a double-quoted substring: zero or more characters other than double quotation marks (captured into Group 1) between two double quotation marks. The `x => x[1] ?? x[0]` part takes the Group 1 value if it is defined; otherwise, it keeps the whole match value.
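
If the fields can also contain signed or decimal numbers, as came up in the comments, the number-aware variant of the pattern suggested there can be dropped into the same approach. The following is only a sketch: the helper name `parseFields` is hypothetical, and it assumes quoted fields never contain unescaped double quotes.

```javascript
// Hypothetical helper; the pattern is the number-aware variant from the comments.
function parseFields(line) {
  // Alternatives are tried left to right:
  //   1. an optionally signed integer or decimal, e.g. -1.0
  //   2. a run of word characters, e.g. be738jsk
  //   3. a double-quoted field, with its content captured in Group 1
  const pattern = /[-+]?\d+(?:\.\d+)?|\w+|"([^"]*)"/g;
  // Use Group 1 when the quoted alternative matched, else the whole match.
  return Array.from(line.matchAll(pattern), m => m[1] ?? m[0]);
}

console.log(parseFields('Test, ["Blabla",3.0, -1.0]'));
// [ 'Test', 'Blabla', '3.0', '-1.0' ]

console.log(parseFields('Test, [636,13,"be738jsk","some, text",js]'));
// [ 'Test', '636', '13', 'be738jsk', 'some, text', 'js' ]
```

Using `??` rather than `||` matters for empty quoted fields: for `""`, Group 1 is the empty string, which `??` keeps, while `||` would fall back to the whole match including the quotes. Note also that, because the numeric alternative is tried first, an unquoted token that starts with digits, such as `12ab`, comes out as two items, `12` and `ab`.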

Wiktor Stribiżew