I have a problem with split and I don't know how to solve it.

I want to split a CSV file and strip the double-quote (") characters. It works fine for most lines, but some lines contain text like this example:

Robert, Pattinson, rober@company.com, "London street, 19", London

It splits like this:

Robert
Pattinson
rober@company.com
London street
19
London

And I want to split it like this:

Robert
Pattinson
rober@company.com
London street, 19
London

Here is the command I use for this:

let content = (evt.target as FileReader).result.toString().replace(/["]/g,'');

How can I fix this?

Thanks in advance.

EDIT:

I just noticed I forgot to include the entire code to split, here it is:

let content = (evt.target as FileReader).result.toString().replace(/["]/g,'');
let lines = content.split('\n');
let commaSeparated = lines.map(function(line) {
    return line.split(',');
});
  • Possible duplicate of [How can I parse a CSV string with Javascript, which contains comma in data?](https://stackoverflow.com/questions/8493195/how-can-i-parse-a-csv-string-with-javascript-which-contains-comma-in-data) – maazadeeb Mar 06 '19 at 16:07
  • Making this a comment since it isn't an answer to the actual question, but there are a lot of CSV parsing libraries; maybe use one of those, or look at their source to see how they do it. – Wilhelmina Lohan Mar 06 '19 at 16:18
  • Instead of actually splitting, you could find all matches, then loop and check that each one is not in quotes before changing it: https://stackoverflow.com/a/12721944/10634638 – estinamir Mar 06 '19 at 16:21

2 Answers

I'm assuming you don't have any escaped quotes in your quoted data, like "Tell her I said, \"Hello,\" please." Then you can split on quotes first, get rid of leading and trailing commas, and then split on commas and flatten:


> 'one, "two, three,", four, "five, six", seven, eight, nine'
    .split('"')
    .map((x,i)=>i&1
      ?[`"${x}"`]
      :x.replace(/\s*,\s*$/,'')
        .replace(/^\s*,\s*/,'')
        .split(','))
    .map(a=>a.map(x=>x.trim()))
    .flat()

<· ["one", '"two, three,"', "four", '"five, six"', "seven", "eight", "nine"]
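As a rough sketch, the same quote-first idea can be wrapped in a function and applied to the line from the question. The name splitCsvLine is hypothetical. The sketch assumes the raw line is passed in (without the .replace(/["]/g,'') step from the question), that quoted fields contain no escaped quotes, and that there are no empty fields, since empty pieces are filtered out. Unlike the snippet above, it drops the surrounding quotes from quoted fields.

```javascript
// Hypothetical helper: splits one CSV line using the quote-first approach.
// Assumptions: no escaped quotes inside quoted fields, no empty fields.
function splitCsvLine(line) {
  return line
    .split('"')
    .map((chunk, i) =>
      i & 1
        ? [chunk] // odd chunks were inside quotes: keep whole, quotes dropped
        : chunk
            .replace(/^\s*,\s*/, '') // drop the comma that followed a quoted field
            .replace(/\s*,\s*$/, '') // drop the comma that preceded a quoted field
            .split(',')
            .map(s => s.trim())
            .filter(s => s !== '') // discard empty pieces left next to quotes
    )
    .flat();
}

const row = splitCsvLine(
  'Robert, Pattinson, rober@company.com, "London street, 19", London'
);
console.log(row);
// → ['Robert', 'Pattinson', 'rober@company.com', 'London street, 19', 'London']
```

To process the whole file, something like content.split('\n').map(splitCsvLine) should work, provided content still contains its quotes.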
Mike Stay

If you expect the string to always follow the same format (5 elements separated by commas, which may or may not be wrapped in double quotes), then why not use regular expressions to extract the information?

If your string looks like this: Robert, Pattinson, rober@company.com, "London street, 19", London

then:

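var str = 'Robert, Pattinson, rober@company.com, "London street, 19", London';
// Extract each field: either a double-quoted string (which may contain commas)
// or a run of non-comma characters that does not start with whitespace.
var arr = str.match(/"[^"\\]*(?:\\.[^"\\]*)*"|[^,\s][^,]*/g);
console.log(arr);

Note that this regular expression does not handle every case: it keeps the double quotes around quoted fields and skips empty fields. For strings in the format above, though, you get all the elements in an array that is easy to access.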

Of course, you will need to adapt this to your TypeScript code, but that shouldn't be hard since it is just regular expressions.
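As an alternative to a regular expression, a small character-by-character scanner may be easier to reason about. The sketch below is a different technique from the regex above, not a full CSV parser. The name parseCsvLine is hypothetical. It assumes fields are separated by commas, that quoted fields use double quotes with a doubled quote ("") as the escape, and that surrounding whitespace should be trimmed from every field.

```javascript
// Hypothetical helper: parses one CSV line by scanning characters and
// tracking whether the scanner is currently inside double quotes.
function parseCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // doubled quote inside a quoted field is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch; // commas inside quotes are ordinary characters
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(current.trim()); // a comma outside quotes ends the field
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current.trim()); // last field has no trailing comma
  return fields;
}

console.log(
  parseCsvLine('Robert, Pattinson, rober@company.com, "London street, 19", London')
);
// → ['Robert', 'Pattinson', 'rober@company.com', 'London street, 19', 'London']
```

Because the quotes are consumed by the scanner, the text should be passed in with its quotes intact, for example by mapping this function over the lines of the file instead of stripping the quotes first.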

acarlstein