I'm using the regex in the accepted answer here (Split a string by commas but ignore commas within double-quotes using Javascript) to split my CSV file, which works great except that the results of

.split(/(".*?"|[^",\s]+)(?=\s*,|\s*$)/g)

are returning the comma delimiters. I'm still new to writing my own RegEx and cannot seem to get the commas out of the result. I've tried numerous ways of creating a non-capturing group, but with no luck, for example:

.split(/((?:(".*?")|(?:[^",\s])+))(?=\s*,|\s*$)/)

For what it's worth, it is creating problems when I go to make a key:value pair object out of the data because I end up with numerous pairs like ",:,".
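A minimal sketch of why the commas appear, using a short made-up input rather than the real record: `split` treats whatever the pattern matches as the delimiter and also returns the contents of any capturing group. Because this pattern matches the fields themselves, the text left "between delimiters" is the commas. Calling `match` with the same pattern returns only the matched fields, although it would skip empty fields such as the `,,` in the record below, since the pattern needs at least one character to match.

```javascript
// Same pattern as in the question; `line` is a made-up example input.
const pattern = /(".*?"|[^",\s]+)(?=\s*,|\s*$)/g;
const line = 'a,"b,c",d';

// split: the fields act as delimiters, and the captured group is kept in the output.
const viaSplit = line.split(pattern);
console.log(viaSplit); // [ '', 'a', ',', '"b,c"', ',', 'd', '' ]

// match: returns only the text the pattern matched.
const viaMatch = line.match(pattern);
console.log(viaMatch); // [ 'a', '"b,c"', 'd' ]
```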

Here's a sample of one of the CSV records (there are many more fields, but this captures the gist and the challenge with both commas and double quotes in some of the descriptive fields):

-1821151,03/18/2021,23,018978783,"VENDOR 1",XXX11118465,999993348157,"OBJECT,OBJ TYPE,20"BLACK",,1546.0,EA
Barmar
  • You would probably be better off finding a library for parsing CSV files instead of doing it ad hoc with a regexp. – Barmar Mar 23 '21 at 16:18
  • Please post the expected result – Ihor Yanovchyk Mar 23 '21 at 16:19
  • The application (NetSuite - SuiteScript) does not easily allow for adding libraries. It can be done sometimes, but it is not always straightforward as some libraries might work and some might not. That is why I am using the vanilla .split with regex. – flimflam Mar 23 '21 at 16:41
  • The expected result is everything within the sample record without the commas (sorry, I couldn't get the line breaks to work in the comments, but a space separates each value that should be output by .split): -1821151 03/18/2021 23 018978783 "VENDOR 1" XXX11118465 999993348157 "OBJECT,OBJ TYPE,20"BLACK" (empty) 1546.0 EA – flimflam Mar 23 '21 at 16:42

1 Answer

I would parse it and not rely on a regexp. The basic idea is to split the string into an array of characters, loop over it, and figure out where to split it.

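// Quotes inside a field must be backslash-escaped, including the inch mark in 20\"BLACK.
const str = '-1821151,03/18/2021,23,018978783,"VENDOR \\"Foo\\" 1",XXX11118465,999993348157,"OBJECT,OBJ TYPE,20\\"BLACK",,1546.0,EA';
const result = str.split('').reduce((o, c, i, chars) => {
  if (c === ',' && !o.isOpen) {
    // Comma outside quotes: the current field is complete.
    o.arr.push(o.cur);
    o.cur = '';
  } else if (c === '"' && !o.isSkip) {
    // Unescaped quote: enter or leave a quoted field (the quote itself is dropped).
    o.isOpen = !o.isOpen;
  } else if (c === '\\' && !o.isSkip) {
    // Backslash before a quote or another backslash: keep the next character literally.
    const next = chars[i + 1];
    o.isSkip = next === '"' || next === "\\";
  } else {
    // Ordinary or escaped character: append it to the current field.
    o.cur += c;
    o.isSkip = false;
  }

  // Last character: flush the final field.
  if (chars.length === i + 1) {
    o.arr.push(o.cur);
  }
  return o;
}, {
  cur: '',        // field being built
  arr: [],        // completed fields
  isOpen: false,  // currently inside a quoted field?
  isSkip: false   // is the next character escaped?
}).arr;

console.log(result);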
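Since the question mentions building a key:value object from the data, here is a rough sketch of that step once the fields are split cleanly. The column names in `headers` and the helper name `toRecord` are hypothetical (the real CSV header row is not shown in the question), and only the first three values of the sample record are used.

```javascript
// Hypothetical column names; the real header row is not shown in the question.
const headers = ['id', 'date', 'count'];
// First three values from the sample record.
const fields = ['-1821151', '03/18/2021', '23'];

// Pair each header with the field at the same position.
function toRecord(headers, fields) {
  const record = {};
  for (let i = 0; i < headers.length; i++) {
    // Fall back to an empty string if the line has fewer fields than headers.
    record[headers[i]] = fields[i] !== undefined ? fields[i] : '';
  }
  return record;
}

const record = toRecord(headers, fields);
console.log(record); // { id: '-1821151', date: '03/18/2021', count: '23' }
```

With no comma entries left in the array, pairs like ",:," should no longer appear.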
epascarello
  • Thanks, that seems pretty elegant, but I will admit, I don't understand all of it as I have not had the need to use reduce much. I get that reduce is really just letting the code iterate through the array of characters and you are applying logic to the current and next character to determine whether it gets appended to the output array, I'm just getting a little hung up on the syntax. Also, am I correct you are assuming there has been some replace logic done prior to this to escape out what you created as \\Foo"\\? – flimflam Mar 23 '21 at 19:13
  • It is assuming you have quotes escaped; if no quotes will be in the keys, then that code can just be removed. – epascarello Mar 23 '21 at 20:49
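For readers less used to `reduce`, the following is a sketch of roughly the same logic written as a plain `for` loop, with the backslash-escape branch removed as the last comment suggests. The function name `splitCsvLine` and the sample line are made up for illustration, and it assumes no quote characters appear inside a quoted field.

```javascript
// Sketch only: assumes quotes never appear inside a quoted field.
function splitCsvLine(line) {
  const fields = [];
  let current = '';          // field being built
  let insideQuotes = false;  // are we between an opening and closing quote?

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (ch === ',' && !insideQuotes) {
      // Comma outside quotes ends the current field.
      fields.push(current);
      current = '';
    } else if (ch === '"') {
      // Toggle quoted state; the quote character itself is dropped.
      insideQuotes = !insideQuotes;
    } else {
      current += ch;
    }
  }
  // Flush the last field (an empty line yields one empty field here).
  fields.push(current);
  return fields;
}

console.log(splitCsvLine('23,"VENDOR 1",,"OBJECT,OBJ TYPE",EA'));
// [ '23', 'VENDOR 1', '', 'OBJECT,OBJ TYPE', 'EA' ]
```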