0

I have Excel data converted to CSV. The original cells which contained commas are double-quoted in the CSV. For example, one row might look like:

Multiple Choice,"For a student that needing help, a teacher could:",Make it easier,Have them read independently,Use video,Summarize each paragraph,C

Without such quoted text containing commas I would just use JavaScript to break the string into array elements.

var theArray = theString.split(',');

But the fact that some of the pieces could be quoted and contain commas makes that difficult.

Is there a simple JavaScript way to break up this kind of text into Array elements, so in this case, the elements would be:

  • Multiple Choice
  • For a student that needing help, a teacher could:
  • Make it easier
  • Have them read independently
  • Use video
  • Summarize each paragraph
  • C
Doug Lerner
  • 1,383
  • 2
  • 17
  • 36
  • will it contain only double quotes ? – Artyom Amiryan Nov 13 '18 at 12:22
  • @Artyom CSV is actually a standard, so `"hello"`, would be `hello`, were as `"hello"""` would be `hello"` etc. It would be trivial to make a CSV parser in pure Javascript, but sometimes it's a good idea to use a lib, in case of edge cases & performance. A quick google got me this -> https://www.papaparse.com/#unparse – Keith Nov 13 '18 at 12:25
  • 1
    Have you looked at the solutions in https://stackoverflow.com/questions/8493195/how-can-i-parse-a-csv-string-with-javascript-which-contains-comma-in-data – Tom Panning Nov 13 '18 at 12:29
  • @Keith ok thanks, will have a look – Artyom Amiryan Nov 13 '18 at 12:30
  • @TomPanning In that link, there is a warning, .. `this solution does NOT fit the RFC 4180 definition of CSV and it also does NOT fit MS Excel format.` – Keith Nov 13 '18 at 12:32
  • @Keith Correct, but it will fit the example that the OP provided (commas in quotes). If that's the only thing the OP needs, then the solution should work. – Tom Panning Nov 13 '18 at 12:34
  • @TomPanning Yes, it might be fine, it's with him saying -> `I have Excel data converted to CSV`, for me it would just feel nicer decoding correctly in the first place. – Keith Nov 13 '18 at 12:41
  • Ideally, as mentioned above, `"hello"` would be `hello` and `"hello"""` would be `hello"` since the quoted parts might contain quotes themselves. – Doug Lerner Nov 13 '18 at 12:46

2 Answers2

1

You'll probably have to use some regex to setup a more detailed selector.

Try the following:

theString.split(/(?:,)(?=(?:[^"]|"[^"]*")*$)/)

Which would result in

['Multiple Choice', '"For a student that needing help, a teacher could:"',
 'Make it easier', 'Have them read independently', 'Use video',
 'Summarize each paragraph', 'C']
Raven
  • 1,453
  • 3
  • 18
  • 29
  • The second array element is wrong for a CSV, it should not contain the `"` – Keith Nov 13 '18 at 12:28
  • @Keith That shouldn't matter for Javascript. If it becomes an issue you can easily add a second operation to strip these quotemarks. – Raven Nov 13 '18 at 12:30
  • But what if the quotes were there? – Keith Nov 13 '18 at 12:30
  • Then you can add `.map(s=>s.replace(/^\"|\"$/g,""));` to remove start and trailing `"`'s. Those quotation marks are just artifacts of parsing the CSV and don't affect javascript strings – Raven Nov 13 '18 at 12:34
  • `"my ""name"` This should give `my "name`, but doing what you just said, it would be `my ""name`, and is wrong. My point is CSV's, are not a simple regex solution, it's a standard, and is not a trivial as one might first expect. – Keith Nov 13 '18 at 12:37
  • I don't follow your reasoning. And I'm not sure if this is relevant to the original question. What logic dictates you should remove three quotation marks? And is this not dependent on the implementation of CSV? Even so - that would not be relevant in javascript strings and you can adjust the solution depending on the implementation of CSV you are using – Raven Nov 13 '18 at 12:40
  • `What logic dictates you should remove three quotation marks?` The standard, and Excel export uses the standard. https://tools.ietf.org/html/rfc4180 – Keith Nov 13 '18 at 12:44
  • I will try the solution mentioned and report back. While I did not mention it in my OP (I figured I could sort that out later) it would be ideal if the outer double-quotes were stripped, but inner exported quotes were turned to single quotes, e.g. `"He said, ""hello"""` would end up as the element `He said, "hello"` – Doug Lerner Nov 13 '18 at 12:49
  • This solution seems nearly perfect. The only issue is that the quotes remain in those questions which are quoted. – Doug Lerner Nov 13 '18 at 13:23
  • 1
    I think, as Raven mentioned above, I could strip out the outer quotes from the results and replace instances of escaped double-quotes `""` with just `"` and call it a day. Thank you! – Doug Lerner Nov 13 '18 at 13:34
0

I believe this regex answers the needed result: /(?:([^,\"]+)|(?:\"([^\"]+)\"))/ You can use that in Raven's answer.

Yoav Moran
  • 107
  • 9
  • For that solution, that does seem to cause extra elements when there are internal quotes. For example, with `Multiple Choice,"When scoring the TAGPEC, what number score does an ""Emerging"" item receive?",0,1,2,3,4,B` the word `Emerging` gets split into a separate element. – Doug Lerner Nov 13 '18 at 13:07