1

I try to parse a CSV in Google Apps Script (Google Sheets), with the code below:

var file = DriveApp.getFileById('xx');
var blobasstr = file.getBlob().getDataAsString();
var csvData = Utilities.parseCsv(blobasstr);

All works fine except from the fact that there are invisible carriage returns in my CSV within cells, which result in extra rows being created in the middle of cells. These carriage returns are a result from the fact that the source data contains cells with two address lines (and the carriage return in the middle). When I open the CSV in notepad I have to type my cursor twice to get past this point which strengthens my idea that there is a hidden carriage return there.

Can I use a regex on my blobasstr that removes these invisible carriage returns, but of course keeps the usual carriage returns at the end of each line?


Edit: Thanks for referring to the earlier question. With the regex:

blobasstr.replace(/(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r\n(?:\\[\s\S][^'\\]\r\n)*')/g, '\\r\\n');

I am indeed able to remove the 'hidden' returns which is a great start. The thing is that I now end up with the text \r\n in the cells that had a invisible return (instead of the original content of the cell, this is probably caused by the replacement argument '\r\n' Is there an alternative for this that keeps the original content of the cell (of course without the hidden carriage return?) Thanks a lot!


The help is really appreciated!

Chris

Chris Rutte
  • 173
  • 1
  • 14
  • 3
    You should supply demo data so the regex can be fittet to it - the usual problem of "bad data in = bad results out" apply. – Patrick Artner Jan 04 '18 at 12:27
  • 2
    Possible duplicate of [Apps Script Utilities.parseCsv assumes new row on line breaks within double quotes](https://stackoverflow.com/questions/36658793/apps-script-utilities-parsecsv-assumes-new-row-on-line-breaks-within-double-quot) – Kos Jan 04 '18 at 13:03
  • Thanks for your comments, big step forward, I made an edit to my post to make it more useful. If you have any ideas, please let me know. – Chris Rutte Jan 05 '18 at 10:38

2 Answers2

0

notepad expects dos line endings \r\n and doesn't handle unix line endings single \n, i could reproduce behavior with a simple file with \n, the cursor must be typed twice to advance.

Seems Utilities.parseCsv can't handle multiline cells, because can't specify an optional enclosing character like ".

Nahuel Fouilleul
  • 18,726
  • 2
  • 31
  • 36
0

This is what did it for me eventually:

var blobasstr = blobasstr2.replace(/(?=["'])(?:"[^"\]*(?:\[\s\S][^"\]*)*"|'[^'\]\r\n(?:\[\s\S][^'\]\r\n)*')/g, function(match) { return match.replace(/\n/g,"")} );
Chris Rutte
  • 173
  • 1
  • 14