8

When using Utilities.parseCsv() linebreaks encased inside double quotes are assumed to be new rows entirely. The output array from this function will have several incorrect rows.

How can I fix this, or work around it?

Edit: Specifically, can I escape line breaks that exist only within double quotes? ie.

/r/n "I have some stuff to do:/r/n Go home/r/n Take a Nap"/r/n

Would be escaped to:

/r/n "I have some stuff to do://r//n Go home//r//n Take a Nap"/r/n

Edit2: Bug report from 2012: https://code.google.com/p/google-apps-script-issues/issues/detail?id=1871

Alan Wells
  • 30,746
  • 15
  • 104
  • 152
Douglas Gaskell
  • 9,017
  • 9
  • 71
  • 128
  • The `parseCsv()` method takes a string. You can use string methods to remove what is causing the problem. Maybe a global `replace()` would work? `var newCsv = myCsv.replace(/\n/g, "") //replace new line with empty space` – Alan Wells Apr 16 '16 at 01:45
  • I have edited my question to clarify. Sadly doing that also removes any indication of new rows, which are delimited by a `/r/n`. Most spreadsheet applications will notice that a linebreak exists within double quotes and will ignore it. The `parseCsv()` function does not seem to take double quotes into effect when parsing line breaks. – Douglas Gaskell Apr 16 '16 at 01:54

5 Answers5

4

So I had a somewhat large csv file about 10MB 50k rows, which contained a field at the end of each row with comments that users enter with all sorts of characters inside. I found the proposed regex solution was working when I tested a small set of the rows, but when I threw the big file to it, there was an error again and after trying a few things with the regex I even got to crash the whole runtime.

BTW I'm running my code on the V8 runtime.

After scratching my head for about an hour and with not really helpful error messages from AppsSript runtime. I had an idea, what if some weird users where deciding to use back-slashes in some weird ways making some escapes go wrong. So I tried replacing all back-slashes in my data with something else for a while until I had the array that parseCsv() returns. It worked! My hypothesis is that having a \ at the end of lines was breaking the replacement.

So my final solution is:

function testParse() {
    let csv =
        '"title1","title2","title3"\r\n' +
        '1,"person1","A ""comment"" with a \\ and \\\r\n a second line"\r\n' +
        '2,"person2","Another comment"';

    let sanitizedString =
        csv.replace(/\\/g, '::back-slash::')
            .replace(/(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r?\n(?:\\[\s\S][^'\\]\r?\n)*')/g,
                match => match.replace(/\r?\n/g, "::newline::"));
    let arr = Utilities.parseCsv(sanitizedString);
    for (let i = 0, rows = arr.length; i < rows; i++) {
        for (let j = 0, cols = arr[i].length; j < cols; j++) {
            arr[i][j] = 
                arr[i][j].replace(/::back-slash::/g,'\\')
                    .replace(/::newline::/g,'\r\n');

        }
    }
    Logger.log(arr)
}

Output:

[20-02-18 11:29:03:980 CST] [[title1, title2, title3], [1, person1, A "comment" with a \ and \
 a second line], [2, person2, Another comment]]
J. García
  • 1,859
  • 1
  • 12
  • 13
2

It may be helpful for you to use Sheets API. In my case, it works fine without replacing the CSV text that contains double-quoted multi-line text.

First, you need to make sure of bellow:

Enabling advanced services

To use an advanced Google service, follow these instructions:

  1. In the script editor, select Resources > Advanced Google services....
  2. In the Advanced Google Service dialog that appears, click the on/off switch next to the service you want to use.
  3. Click OK in the dialog.

If it is ok, you can import a CSV text data into a sheet with:

var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheetByName('some_name');
const resource = {
    requests: [
        {
            pasteData: {
                data: csvText, // Your CSV data string
                coordinate: {sheetId: sheet.getSheetId()},
                delimiter: ",",
            }
        }

    ]
};
Sheets.Spreadsheets.batchUpdate(resource, ss.getId());

or for TypeScript, which can be used by clasp:

var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheetByName('some_name');
const resource: GoogleAppsScript.Sheets.Schema.BatchUpdateSpreadsheetRequest = {
    requests: [
        {
            pasteData: {
                data: csvText, // Your CSV data string
                coordinate: {sheetId: sheet.getSheetId()},
                delimiter: ",",
            }
        }

    ]
};
Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
Nao
  • 21
  • 6
1

I had this same problem and have finally figured it out. Thanks Douglas for the Regex/code (a bit over my head I must say) it matches up nicely to the field in question. Unfortunately, that is only half the battle. The replace shown will simply replaces the entire field with \r\n. So that only works when whatever is between the "" in the CSV file is only \r\n. If it is embedded in the field with other data it silently destroys that data. To solve the other half of the problem, you need to use a function as your replace. The replace takes the matching field as a parameter so so you can execute a simple replace call in the function to address just that field. Example...

Data:

"Student","Officer

RD

Special Member","Member",705,"2016-07-25 22:40:04 EDT"

Code to process:

var dataString = myBlob().getDataAsString(); 
var escapedString = dataString.replace(/(?=["'])(?:"[^"\](?:\[\s\S][^"\])"|'[^'\]\r\n(?:\[\s\S][^'\]\r\n)')/g, function(match) { return match.replace(/\r\n/g,"\r\n")} ); 
var csvData = Utilities.parseCsv(escapedString);

Now the "Officer\r\nRD\r\nSpecial Member" field gets evaluated individually so the match.replace call in the replace function can be very straight forward and simple.

Ronak Patel
  • 3,819
  • 1
  • 16
  • 29
eafritz
  • 11
  • 3
  • The replace repalces the entire field if it's already a double quoted field. In this case, it was intended to replace unescaped line breaks with escaped line breaks. I use it daily in some automated parsing scripts. It may have unintended consequences if trying to use on an already escaped field (I think, it's been a while). TBH, I don't have a deep understanding of regex. I did make a robust CSV parser in C# though, that is regex-free, I might look into porting it to JavaScript. – Douglas Gaskell Jul 28 '16 at 20:11
1

To avoid trying to understand regular expressions, I found a workaround below, not using Utilities.parseCsv(). I'm copying the data line by line.

Here is how it goes:

If you can find a way to add an extra column to the end of your CSV, that contains the exact same value all the time, then you can force a specific "line break separator" according to that value.

Then, you copy the whole line into column A and use google app script' dedicated splitTextToColumns() method...

In the example below, I'm getting the CSV from an HTML form. This works because I also have admin access to the database the user takes the CSV from, so I could force that last column on all CSV files...

function updateSheet(form) {
  var fileData = form.myFile;
  // gets value from form
  blob = fileData.getBlob();
  var name = String(form.folderId);
  // gets value from form

  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.setActiveSheet(ss.getSheetByName(name), true);

  sheet.clearContents().clearFormats();

  var values = [];
  // below, the "Dronix" value is the value that I could force at the end of each row
  var rows = blob.contents.split('"Dronix",\n');

  if (rows.length > 1) {
    for (var r = 2, max_r = rows.length; r < max_r; ++r) {
      sheet.getRange(r + 6, 1, 1, 1).setValue(String(rows[r]));
    }
  }

  var spreadsheet = SpreadsheetApp.getActive();
  spreadsheet.getRange("A:A").activate();
  spreadsheet.getRange("A:A").splitTextToColumns();
}
grooveplex
  • 2,492
  • 4
  • 28
  • 30
0

Retrieved and slightly modified a regex from another reply on another post: https://stackoverflow.com/a/29452781/3547347

Regex: (?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r\n(?:\\[\s\S][^'\\]\r\n)*')

Code:

  var dataString = myBlob.getDataAsString();
  var escapedString = dataString.replace(/(?=["'])(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]\r\n(?:\\[\s\S][^'\\]\r\n)*')/g, '\\r\\n');
Community
  • 1
  • 1
Douglas Gaskell
  • 9,017
  • 9
  • 71
  • 128