23

I'm trying to produce a Google Shopping feed of 30,000+ items in NetSuite, a CRM system that runs server-side JavaScript that it calls Suitescript 2.0. Essentially, it's just JavaScript with a few more restrictions. I've been tasked with outputting this product feed as a CSV.

The problem is that the product descriptions of these items contain variables amounts of commas, double quotes, single quotes and HTML. At first, it was just the commas causing me problems, so after a bit of research, I wrapped the strings I was outputting in double quotes:

//This function isn't terribly important, but is referenced below

function sanitizeString (desc) {
    var itemDesc;
    if (desc) {
        itemDesc = desc.replace(/(\r\n|\n|\r|\s+|\t| )/gm,' ');
        itemDesc = itemDesc.replace(/,/g, '\,');
        itemDesc = itemDesc.replace(/"/g, '\"');
        itemDesc = itemDesc.replace(/'/g, '\'');
        itemDesc = itemDesc.replace(/ +(?= )/g,'');
    } else {
        itemDesc = '';
    }
    return itemDesc;
}

var row = '';

for (var i = 0; i < columns.length; i++) {
    var col = columns[i];
    row += '"' + sanitizeString(val[col]) + '"';
    if (i != columns.length - 1) {
        row += ',';
    }
}
newFeed.appendLine({value: row});

However, it seems that these double quotes are interacting strangely with double quotes within the string causing some weird formatting, even though my sanitizeString() function should be escaping them. Any time that a description contains a double quote, the next row doesn't get it's own line. It gets appended to the last column.

So, naturally, I escaped the external quotes like this:

row += '\"' + sanitizeString(val[col]) + '\"';

Doing that makes things go completely haywire, a lot of items don't get pushed to new lines and I max out the number of columns I'm allowed because it just keeps on going.

The other natural solution would be to go edit the product descriptions, but I'm not terribly anxious to do that for 30,000+ items...

Does anybody know what might be going on here? I feel like there's something really simple I'm overlooking...

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
B1gJ4k3
  • 763
  • 1
  • 5
  • 12
  • If your escaping function is supposed to add backslashes to the output you'll need to escape the backslashes in the string literals in the function, i.e., `'\\,'` rather than `'\,'`, or for the one that contains single quotes either `'\\\''` or `"\\'"`. – nnnnnn Oct 09 '17 at 02:39
  • No, it's not supposed to add backslashes. It was just supposed to escape them for the final string. Adding escaped backslashes makes things go haywire again and new lines aren't being added, but appended to the end of lines containing quotes... – B1gJ4k3 Oct 09 '17 at 03:09
  • What does "escape them for the final string" mean to you? Doesn't that mean the final string will have backslashes added to it? E.g., if the input for a particular *field* was `Hello, good bye` then the output should be `Hello\, good bye`, no? For CSV, given you're removing newlines and putting each field in double-quotes, I would think only double-quotes would need to be escaped - although for some reason you seem to be putting the whole *row* in double-quotes, which isn't normal for CSV. Please [edit] your question to show a sample two-line input and the corresponding desired output. – nnnnnn Oct 09 '17 at 03:21
  • I guess I meant that it escapes them so that it wouldn't cause the exact problem that it seems to be causing in the final output. I've edited the question to be more clear what I'm doing with the "row" variable. (It's somewhat poorly-named if you can't see the loop). I looked at the [CSV spec](https://en.wikipedia.org/wiki/Comma-separated_values#Example) and it looks like double quotes within double quotes need to be done as two double quotes. So, "Hello, ""goodbye"", is a string". But replacing the double quotes with two double quotes works, but my output now has two double quotes in it... – B1gJ4k3 Oct 09 '17 at 03:49
  • OK, never mind. I figured it out. It turns out I was running the sanitizeString() function twice. Again, according to the CSV spec, double quotes within a quoted column need to represented as two double quotes (""). I was running the function twice, which was producing 4 double quotes, translating to two. – B1gJ4k3 Oct 09 '17 at 04:17

3 Answers3

37

It turns out that, according to the CSV specs, to include double quotes within a string that is already quoted, you need to use two double quotes (""). I changed:

itemDesc = itemDesc.replace(/"/g, '\"');

to

itemDesc = itemDesc.replace(/"/g, '""');

I also removed

itemDesc = itemDesc.replace(/,/g, '\,');
itemDesc = itemDesc.replace(/'/g, '\'');

Since the column in the CSV is being quoted already. These are unnecessary.

B1gJ4k3
  • 763
  • 1
  • 5
  • 12
  • 3
    Edge case, if the string doesn't start with a quote, it can include excaped quotes such as `Yes "" I am`. If it starts with a quote, it must end with a quote. Quotes must be escaped even if the term isn't quoted. Fun. – Ray Foss Sep 20 '19 at 20:13
4

I use this simple function to convert an string[][] to a csv file. It quotes the cell, if it contains a ", a , or other whitespace (except blanks):

/**
 * Takes an array of arrays and returns a `,` sparated csv file.
 * @param {string[][]} table
 * @returns {string}
 */
export function toCSV(table: string[][]) {
    return table
        .map(row =>
            row
                .map(cell => {
                    // We remove blanks and check if the column contains
                    // other whitespace,`,` or `"`.
                    // In that case, we need to quote the column.
                    if (cell.replace(/ /g, '').match(/[\s,"]/)) {
                        return '"' + cell.replace(/"/g, '""') + '"';
                    }
                    return cell;
                })
                .join(',')
        )
        .join('\n');
}
Michael_Scharf
  • 33,154
  • 22
  • 74
  • 95
2

In my case, I didn't want to quote strings that did not need quoting. So I test the string for nasty characters before quoting it.

function escapeCSV (term) {
  if (term.match && term.match(/,|"/))  {
    return `"${term.replace('"','""')}"`
  } else {
    return term
  }
}
Ray Foss
  • 3,649
  • 3
  • 30
  • 31
  • 3
    Careful: `replace` will only replace the first occurrence. Use regex with the `g` modifier (`/"/g`) or the new `replaceAll` function. – Aron Jan 19 '21 at 13:58