4

I'm working on a project that involves a csv file in Google Drive that is updated with new data every minute or so.

I've built a spreadsheet dashboard to make the data in the csv more meaningful.

I made the mistake of assuming that I could use the Google Spreadsheet function =importdata(url) to get the data from the CSV in Google Drive into my Google Spreadsheet, however, that produces an error unless I make the CSV public, which is not feasible for security and privacy reasons. Even if I do make the CSV public and use importdata, the data that comes in is completely malformed and unuseable - it looks nothing like the actual CSV.

malformed data with importdats(url)

I'm trying to write a script to automatically import the csv data using DriveApp to open the csv file, Utilities.parseCsv to turn the csv into an array of data, and then setValues to write the data to the sheet.

function importData() {
  var ss = SpreadsheetApp.getActive();
  var file = DriveApp.getFilesByName("Agent Performance.csv")
  var csv = file.next().getBlob().getDataAsString();
  var csvData = Utilities.parseCsv(csv);
  var sheet = ss.getSheetByName('CSV Import TEST');
  for (var i = 0; i < csvData.length; i++) {
    sheet.getRange(i+1, 1, 1, csvData[i].length).setValues(new Array(csvData[i]));
  }
}

The issue is I get replacement characters like: � all throughout the data that gets written to the sheet. It's so strange. The sheet looks normal, but if you click on a cell to see it's value the formula bar has whatever the text is in the cell, but between each character is a �. This makes any calculation in Google Sheets impossible.

Notice replacement characters in formula bar

If I just import the same CSV using Google Sheets "File" and "Import" and "Replace current sheet", the data comes in fine. This isn't a good solution because really I want no more than a few minutes of a delay between when the CSV updates and when the dashboard in the Google Sheets update. If I have to manually upload the CSV, it defeats the purpose of the system.

Any help would be greatly appreciated. Thank you!

Syscall
  • 19,327
  • 10
  • 37
  • 52
Jesse Spevack
  • 113
  • 1
  • 2
  • 6
  • How are you getting the csv file uploaded to your Google Drive? You can use Google's [Drive Rest API](https://developers.google.com/drive/v2/reference/files/insert), and make an HTTPS POST request to upload the csv file. Apps Script has a way to make HTTPS POST requests with `UrlFetchApp.fetch()`. Are you looking for a different way to upload the file, or a way to parse the content? – Alan Wells Feb 13 '16 at 18:51
  • I'm looking for a way to parse the content. I'm not 100% clear on how the csv is getting updated, though so maybe an https post request will work. – Jesse Spevack Feb 14 '16 at 00:39

3 Answers3

5

Have you check the file charset? You can specify it when calling getDataAsString(charset). Try this:

function importData() {
  var ss = SpreadsheetApp.getActive();
  var file = DriveApp.getFilesByName("Agent Performance.csv")
  var csv = file.next().getBlob().getDataAsString('ISO-8859-1'); //note the charset
  var csvData = Utilities.parseCsv(csv);
  //unless you csv has variable amount of columns per line, you should do this
  if(csvData.length > 0) {
    ss.getSheetByName('CSV Import TEST')
      .getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);
  } else
    throw 'Blank file';
}
Henrique G. Abreu
  • 17,406
  • 3
  • 56
  • 65
  • Thanks for this! When I tried the code I get the same strange characters. I also tried with 'utt-8' and ascii as the charset to no avail. Thanks again for the suggestion. Any other ideas? – Jesse Spevack Feb 14 '16 at 00:42
  • Do you know the encoding used by whoever is generating this csv? If you don't, try using a program to detect the charset (e.g. `file` in linux). Or can you share a sample file? – Henrique G. Abreu Feb 14 '16 at 01:37
  • I've reached out to find out about the encoding used. I can't share the file because it has private info. Would opening the CSV in excel or downloading and importing it into a Google Sheet (both which work fine), so that I can anonymize the data affect the encoding for the purpose of this thread? – Jesse Spevack Feb 14 '16 at 05:06
1

This may help but you will probably need to investigate whether it causes other difficulties with your data:

I had the same problem importing a .txt file containing bank transaction data as the bank doesn't offer CSV download files. I found that the odd Characters are FFFD which appear to be inserted by the fileXYZ.getblob() method as substitutes for unrecognized characters for unrecognized codes, in my case these are replaced by spaces.

my (very basic) solution once you have a file loaded into is as follows..

function getBankTransactionFile(fileNameToGet) {
// fileNameToGet is .txt and stored in folder specified in Var list

 var  inputFileFolderID = '0B2XXX insert your folder ID',
     fldrID = DriveApp.getFolderById(inputFileFolderID),
     theFileRetrieved = fldrID.getFilesByName('yourFileName'),
     csvFile, cntFiles = 0;

// Even if it's only one file, must iterate a while loop in order to access the file. Google drive will allow multiple files of the same name.
  while (theFileRetrieved.hasNext()) {
    var fileXYZ = theFileRetrieved.next();
    cntFiles = cntFiles + 1;
    csvFile = Utilities.parseCsv(fileXYZ.getBlob()
              .getDataAsString().replace('\uFFFD'," ",'g'), "\n")
    // Utilities.parseCsv(csv, delimiter) returns 2D array but the fileXYZ 
    // text loaded has FFFD characters inserted so these are substituted for
    // 'space' using the .replace method and 'g' for global flag
  }
  return csvFile;
}

I am brand new to GAS (converting from VBA) so pretty sure there is a more refined way but it works for my data.. note the \n is the newline character as the specified delimiter for my data. I found out what the odd character were by using Logger to show the raw data string, then extracting the code .charCodeAt(n) counting the characters to find n. As the .txt will import you can see what the unrecognized characters should be.. spaces in my case.

Vinoth Krishnan
  • 2,925
  • 6
  • 29
  • 34
Robin
  • 11
  • 1
1

I had the same requirement and the same problem with a csv file. What I did may be a workaround but worked at least fine for me.

The "�" may actually be any type of ASCII character that is not recognized so in my case searching for "\uFFFD" did not solve the problem. So what I did is basically convert the payload in binary data. There I managed to notice that between all characters a NULL was being delivered (ASCII Code 0). This was in my case the �. So what I did is rebuild the byte array without the 0s and then copy it in the spreadsheet again.

var response = UrlFetchApp.fetch(theUrl);
var payload = response.getContentText();
//Get byte Array 
var bytes= response.getContent();
var myArray = [];
//Build byte array without the faulty characters
for ( var i =1 ; i<bytes.length; i++){
  if (bytes[i] != 0){
   myArray.push(bytes[i]);
  }
}
//Reconvert to string.
var newArray = Utilities.newBlob(myArray).getDataAsString();

This script in my case also works fine if I am importing numbers and using them in formulas.

nagam11
  • 133
  • 1
  • 9