1

i'm using AWS's developer Reporting API in order to download a zip file containing a CSV sales report.

My problem is, I cannot read this file and i can't for the life of me figure out why. Whilst on the server terminal, cat my.csv prints the file perfectly, as plain-english CSV, yet the command file my.csv prints ASCII text, with very long lines.

When I downloaded the file with SCP to my local machine and opened it, it opened perfectly with macOS Numbers. My machine also downloaded the file as CSV Text type, according to file my.csv output, despite the fact on the server it was ASCII.

On my server, I cannot read this csv file using NodeJS packages for parsing CSV files and unzipping.

1. First my scripts start with downloading the zip file:

export async function downloadS3Zip(url, destination) {
    const response = await axios.get(url, { responseType: "stream"});
    const writer = fs.createWriteStream(destination, { "encoding": "utf-8"});// i've also tried without this option
    response.data.pipe(writer);
    const [failed, finished] = await to(checkFileWrite(writer));
    if (failed) throw new Error('failed to write zip file to ', destination);
    if (finished) log(`finished writing zip file to ${destination}`);
    if (!finished) throw new Error("download s3 func: check file write didn't resolve properly.")
}
export async function checkFileWrite(writeStream) {
    return new Promise((resolve, reject) => {
        writeStream.on("finish", () => {
            resolve(true)
        })
        writeStream.on("error", () => {
            reject(true)
        });
    });
} 

2. Then I unzip this file using node-stream-zip

export async function unzipZipFile(destination) {
    const zip = new NodeStreamZip.async({ file: destination });
    let finalFile;
    zip.on('error', (err) => {
        throw new Error(err)
    });
    zip.on('ready', () => log('All entries read: ', zip.entriesCount))
    zip.on('extract', (entry, file) => {
        log(`Extracted ${entry.name} to ${file}`);
        log(entry)
        finalFile = file;
    });
    await zip.extract(null, ZIP_DOWNLOAD_LOCATION);
    await zip.close();
    if (!finalFile) throw new Error('no file returned from unzip func.')
    return finalFile
}

3. At this point, my directory will have the .zip file and .csv file

4. I've tried using papaparse and csv-parse, to no avail:

// papaparse example
export async function readCSVFile(filePath) {
    const csvFile = fs.readFileSync(filePath)
    const csvData = csvFile.toString()
    console.log("\n\nBeginning parse..........")
    return new Promise(resolve => {
        Papa.parse(csvData, {
            header: true,
            complete: results => {
                log('\nComplete', results.data.length, 'records.')
                resolve(results.data)
            }
        })
    })
}

When outputting the results.data array, i get stuff like this

{
  'PK\x03\x04\x14\x00\b\b\b\x00[E�T\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x17\x00\x00\x00sales-2022-06-00000.csv̽Y��ز6': "Ǿ:'B�w̓#�Bb\x14�Y���\x00I�@�b���Z��ݮR�sl*\f{w��媲2�V�ӓ��~\x19\x1D��8����>���\x1D����Kn\x13F��>m� ���\ro?^������\x7Fz�*��\x13�mdX��\x1EV����\x06Q���}�o�۰�xm��u��\x7F�T��\x17\x1F\x12��\x1C�՗�x��G��w֦��~�\x1C'i����~p�_e�A�0>\x1CS���\x0F��t<��O�e����:����\x12/���1��_i\x14l�a��F�o����}�\x0E�Fg���m�����\x15�m\x14�C�\x1A\x1F��Ka�_��Y�o_��g�a�",
  '��': 'i��C',
  __parsed_extra: [
    'qBC?z7�\x1F�v\x14DJ\x00���g�Ʒ���`�2�\x1D\x03F"�\x06r\x02\b\x16t:a\x04N!\x00p\x02ǂ�\x01"�\x18O�0$\x06"�R�$��c\x03\x01�L�M��\x00��ЯP~i�=��L��F}c�@���\\�T����>>D��8^��\x7FhstPj���N�U؉W�d',
    'P�,�\x17��Ϸ�}�?_�\x7F�\x7F��я�er5��\x12\x12\x1AJ������0�\x1FX\x1Ay\x00MA\x00\x05��P\x00� 4�*�ɭ�-�I�x\x1D�(��I@��\x13˸��G�\\���ٛ\x01��\b�\x00�ðV\x06��VWn��.�0-\x0EK\x13�W�\x0E���Vo�}�_!�e�"� Q\x1A�\x00ȿ�\x0B� ����\x10��z!A�bfr( ���]G��u\x1A����\x10s^\x18��5W���5�O��',
    `Q/�ڠ��:oƞ\\���r��K\x00/��V'Or-�&E�Jv�0����9+y��A�\x12�\x11�@i\x18�\x1F�\x0B�\x12�WU���q\x06LI\t\x15�\x1Bt],�n�Z�� \\���\x14�\x0F\x0505�=��-���!�*;�ݼ\x19�c\x7Fȏf��\x18y�,��(��a�P����\x14���m�LN���Q�"�\x19U�\x0B�g`,
    '�^�\x15��z��D�\x15\x02\x13�^Z�\x12�_�',
    '�^D^X�L�G�����\ta�f\x071)��:���0l\x13�?,%�{׳���v�\x16�d�OrpVơ�\x0EP�ޏ��V��\x11�\x16̮W�]\x0B9��\f��d\x19�J1:?�^\x01(�\x122�!���/��4',
    '%ߌon>�Ϣ�8X���\x01�@Rf"A���8WD����\x1A+\x0E\x161�{r޾���\t�´?;1g5�V�U���\x1B���]�Ly岪\x0F��+/\x15Z\r~>��vi�^�\x07�i�Ņ�Q�&\x00�D= �\x19:F��c��\x04�'
  ]
} {
  'PK\x03\x04\x14\x00\b\b\b\x00[E�T\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x17\x00\x00\x00sales-2022-06-00000.csv̽Y��ز6': '�0\x04�I1�\r��.\x11�NЩ��QnV"�5\x18��m��-ߙ�ڭ��l�\x19��jI�Q��C䭶}yq�q\x05vv�\t��h׿m�QW�,�0��\x1Ea��g\x19%�ݬ\x12t��So��fJ��똍�l�~��O6!�\x7F7\x1C���ƒ�uc����Dy�����U���X\x1D���+M� ��8�K#q���mE.<"K]�m\x7F��°\bɰ}\r�Yno.A�/��[�6ą�\x15�k�*�<}�\x1C\x12A�xg]c����3Q�a��w}O���Z,��k\x06�Ӣ>��\x01eq�\x04���\x1E������\fp�_���T(&�B����]��K�\x1C�p��3\x04���}\x7F��\x15���\x1E�#M\x10i\x02(��x�u+u��l\x15f�-g�k����\x19u�ۢ>��\x05���\x07%uFc��6]R���o�'
} {
// csv-parse example...
export async function parseCSVFile(filePath) {    
    const records = [];
    const parser = fs.createReadStream(filePath).pipe(parse({ relax_quotes: true}));

    parser.on("readable", () => {
        let record;
        while ((record = parser.read()) !== null) {
            records.push(record);
        }
    });
    await finished(parser);
    return records;
}

The above example throws an error:

CsvError: Invalid Record Length: expect 1, got 5 on line 4
    at Object.__onRecord (file:///usr/src/app/node_modules/csv-parse/lib/api/index.js:353:11)
    at Object.parse (file:///usr/src/app/node_modules/csv-parse/lib/api/index.js:238:40)
    at Parser._transform (file:///usr/src/app/node_modules/csv-parse/lib/index.js:31:26)
    at Parser.Transform._write (node:internal/streams/transform:184:23)
    at writeOrBuffer (node:internal/streams/writable:389:12)
    at _write (node:internal/streams/writable:330:10)
    at Parser.Writable.write (node:internal/streams/writable:334:10)
    at ReadStream.ondata (node:internal/streams/readable:749:22)
    at ReadStream.emit (node:events:394:28)
    at addChunk (node:internal/streams/readable:312:12) {
  code: 'CSV_RECORD_INCONSISTENT_FIELDS_LENGTH',
  bytes: 1080,
  comment_lines: 0,
  empty_lines: 0,
  invalid_field_length: 0,
  lines: 4,
  records: 1,
  columns: false,
  error: undefined,
  header: false,
  index: 5,
  raw: undefined,
  column: 5,
  quoting: false,
  record: [
    'Ǿ:\'B�w̓#�Bb\x14�Y���\x00I�@�b���Z��ݮR�sl*\f{w��媲2�V�ӓ��~\x19\x1D��8����>���\x1D����Kn\x13F��>m� ���\ro?^������\x7Fz�*��\x13�mdX��\x1EV����\x06Q���}�o�۰�xm��u��\x7F�T��\x17\x1F\x12��\x1C�՗�x��G��w֦��~�\x1C\'i����~p�_e�A�0>\x1CS���\x0F��t<��O�e����:����\x12/���1��_i\x14l�a��F�o����}�\x0E�Fg���m�����\x15�m\x14�C�\x1A\x1F��Ka�_��Y�o_��g�a�|i��C|qBC?z7�\x1F�v\x14DJ\x00���g�Ʒ���`�2�\x1D\x03F"�\x06r\x02\b\x16t:a\x04N!\x00p\x02ǂ�\x01"�\x18O�0$\x06"�R�$��c\x03\x01�L�M��\x00��ЯP~i�=��L��F}c�@���\\�T����>>D��8^��\x7FhstPj���N�U؉W�d|P�',
    `�\x17��Ϸ�}�?_�\x7F�\x7F��я�er5��\x12\x12\x1AJ������0�\x1FX\x1Ay\x00MA\x00\x05��P\x00� 4�*�ɭ�-�I�x\x1D�(��I@��\x13˸��G�\\���ٛ\x01��\b�\x00�ðV\x06��VWn��.�0-\x0EK\x13�W�\x0E���Vo�}�_!�e�"� Q\x1A�\x00ȿ�\x0B� ����\x10��z!A�bfr( ���]G��u\x1A����\x10s^\x18��5W���5�O��|Q/�ڠ��:oƞ\\���r��K\x00/��V'Or-�&E�Jv�0����9+y��A�\x12�\x11�@i\x18�\x1F�\x0B�\x12�WU���q\x06LI\t\x15�\x1Bt]`,
    '�n�Z�� \\���\x14�\x0F\x0505�=��-���!�*;�ݼ\x19�c\x7Fȏf��\x18y�',
    '��(��a�P����\x14���m�LN���Q�"�\x19U�\x0B�g|�^�\x15��z��D�\x15\x02\x13�^Z�\x12�_�|�^D^X�L�G�����\ta�f\x071)��:���0l\x13�?',
    '%�{׳���v�\x16�d�OrpVơ�\x0EP�ޏ��V��\x11�\x16̮W�]\x0B9��\f��d\x19�J1:?�^\x01(�\x122�!���/��4|%ߌon>�Ϣ�8X���\x01�@Rf"A���8WD����\x1A+\x0E\x161�{r޾���\t�´?;1g5�V�U���\x1B���]�Ly岪\x0F��+/\x15Z\r~>��vi�^�\x07�i�Ņ�Q�&\x00�D= �\x19:F��c��\x04�'
  ]
}

How can i parse this file in NodeJS that is apparently in ASCII format?

trincot
  • 317,000
  • 35
  • 244
  • 286
J.E.C.
  • 2,616
  • 2
  • 18
  • 21
  • How are you calling these different methods? What is being passed to `readCSVFile` ? – Anon Coward Jul 19 '22 at 18:16
  • in a single script, i'm calling them in the numeric order given in my answer. ```readCSVFile``` receives a file path as a string, which is the path to the CSV file. – J.E.C. Jul 20 '22 at 06:36
  • That seems unlikely. The `results.data` you showed contains a zip file, not a csv file. – Anon Coward Jul 20 '22 at 17:26

0 Answers0