i'm using AWS's developer Reporting API in order to download a zip file containing a CSV sales report.
My problem is, I cannot read this file and i can't for the life of me figure out why. Whilst on the server terminal, cat my.csv
prints the file perfectly, as plain-english CSV, yet the command file my.csv
prints ASCII text, with very long lines
.
When I downloaded the file with SCP to my local machine and opened it, it opened perfectly with macOS Numbers. My machine also downloaded the file as CSV Text
type, according to file my.csv
output, despite the fact on the server it was ASCII
.
On my server, I cannot read this csv file using NodeJS packages for parsing CSV files and unzipping.
1. First my scripts start with downloading the zip file:
export async function downloadS3Zip(url, destination) {
const response = await axios.get(url, { responseType: "stream"});
const writer = fs.createWriteStream(destination, { "encoding": "utf-8"});// i've also tried without this option
response.data.pipe(writer);
const [failed, finished] = await to(checkFileWrite(writer));
if (failed) throw new Error('failed to write zip file to ', destination);
if (finished) log(`finished writing zip file to ${destination}`);
if (!finished) throw new Error("download s3 func: check file write didn't resolve properly.")
}
export async function checkFileWrite(writeStream) {
return new Promise((resolve, reject) => {
writeStream.on("finish", () => {
resolve(true)
})
writeStream.on("error", () => {
reject(true)
});
});
}
2. Then I unzip this file using node-stream-zip
export async function unzipZipFile(destination) {
const zip = new NodeStreamZip.async({ file: destination });
let finalFile;
zip.on('error', (err) => {
throw new Error(err)
});
zip.on('ready', () => log('All entries read: ', zip.entriesCount))
zip.on('extract', (entry, file) => {
log(`Extracted ${entry.name} to ${file}`);
log(entry)
finalFile = file;
});
await zip.extract(null, ZIP_DOWNLOAD_LOCATION);
await zip.close();
if (!finalFile) throw new Error('no file returned from unzip func.')
return finalFile
}
3. At this point, my directory will have the .zip file and .csv file
4. I've tried using papaparse
and csv-parse
, to no avail:
// papaparse example
export async function readCSVFile(filePath) {
const csvFile = fs.readFileSync(filePath)
const csvData = csvFile.toString()
console.log("\n\nBeginning parse..........")
return new Promise(resolve => {
Papa.parse(csvData, {
header: true,
complete: results => {
log('\nComplete', results.data.length, 'records.')
resolve(results.data)
}
})
})
}
When outputting the results.data
array, i get stuff like this
{
'PK\x03\x04\x14\x00\b\b\b\x00[E�T\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x17\x00\x00\x00sales-2022-06-00000.csv̽Y��ز6': "Ǿ:'B�w̓#�Bb\x14�Y���\x00I�@�b���Z��ݮR�sl*\f{w��媲2�V�ӓ��~\x19\x1D��8����>���\x1D����Kn\x13F��>m� ���\ro?^������\x7Fz�*��\x13�mdX��\x1EV����\x06Q���}�o�۰�xm��u��\x7F�T��\x17\x1F\x12��\x1C��x��G��w֦��~�\x1C'i����~p�_e�A�0>\x1CS���\x0F��t<��O�e����:����\x12/���1��_i\x14l�a��F�o����}�\x0E�Fg���m�����\x15�m\x14�C�\x1A\x1F��Ka�_��Y�o_��g�a�",
'��': 'i��C',
__parsed_extra: [
'qBC?z7�\x1F�v\x14DJ\x00���g�Ʒ���`�2�\x1D\x03F"�\x06r\x02\b\x16t:a\x04N!\x00p\x02ǂ�\x01"�\x18O�0$\x06"�R�$��c\x03\x01�L�M��\x00��ЯP~i�=��L��F}c�@���\\�T����>>D��8^��\x7FhstPj���N�U؉W�d',
'P�,�\x17��Ϸ�}�?_�\x7F�\x7F��я�er5��\x12\x12\x1AJ������0�\x1FX\x1Ay\x00MA\x00\x05��P\x00� 4�*�ɭ�-�I�x\x1D�(��I@��\x13˸��G�\\���ٛ\x01��\b�\x00�ðV\x06��VWn��.�0-\x0EK\x13�W�\x0E���Vo�}�_!�e�"� Q\x1A�\x00ȿ�\x0B� ����\x10��z!A�bfr( ���]G��u\x1A����\x10s^\x18��5W���5�O��',
`Q/�ڠ��:oƞ\\���r��K\x00/��V'Or-�&E�Jv�0����9+y��A�\x12�\x11�@i\x18�\x1F�\x0B�\x12�WU���q\x06LI\t\x15�\x1Bt],�n�Z�� \\���\x14�\x0F\x0505�=��-���!�*;�ݼ\x19�c\x7Fȏf��\x18y�,��(��a�P����\x14���m�LN���Q�"�\x19U�\x0B�g`,
'�^�\x15��z��D�\x15\x02\x13�^Z�\x12�_�',
'�^D^X�L�G�����\ta�f\x071)��:���0l\x13�?,%�{׳���v�\x16�d�OrpVơ�\x0EP�ޏ��V��\x11�\x16̮W�]\x0B9��\f��d\x19�J1:?�^\x01(�\x122�!���/��4',
'%ߌon>�Ϣ�8X���\x01�@Rf"A���8WD����\x1A+\x0E\x161�{r���\t�´?;1g5�V�U���\x1B���]�Ly岪\x0F��+/\x15Z\r~>��vi�^�\x07�i�Ņ�Q�&\x00�D= �\x19:F��c��\x04�'
]
} {
'PK\x03\x04\x14\x00\b\b\b\x00[E�T\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x17\x00\x00\x00sales-2022-06-00000.csv̽Y��ز6': '�0\x04�I1�\r��.\x11�NЩ��QnV"�5\x18��m��-ߙ�ڭ��l�\x19��jI�Q��C䭶}yq�q\x05vv�\t��hm�QW�,�0��\x1Ea��g\x19%�ݬ\x12t��So��fJ��똍�l�~��O6!�\x7F7\x1C���ƒ�uc����Dy�����U���X\x1D���+M� ��8�K#q���mE.<"K]�m\x7F��°\bɰ}\r�Yno.A�/��[�6ą�\x15�k�*�<}�\x1C\x12A�xg]c����3Q�a��w}O���Z,��k\x06�Ӣ>��\x01eq�\x04���\x1E������\fp�_���T(&�B����]��K�\x1C�p��3\x04���}\x7F��\x15���\x1E�#M\x10i\x02(��x�u+u��l\x15f�-g�k����\x19u�ۢ>��\x05���\x07%uFc��6]R���o�'
} {
// csv-parse example...
export async function parseCSVFile(filePath) {
const records = [];
const parser = fs.createReadStream(filePath).pipe(parse({ relax_quotes: true}));
parser.on("readable", () => {
let record;
while ((record = parser.read()) !== null) {
records.push(record);
}
});
await finished(parser);
return records;
}
The above example throws an error:
CsvError: Invalid Record Length: expect 1, got 5 on line 4
at Object.__onRecord (file:///usr/src/app/node_modules/csv-parse/lib/api/index.js:353:11)
at Object.parse (file:///usr/src/app/node_modules/csv-parse/lib/api/index.js:238:40)
at Parser._transform (file:///usr/src/app/node_modules/csv-parse/lib/index.js:31:26)
at Parser.Transform._write (node:internal/streams/transform:184:23)
at writeOrBuffer (node:internal/streams/writable:389:12)
at _write (node:internal/streams/writable:330:10)
at Parser.Writable.write (node:internal/streams/writable:334:10)
at ReadStream.ondata (node:internal/streams/readable:749:22)
at ReadStream.emit (node:events:394:28)
at addChunk (node:internal/streams/readable:312:12) {
code: 'CSV_RECORD_INCONSISTENT_FIELDS_LENGTH',
bytes: 1080,
comment_lines: 0,
empty_lines: 0,
invalid_field_length: 0,
lines: 4,
records: 1,
columns: false,
error: undefined,
header: false,
index: 5,
raw: undefined,
column: 5,
quoting: false,
record: [
'Ǿ:\'B�w̓#�Bb\x14�Y���\x00I�@�b���Z��ݮR�sl*\f{w��媲2�V�ӓ��~\x19\x1D��8����>���\x1D����Kn\x13F��>m� ���\ro?^������\x7Fz�*��\x13�mdX��\x1EV����\x06Q���}�o�۰�xm��u��\x7F�T��\x17\x1F\x12��\x1C��x��G��w֦��~�\x1C\'i����~p�_e�A�0>\x1CS���\x0F��t<��O�e����:����\x12/���1��_i\x14l�a��F�o����}�\x0E�Fg���m�����\x15�m\x14�C�\x1A\x1F��Ka�_��Y�o_��g�a�|i��C|qBC?z7�\x1F�v\x14DJ\x00���g�Ʒ���`�2�\x1D\x03F"�\x06r\x02\b\x16t:a\x04N!\x00p\x02ǂ�\x01"�\x18O�0$\x06"�R�$��c\x03\x01�L�M��\x00��ЯP~i�=��L��F}c�@���\\�T����>>D��8^��\x7FhstPj���N�U؉W�d|P�',
`�\x17��Ϸ�}�?_�\x7F�\x7F��я�er5��\x12\x12\x1AJ������0�\x1FX\x1Ay\x00MA\x00\x05��P\x00� 4�*�ɭ�-�I�x\x1D�(��I@��\x13˸��G�\\���ٛ\x01��\b�\x00�ðV\x06��VWn��.�0-\x0EK\x13�W�\x0E���Vo�}�_!�e�"� Q\x1A�\x00ȿ�\x0B� ����\x10��z!A�bfr( ���]G��u\x1A����\x10s^\x18��5W���5�O��|Q/�ڠ��:oƞ\\���r��K\x00/��V'Or-�&E�Jv�0����9+y��A�\x12�\x11�@i\x18�\x1F�\x0B�\x12�WU���q\x06LI\t\x15�\x1Bt]`,
'�n�Z�� \\���\x14�\x0F\x0505�=��-���!�*;�ݼ\x19�c\x7Fȏf��\x18y�',
'��(��a�P����\x14���m�LN���Q�"�\x19U�\x0B�g|�^�\x15��z��D�\x15\x02\x13�^Z�\x12�_�|�^D^X�L�G�����\ta�f\x071)��:���0l\x13�?',
'%�{׳���v�\x16�d�OrpVơ�\x0EP�ޏ��V��\x11�\x16̮W�]\x0B9��\f��d\x19�J1:?�^\x01(�\x122�!���/��4|%ߌon>�Ϣ�8X���\x01�@Rf"A���8WD����\x1A+\x0E\x161�{r���\t�´?;1g5�V�U���\x1B���]�Ly岪\x0F��+/\x15Z\r~>��vi�^�\x07�i�Ņ�Q�&\x00�D= �\x19:F��c��\x04�'
]
}
How can i parse this file in NodeJS that is apparently in ASCII format?