I need to parse a very large text file line by line, apply some text manipulation to each line (.replace() etc.), and then populate an array with the results by index.
I need the function to return this array to the caller, as it will be part of a module I will use in a project.
I am using readline because I would like to avoid outside libraries and stay as close to base JS as possible.
I am also using TypeScript in my minimal example here:
import fs from 'fs';
import readline from 'readline';

const streamParseGEOgds = async (inputFile: string, verbose: boolean) => {
    let datasets: string[] = [];
    let line_counter = 0;
    let entry_counter = -1;
    return await new Promise((resolve, reject) => {
        const rl = readline.createInterface({
            input: fs.createReadStream(inputFile),
            crlfDelay: Infinity
        });
        rl.on('line', (line: string) => {
            line_counter++;
            if (line === "") {
                entry_counter++;
                // datasets[entry_counter] = "";
            }
            // ------------------------------
            const summary_regex: RegExp = /^[0-9]+. /;
            if (summary_regex.test(line)) {
                datasets[entry_counter] = line.replace(summary_regex, "");
            }
            // ------------------------------
            if (verbose) {
                console.log(`streamParseGEOgds line ${line_counter}: ${line}`);
            }
        }).on('close', () => {
            // I can log the object from inside here but I want to get out of this scope to the global scope
            resolve(datasets);
        });
        return datasets;
    });
}

const res = streamParseGEOgds("./gds_result.txt", false).then(async (datasets) => {
    console.log(datasets); // logs what I want; the parsed data
    return await datasets;
})
console.log(`Logging the res object: ${res}`); // returns a pending promise and does not return the actual data I want
My input file looks something like this:
1. Glycosylated clusterin species facilitate amyloid beta toxicity in human neurons.
(Submitter supplied) Clusterin (CLU) is one of the most significant genetic risk factors for late onset Alzheimer’s disease. Numerous studies have now demonstrated that CLU-AD mutations and amyloid-β (Aβ) treatment alter the trafficking and localisation of glycosylated CLU. iPSCs with altered CLU trafficking were generated following the removal of CLU exon 2 by CRISPR/Cas9 gene editing. Neurons were generated from control, unedited and exon 2 -/- iPSCs and were incubated with aggregated Aβ peptides. more...
Organism: Homo sapiens
Type: Expression profiling by high throughput sequencing
Platform: GPL24676 18 Samples
FTP download: GEO (TXT) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE207nnn/GSE207466/
Series Accession: GSE207466 ID: 200207466
2. CROPSeq of Putative AD- and PSP-associated cis-Regulatory Regions in iPSC-derived Neurons and Microglia
(Submitter supplied) We performed a pooled CRISPRi screen (CROP-seq) and genome editing to validate 19 genetic variants prioritized from massively parallel reporter assays to screen 5,706 polymorphisms from genome-wide association studies for both Alzheimer’s disease (AD) and Progressive Supranuclear Palsy (PSP) across 11 distinct loci. This allowed us to pinpoint regulatory targets in a cell-type specific manner.
Organism: Homo sapiens
Type: Expression profiling by high throughput sequencing; Other
Platform: GPL24676 4 Samples
FTP download: GEO ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE207nnn/GSE207099/
Series Accession: GSE207099 ID: 200207099
3. Spermidine reduces neuroinflammation and soluble amyloid beta in an Alzheimer’s disease mouse model
(Submitter supplied) Deposition of amyloid beta (Aβ) and hyperphosphorylated tau along with glial cell-mediated neuroinflammation are prominent pathogenic hallmarks of Alzheimer’s disease (AD). In recent years, impairment of autophagy has been found to be another important feature contributing to AD progression. Therefore, the potential of the autophagy activator spermidine, a small body-endogenous polyamine often used as dietary supplement, was assessed on Aβ pathology and glial cell-mediated neuroinflammation. more...
Organism: Mus musculus
Type: Expression profiling by high throughput sequencing
Platform: GPL24247 8 Samples
FTP download: GEO (H5, RDS) ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE206nnn/GSE206202/
Series Accession: GSE206202 ID: 200206202
The first and last lines are always empty. This is important because I rely on the empty lines when parsing the information.
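To illustrate the blank-line convention, here is a minimal sketch that works on lines already held in memory. It assumes that every entry is preceded by an empty line and that the title line starts with a number, a literal dot and a space. The helper name `extractTitles` and the `sample` array are hypothetical and are not part of the code above.

```typescript
// Hypothetical helper (not in the original code): extracts entry titles
// from lines that are already in memory.
const extractTitles = (lines: string[]): string[] => {
  const titles: string[] = [];
  let entry = -1; // no entry has been opened before the first blank line
  for (const line of lines) {
    if (line === '') {
      entry++; // a blank line opens the next entry slot
      continue;
    }
    // Title lines look like "12. Some title"; the dot is escaped so it only matches a literal "."
    const match = /^[0-9]+\. (.*)$/.exec(line);
    if (match && entry >= 0) {
      titles[entry] = match[1];
    }
  }
  return titles;
};

// Made-up sample that follows the assumed layout (leading and trailing empty line).
const sample = ['', '1. First title', 'Organism: Homo sapiens', '', '2. Second title', 'Organism: Mus musculus', ''];
console.log(extractTitles(sample)); // [ 'First title', 'Second title' ]
```

Keeping the line-level logic in a pure function like this makes it testable without touching the file system; the streaming part then only has to feed it lines.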
Can someone help me please? I am getting a promise back and cannot find a way to get my data outside of the callback scope. I understand that my data is all there inside the callback, but I need it available in the global scope so that the function can return the array of parsed data at the end.
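For context, a value produced asynchronously can only be read once its promise has settled, so the code that needs the array either uses `await` inside an async function or continues inside `.then()`. The following is a minimal sketch of that call shape, not a definitive fix. It assumes a hypothetical `parseTitles` helper and `main` wrapper (neither appears in the code above), and it uses an in-memory stream in place of `./gds_result.txt` so that it runs on its own.

```typescript
import * as readline from 'readline';
import { Readable } from 'stream';

// Hypothetical helper: resolves with the entry titles once the stream has ended.
const parseTitles = async (input: Readable): Promise<string[]> => {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const titles: string[] = [];
  // readline interfaces are async iterable, so no manual Promise wrapper is needed here.
  for await (const line of rl) {
    const match = /^[0-9]+\. (.*)$/.exec(line);
    if (match) titles.push(match[1]);
  }
  return titles;
};

// Hypothetical entry point: with "module": "CommonJS" there is no top-level await,
// so the code that consumes the array lives inside an async function.
const main = async (): Promise<void> => {
  // Made-up input standing in for the real file.
  const input = Readable.from(['\n1. First title\nOrganism: Homo sapiens\n\n2. Second title\n\n']);
  const titles = await parseTitles(input);
  console.log(titles); // the resolved array, not a pending promise
};

main().catch((err) => console.error(err));
```

For a real file, `fs.createReadStream(inputFile)` could be passed as `input` instead of the in-memory stream. A module would export the async function and let each caller `await` it.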
Finally, here is my tsconfig.json:
{
    "compilerOptions": {
        "target": "ES2020",
        "module": "CommonJS",
        "esModuleInterop": true,
        "forceConsistentCasingInFileNames": true,
        "strict": true,
        "skipLibCheck": true
    }
}