I recently tried to use Node.js to collect data from other websites such as Yahoo Finance. One of the URLs looks like this: "http://real-chart.finance.yahoo.com/table.csv?s=AAPL&a=11&b=12&c=1999&d=01&e=4&f=2016&g=d&ignore=.csv". If I open this URL in a browser, a download prompt pops up, but in my Node code the URL is not found.

var fs = require('fs');
var http = require('http');
var url = require('url');
var csv = require( "fast-csv" );

// var FILENAME = "file/table.csv";
var FILENAME = "http://real-chart.finance.yahoo.com/table.csv?s=AAPL&a=11&b=12&c=1999&d=01&e=4&f=2016&g=d&ignore=.csv";

function fast_csv_read(filename)
{
    csv.fromPath(filename)
    .on("data", function(data){
        console.log("current data: ");
        console.log(data);
    })
    .on("end", function(){
        console.log("done reading");
    });
}

fast_csv_read(FILENAME);

If I download the file with a browser and save it as "file/table.csv", it works fine. I have no idea what is going wrong.

mklement0
leon.li

2 Answers

.fromPath accepts only file paths, not URLs.

You must retrieve the document from the URL yourself first, and provide its contents to the fast-csv module in one of the following ways:

  • Pass the document contents to .fromString()
  • Pass a readable stream to .fromStream()
  • Pipe a readable stream to .parse()
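To illustrate the distinction between a file path and a URL, here is a minimal sketch that uses only Node's built-in modules. The helper names `isHttpUrl` and `openReadable` are hypothetical and not part of fast-csv; the sketch assumes a plain 200 response and does not follow redirects.

```javascript
const fs = require("fs");
const http = require("http");
const https = require("https");

// Returns true only for strings that parse as absolute http(s) URLs.
function isHttpUrl(source) {
  try {
    const { protocol } = new URL(source);
    return protocol === "http:" || protocol === "https:";
  } catch (err) {
    return false; // not an absolute URL, so treat it as a file path
  }
}

// Hypothetical helper: resolves with a readable stream for a path or a URL.
function openReadable(source) {
  if (!isHttpUrl(source)) {
    // Local file: the same kind of input that .fromPath expects.
    return Promise.resolve(fs.createReadStream(source));
  }
  const client = new URL(source).protocol === "https:" ? https : http;
  return new Promise((resolve, reject) => {
    client
      .get(source, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error("Request failed with status " + res.statusCode));
          return;
        }
        resolve(res); // the response object is itself a readable stream
      })
      .on("error", reject);
  });
}
```

The stream that `openReadable` resolves with could then be handed to `.fromStream()` or piped into `.parse()`. Note that, as far as I know, fast-csv 3.x and later renamed `.fromPath()`, `.fromStream()`, and `.fromString()` to `.parseFile()`, `.parseStream()`, and `.parseString()`, so check which version you have installed.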

The request module provides a convenient way to get a readable stream from a URL; install it with npm install --save request.

For instance, passing a readable stream to .fromStream() would look like this:

#!/usr/bin/env node

var csv = require( "fast-csv" );

// Require the 'request' module.
// Install it with `npm install --save request`.
var request = require('request');

var URL = "http://real-chart.finance.yahoo.com/table.csv?s=AAPL&a=11&b=12&c=1999&d=01&e=4&f=2016&g=d&ignore=.csv";

function fast_csv_read_url(url)
{
    // Let request return the document pointed to by the URL
    // as a readable stream, and pass it to csv.fromStream()
    csv.fromStream(request(url))
      .on("data", function(data){
        console.log("current data: ");
        console.log(data);
      })
      .on("end", function(){
        console.log("done reading");
      });
}

fast_csv_read_url(URL);
mklement0
Because `request` is deprecated, I looked into how to get along with the alternatives. I found this: https://stackoverflow.com/a/65976684/6105259, which refers to this: https://philna.sh/blog/2020/08/06/how-to-stream-file-downloads-in-Node-js-with-got/. I combined it with this: https://c2fo.github.io/fast-csv/docs/parsing/methods – Emman Jan 13 '22 at 22:21

The answer given by @mklement0 is simple and works well, but unfortunately the request module is deprecated. Because got is a recommended alternative, I ended up with the following code:

import * as csv from "fast-csv";
import got from "got";

var my_url = "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-01-11/colony.csv"

got.stream(my_url)
    .pipe(csv.parse()) // https://c2fo.github.io/fast-csv/docs/parsing/methods
    .on('error', error => console.error(error))
    .on('data', row => console.log(`ROW=${JSON.stringify(row)}`))
    .on('end', rowCount => console.log(`Parsed ${rowCount} rows`));

See more about got in this blog post: https://philna.sh/blog/2020/08/06/how-to-stream-file-downloads-in-Node-js-with-got/
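Both answers only log rows as they arrive. If the rows are needed as a single array, the same `data`/`end`/`error` events can be wrapped in a Promise. This is a sketch, not part of either library: `collectRows` is a hypothetical helper, and `Readable.from` stands in here for a fast-csv parser stream so that the snippet runs without any installed packages.

```javascript
const { Readable } = require("stream");

// Hypothetical helper: gathers every 'data' event of a row stream into an array.
function collectRows(rowStream) {
  return new Promise((resolve, reject) => {
    const rows = [];
    rowStream
      .on("error", reject)                 // reject on the first stream error
      .on("data", (row) => rows.push(row)) // one parsed row per 'data' event
      .on("end", () => resolve(rows));     // resolve once the stream is finished
  });
}

// Stand-in for a parser stream, using placeholder rows.
const demo = Readable.from([["a", "b"], ["1", "2"]]);
collectRows(demo).then((rows) => console.log("Parsed " + rows.length + " rows"));
```

With a real source, the argument would be the parser stream instead, for example the result of piping a download stream into `csv.parse()`. Collecting everything in memory is only reasonable for files of modest size; for large files, processing each row inside the `data` handler is the safer choice.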

Emman