
In a Node.js environment with Danfo.js, reading .csv files is very easy with readCSV() (formerly read_csv()), as shown in the official example:

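const dfd = require("danfojs-node")

// readCSV returns a promise that resolves to a DataFrame
dfd.readCSV("file:///home/Desktop/user_names.csv")
  .then(df => {
    // print the first rows of the DataFrame
    df.head().print()
  })
  .catch(err => {
    console.log(err)
  })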

However, I can't find a way to read .tsv files.

Is there a way to read tab-delimited files with Danfo.js?

In the source I found the following comment:

 * @param {config} (Optional). A CSV Config object that contains configurations
 *     for reading and decoding from CSV file(s).

But I'm new to JavaScript, coming from R/Python, and I don't know what to do from there.

webb
taiyodayo
Danfo.js is apparently a wrapper around a tfjs backend. In the TensorFlow.js documentation, I found a reference for the original function: https://js.tensorflow.org/api/latest/#data.csv How do I pass these parameters via Danfo.js? – taiyodayo Mar 03 '21 at 04:16

3 Answers


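Here is how to use readCSV (formerly read_csv) to read a TSV file. Pass the options object as the second argument (JavaScript has no keyword arguments, so writing configs={...} would only assign to a stray variable):

dfd.readCSV("file.tsv", { delimiter: "\t" })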

Danfo.js documentation says:

Parameters: configs: object, optional Supported params are: ... csvConfigs: other supported Tensorflow csvConfig parameters. See https://js.tensorflow.org/api/latest/#data.csv

Then that page says:

csvConfig object optional: ... delimiter (string) The string used to parse each line of the input file.

This means that any parameter you can include in csvConfig in tf.data.csv() can also be included in configs in readCSV(). For example, if this works:

tf.data.csv(x, { y: z })

then this will also work:

dfd.readCSV(x, { y: z })
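As a minimal sketch of the point above, assuming a Danfo.js version whose readCSV accepts a delimiter option as the quoted documentation describes: the helper below picks the delimiter from the file extension and passes the options object as the second argument. The names delimiterOptionsFor and readDelimited are hypothetical and not part of Danfo.js.

```javascript
const path = require("path");

// Hypothetical helper (not part of Danfo.js): choose a delimiter from the
// file extension. ".tsv" maps to a tab; anything else falls back to a comma.
function delimiterOptionsFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return { delimiter: ext === ".tsv" ? "\t" : "," };
}

// Hypothetical wrapper: danfojs-node is required lazily, so the helper above
// can be used and tested even where the package is not installed.
async function readDelimited(filePath) {
  const dfd = require("danfojs-node");
  // Assumption: readCSV forwards `delimiter` to its underlying parser.
  return dfd.readCSV(filePath, delimiterOptionsFor(filePath));
}
```

Usage would then look like readDelimited("file.tsv").then(df => df.head().print()).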

PS: Has anyone else noticed that Danfo.js readCSV is insanely slow? It takes me 9 seconds to dfd.readCSV a 23 MB TSV. dfd.read_json brings this down to a still unusably slow 7 seconds. Compare this to 0.015 seconds to read a 22 MB Apache Arrow file of the same data using apache-arrow JS.

webb

Since it is only a wrapper around the tfjs implementation, and reading a TSV file is not yet implemented in tfjs, you could consider:

  • replacing the tabs with commas, and
  • using the CSV reader
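The two steps above can be sketched as a small conversion function. This is only an illustration, assuming a simple TSV whose fields contain no embedded tabs or newlines; tsvToCsv is a hypothetical name. Fields that contain a comma or a double quote are wrapped in quotes so that the CSV reader does not split them.

```javascript
// Hypothetical helper: convert simple TSV text to CSV text.
// Assumption: fields never contain tabs or line breaks themselves.
function tsvToCsv(tsvText) {
  return tsvText
    .split(/\r?\n/)
    .map((line) =>
      line
        .split("\t")
        .map((field) =>
          // Quote fields that would otherwise be ambiguous in CSV,
          // doubling any embedded double quotes.
          /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field
        )
        .join(",")
    )
    .join("\n");
}
```

The converted text can then be written to a .csv file and read with the ordinary CSV reader.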
edkeveked

I faced a similar issue when I was trying to do something like this in a TypeScript project:

import * as dfd from "danfojs-node";

async function readFromCSV(path: string): Promise<void> {
    const reportDataFrame  = await dfd.readCSV(path, {
        transform: (value: string) => {
            if(value == null) {
                return "";
            }
            return value;
        }
    });
    //rest of the code
}

It resulted in the following error:

Type '{ transform: (value: string) => string; }' has no properties in common with type 'CsvInputOptionsNode'.

As per the documentation, readCSV should accept the same config object as papaparse.

As per my understanding, the problem was that the TypeScript compiler was not able to find the type ParseConfig (the class CsvInputOptionsNode extends ParseConfig), and therefore it threw an error.

I solved this problem by installing the types for papaparse. You can find it here.

After adding the types for papaparse, I updated the code as follows:

import * as dfd from "danfojs-node";
import type { ParseConfig } from 'papaparse';

async function readFromCSV(path: string): Promise<void> {
    const config: ParseConfig = {
        transform: (value: string) => {
            if(value == null) {
                return "";
            }
            return value;
        }
    }
    const reportDataFrame  = await dfd.readCSV(path, config);
    //rest of the code
}
Aman Mulani