
I am new to Python and am using NumPy to read a CSV file into an array. I tried two approaches:

Approach 1

train = np.asarray(np.genfromtxt(open("/Users/mac/train.csv","rb"),delimiter=","))

Approach 2

with open('/Users/mac/train.csv') as csvfile:
    rows = csv.reader(csvfile)
    for row in rows:
        newrow = np.array(row).astype(int)
        train.append(newrow)

What is the difference between these two approaches, and which one is recommended?

I am not concerned about speed, since my data set is small; I care more about differences in the resulting data type.

Daniel F
Ricky
  • Why not [pandas](http://pandas.pydata.org/)? It's simple: `pd.read_csv('path/to/file')` – Lucas Sep 10 '18 at 06:55
  • Aside from @Lucas's great suggestion, the use case also depends on whether your data contains a mixture of different data types (heterogeneous) or a single type (homogeneous). – dennlinger Sep 10 '18 at 06:56
  • The file contains only a single data type: integers. – Ricky Sep 10 '18 at 06:56
  • Possible duplicate of [The fastest way to read input in Python](https://stackoverflow.com/questions/15096269/the-fastest-way-to-read-input-in-python) – Daniel F Sep 10 '18 at 07:30
  • `What is recommended to use?` This is a broad question. What *specifically* are you concerned about? If it's not performance, is it readability, or something else? – jpp Sep 10 '18 at 08:58

2 Answers


You can also use pandas; it is simple to use.

import pandas as pd
import numpy as np

dataset = pd.read_csv('file.csv')
# get all column names (read_csv treats the first row of the csv as the header)
values = list(dataset.columns.values)

# get the labels, assuming the last column of the csv holds the labels
y = dataset[values[-1:]]
y = np.array(y, dtype='float32')
# all remaining columns are the features
X = dataset[values[0:-1]]
X = np.array(X, dtype='float32')
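The code above assumes the file has a header row. If it has none and holds only integers (as the asker says in the comments), a minimal sketch is to pass `header=None` and convert the whole frame at once. Here an in-memory string stands in for the real file, and the name `csv_text` is purely illustrative:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: integers only, no header row
csv_text = "1,2,3\n4,5,6\n"

# header=None stops pandas from using the first data row as column names
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Convert the DataFrame to a plain NumPy array; integer columns stay integers
train = df.to_numpy()
print(train.dtype)  # int64
```

With a real file you would pass the path (e.g. `'/Users/mac/train.csv'`) instead of the `StringIO` object.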
Rachit Tayal

So what is the difference in the result?

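`genfromtxt` is NumPy's CSV reader. It already returns an array, so there is no need for the extra `asarray`. By default it parses every value as a float (`dtype=float`), even when the file contains only integers.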
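A small sketch of the data-type point, using an in-memory string in place of `/Users/mac/train.csv` (the variable names are illustrative). It shows the default float result, that wrapping it in `np.asarray` returns the very same object, and how to request integers explicitly:

```python
import io

import numpy as np

# Hypothetical stand-in for train.csv: integers only, no header row
csv_text = "1,2,3\n4,5,6\n"

# Default dtype is float, so integer data comes back as float64
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(type(as_float), as_float.dtype)  # <class 'numpy.ndarray'> float64

# np.asarray on an existing ndarray returns the same object: it is a no-op here
print(np.asarray(as_float) is as_float)  # True

# Ask for integers explicitly when the file holds only integers
as_int = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=int)
print(as_int)
```

The exact integer width of `as_int` (for example `int64` vs `int32`) can depend on platform and NumPy version.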

The second snippet is incomplete (`train` is never initialized), but it looks like it would produce a list of arrays, one for each line of the file. It uses Python's generic `csv` reader, which does little more than read each line and split it into strings.
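A completed version of the second approach might look like the sketch below. It assumes `train` was meant to start as an empty list, and again uses an in-memory string instead of the real file. It also swaps `np.int` for the built-in `int`, because `np.int` was removed in NumPy 1.24. The result is a Python list of 1-D arrays, which `np.array` can then stack into a single 2-D array:

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for train.csv: integers only, no header row
csv_text = "1,2,3\n4,5,6\n"

rows = []  # must exist before appending to it
for row in csv.reader(io.StringIO(csv_text)):
    # csv.reader yields each line as a list of strings, e.g. ['1', '2', '3']
    rows.append(np.array(row).astype(int))

print(type(rows), len(rows))  # <class 'list'> 2

# Stack the per-line arrays into one 2-D array
train = np.array(rows)
print(train.shape)  # (2, 3)
```

Stacking only works cleanly when every line has the same number of fields; ragged lines would need separate handling.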

hpaulj