
I have a CSV file that I need to read in a Jupyter notebook with np.genfromtxt(). (Unfortunately, I cannot use pd.read_csv() or any other pandas functions or methods.) Instead of showing the actual file, here is an example with the same structure:

Item,Color,Price
iPhone,Red,"1,000"
Galaxy,Black,"1,100"

I wrote a function to read such a CSV file:

import numpy as np

def csv_opener(path):
    # Count the columns from the header line
    with open(path, 'r') as f:
        first_line = f.readline()
        num_col = len(first_line.split(','))

    return np.genfromtxt(path, delimiter=',', dtype=None, usecols=range(num_col))

The Price values contain commas. Because the delimiter is also a comma, csv_opener splits each Price value in two and reads the first data row as:

iPhone | Red | '"1' | '000"'

Here are my questions:

  1. How can I make csv_opener treat a double-quoted value as a single field (that is, '1,000' instead of '"1' | '000"')?
  2. How can I change the data type after reading the values (that is, convert '1,000' to a numeric type such as integer or float)?
  • Does this solve your problem? [Using numpy.genfromtxt to read a csv file with strings containing commas](https://stackoverflow.com/q/17933282/1609514). I used google search to find this. – Bill Apr 06 '23 at 03:46
  • genfromtxt doesn't handle quotes. The newest `loadtxt` does – hpaulj Apr 06 '23 at 03:59

1 Answer

Just use the Python csv module.

cat phone.csv
Item,Color,Price
iPhone,Red,"1,000"
Galaxy,Black,"1,100"

import csv

with open('phone.csv', 'r', newline='') as csv_file:
    c_reader = csv.reader(csv_file, delimiter=',')
    for row in c_reader:
        print(row)

['Item', 'Color', 'Price']
['iPhone', 'Red', '1,000']
['Galaxy', 'Black', '1,100']

CSV is a text format, so the only type you will get out of it is string. To get other types, you can create something like a database table with specific column types and transfer the data into it. This only works if the string representations are consistent and valid for the receiving type.
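As a rough illustration of that transfer step, here is a minimal sketch that reads the rows with csv.reader, strips the thousands separator from Price, and stores the result in a NumPy structured array. The column types ('U20' for text, 'i8' for Price) and the choice to take an open file object rather than a path are assumptions made for this example, not something stated in the question.

```python
import csv
import io

import numpy as np


def csv_opener(source):
    """Read quoted CSV text from an open file object into a structured array."""
    reader = csv.reader(source, delimiter=',')
    header = next(reader)  # first row holds the column names
    rows = [
        # Remove the thousands separator before converting Price to int.
        (item, color, int(price.replace(',', '')))
        for item, color, price in reader
    ]
    # Assumed column types: two text columns and one integer column.
    dtype = [(header[0], 'U20'), (header[1], 'U20'), (header[2], 'i8')]
    return np.array(rows, dtype=dtype)


# In-memory copy of the example file, so the sketch runs without phone.csv.
sample = io.StringIO('Item,Color,Price\niPhone,Red,"1,000"\nGalaxy,Black,"1,100"\n')
phones = csv_opener(sample)
print(phones['Price'])
```

For a real file, the same function should work when passed the handle from open('phone.csv', 'r', newline=''). The int() conversion raises ValueError on any Price that is not a plain integer once the commas are removed, which is where the consistency of the string representations matters.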

Adrian Klaver