Error while working on .csv with load_csv

Question

I am trying to run the code below:

ds = load_csv('C:\\User.csv')
f = open(ds,'r')
lines = f.readlines()[1:]
print(lines)
f.close()

The first line of the dataset is a string. I get the following error:

TypeError: expected str, bytes or os.PathLike object, not list

However, when I open the file directly with the code below, it works:

filename='C:\\User.csv'
f = open(filename,'r')
lines = f.readlines()[1:]
print(lines)
f.close()

I am skipping the first line because it is a string and the rest of the dataset is float.

Update:

load_csv

from csv import reader

def load_csv(ds):
    dataset = list()
    with open(ds, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            # skip empty lines
            if not row:
                continue
            dataset.append(row)
    # return only after every row has been read
    return dataset

Even when I use it this way, I still get an error:

ds = load_csv('C:\\Users.csv')
minmax = dataset_minmax(ds)
normalize_dataset(ds, minmax)

def dataset_minmax(dataset):
    minmax = list()
    for i in range(len(dataset[0])):
        col_values = [row[i] for row in dataset]
        value_min = min(col_values)
        value_max = max(col_values)
        minmax.append([value_min, value_max])
    return minmax

def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

The error occurs on this line:

row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

Error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Where is the `load_csv` function from? It’s common to use pandas `read_csv` function. — Jack Moody, Mar 30 '19 at 17:48
Also in future, add line numbers and mention the line where you get the error! — senior_mle, Mar 30 '19 at 17:50

score 0 · Answer 1 · answered Mar 30 '19 at 18:44

Since you're now getting a different error, I'll give a second answer.

This error means that the two variables in your subtraction are strings, not numbers.

In [1]: 5 - 3
Out[1]: 2

In [2]: '5' - '3'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-4ef7506473f1> in <module>
----> 1 '5' - '3'

TypeError: unsupported operand type(s) for -: 'str' and 'str'

This is because the CSV reader assumes everything is a string. You need to convert it to floats, e.g., by changing load_csv to do something like dataset.append(list(map(float, row))) instead of your existing append statement.
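As a minimal sketch of that conversion, assuming the file has a single header line followed by purely numeric rows: the function name `load_csv_as_floats`, the `skip_header` parameter, and the in-memory sample data are hypothetical and only stand in for the real file.

```python
import csv
import io

def load_csv_as_floats(file_obj, skip_header=True):
    """Read rows from an open text file and convert every field to float."""
    rows = csv.reader(file_obj)
    if skip_header:
        next(rows, None)  # discard the header line of column names
    dataset = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        dataset.append([float(value) for value in row])
    return dataset

# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("a,b\n1.5,2\n\n3,4.25\n")
dataset = load_csv_as_floats(sample)
print(dataset)
```

With a real file, the same function could be called inside `with open(path, newline='') as f:`.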

The min-max stuff doesn't fail, because Python's min and max work on strings, too:

In [3]: min('f', 'o', 'o', 'b', 'a', 'r')
Out[3]: 'a'

However, it might be giving you incorrect answers:

In [4]: min('2.0', '10.0')
Out[4]: '10.0'
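A short illustration of the difference, using the same two values: strings are compared character by character, so '1' sorts before '2', while converting to float first gives the numeric minimum.

```python
values = ['2.0', '10.0']

as_strings = min(values)             # lexicographic comparison of the strings
as_floats = min(map(float, values))  # numeric comparison after conversion

print(as_strings, as_floats)
```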

By the way, if you're doing much along these lines, you'd probably benefit from using the Pandas package instead of rolling your own.

answered Mar 30 '19 at 18:44

dwhswenson

Even if I change that line to dataset.append(list(map(float, row))), it still gives an error: 'dataset.append(list(map(float, row))) ValueError: could not convert string to float: '7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6'' – AHF Mar 30 '19 at 18:48
Your input data is semi-colon separated, not comma separated. CSV would have a line '7,0.27,0.36,...' instead of '7;0.27;0.36;...'. Use `delimiter=';'` when you create the CSV `reader`. – dwhswenson Mar 30 '19 at 18:52
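A rough sketch of reading semicolon-separated data with the standard `csv` module. The two sample rows are made up and only mimic the layout shown in the error message; no header line is assumed here.

```python
import csv
import io

# Hypothetical sample in the semicolon-separated layout from the error message.
sample = io.StringIO("7;0.27;0.36\n6.3;0.3;0.34\n")

# delimiter=';' tells the reader to split fields on semicolons instead of commas.
csv_reader = csv.reader(sample, delimiter=';')
rows = [[float(value) for value in row] for row in csv_reader if row]
print(rows)
```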
row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0]) ZeroDivisionError: float division by zero – AHF Mar 30 '19 at 18:59
That's a different error. You have a column where the min and max values are the same, apparently, so the denominator is zero. You might want to try printing out the intermediate data and seeing what you have. Or take advantage of other tools that exist: load it in pandas and then do this: https://stackoverflow.com/a/41532180/4205735 – dwhswenson Mar 30 '19 at 19:04
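One possible way to avoid the division by zero when a column's minimum and maximum coincide is sketched below. Mapping such a constant column to 0.0 is an assumption made for illustration, not something stated in the thread; dropping the column would be another option. The sample data is made up.

```python
def dataset_minmax(dataset):
    """Return [min, max] for each column of a list of numeric rows."""
    return [[min(col), max(col)] for col in zip(*dataset)]

def normalize_dataset(dataset, minmax):
    """Rescale each column to [0, 1] in place."""
    for row in dataset:
        for i, (lo, hi) in enumerate(minmax):
            span = hi - lo
            # Assumption: a constant column (span == 0) is mapped to 0.0
            # instead of dividing by zero.
            row[i] = (row[i] - lo) / span if span else 0.0

# Hypothetical data: the second column is constant.
data = [[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]]
normalize_dataset(data, dataset_minmax(data))
print(data)
```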

score -1 · Answer 2 · answered Mar 30 '19 at 17:53

I am guessing the error is in the open call in your code. It fails because open expects a string or path-like object naming the file to open (as the error message says). The function load_csv probably returns a list, which is not a valid argument for open.
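The failure can be reproduced without any file at all. In this small sketch, `rows` is a made-up stand-in for whatever load_csv returns; passing it to open raises the same TypeError as in the question.

```python
rows = [['a', 'b'], ['1', '2']]  # hypothetical stand-in for load_csv's return value

try:
    open(rows, 'r')  # a list is not a file path
except TypeError as exc:
    message = str(exc)
    print(message)
```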

answered Mar 30 '19 at 17:53

senior_mle

After your update, it seems that my explanation was based on reasonable assumptions :) – senior_mle Mar 30 '19 at 17:54

score -1 · Answer 3 · answered Mar 30 '19 at 17:53

Look at your first two lines where it doesn't work:

ds = load_csv('C:\\User.csv')
f = open(ds,'r')

ds is an object returned (from TensorFlow, I assume?) which contains the data. Then you open it as if it were a filename. This is why the interpreter complains. ds is the dataset, not the string representing the file.

It works in the other example, because you use a filename.

answered Mar 30 '19 at 17:53

dwhswenson

You can delete this answer so we can work on the one above – AHF Mar 30 '19 at 19:16
