Sorry, I'm new here, so if this has been covered, please share the link; I couldn't find it.

I have a CSV that looks like the following in a text viewer:

TeamCode,Name,ConferenceCode
5,Akron,875
8,Alabama,911
9,UAB,24312

I was trying to import it into a dictionary with the following:

import numpy as np
key_value = np.loadtxt('team.csv', delimiter=",",skiprows = 1, dtype = 'str')
for i in key_value:
    print(i)  

mydict = { k:[v, z] for k,v,z in key_value}

Which returns

["b'5'" "b'Akron'" "b'875'"]
["b'8'" "b'Alabama'" "b'911'"]
["b'9'" "b'UAB'" "b'24312'"]

I don't know why I am getting the b's. If there is a better way of doing this, let me know, but I'm trying to create a dictionary from the CSV and thought this should work. Since I am getting the b's, things aren't working as planned when I use the dictionary.

Edit: I appreciate all the help, but does anyone know why I'm getting the b? It only shows up when I import with the numpy module; everything else is okay. I'm running on a Mac; there's no reason that would cause it, right?

3 Answers

This is a bug in np.loadtxt, related to Python 3's distinction between byte strings and text strings (see numpy issue #2715): it processes the CSV as byte strings, while CSV should be treated as text.
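
That is also where the stray b comes from: in Python 3, coercing a byte string to str keeps the b'...' repr as literal text, which is exactly what ends up in the loaded array. A minimal illustration of the symptom (not a fix):

    row = b'5'           # a byte string, as loadtxt reads the file
    print(str(row))      # -> "b'5'"  (the stray b and quotes from the question)
    print(row.decode())  # -> "5"     (decoding gives the actual text)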

Of course, as rightly pointed out by Padraic Cunningham, you do not need numpy for this task and could use the csv module instead. If you want to stick with numpy, you have two options until that bug is fixed:

  1. Specify dtype=bytes instead, which will correctly interpret the values as byte strings, then convert them to real strings. If all fields are strings, this can be done very concisely as follows:

    key_value = np.loadtxt(
        'team.csv',
        delimiter=",",
        skiprows = 1,
        dtype = bytes
    ).astype(str)
    
  2. Specify the right converter in the call to np.loadtxt manually:

    key_value = np.loadtxt(
        'team.csv',
        delimiter=",",
        skiprows = 1,
        dtype = str,
        converters = {k:np.compat.asstr for k in range(3)}
    )
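
With either option, the dict comprehension from the question then works on plain text strings and the b prefixes go away. A minimal sketch, assuming the same team.csv as in the question:

    import numpy as np

    key_value = np.loadtxt(
        'team.csv',
        delimiter=",",
        skiprows=1,
        dtype=bytes
    ).astype(str)
    mydict = {k: [v, z] for k, v, z in key_value}
    print(mydict)
    # {'5': ['Akron', '875'], '8': ['Alabama', '911'], '9': ['UAB', '24312']}
    # (key order may differ on Python versions before 3.7)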
    
burnpanck

You could decode the byte strings; then you can access the keys without using b"5" etc.:

mydict = { k.decode("utf-8"):[v.decode("utf-8"), z.decode("utf-8")] for k,v,z in key_value}


In [46]: mydict
Out[46]: {'5': ['Akron', '875'], '8': ['Alabama', '911'], '9': ['UAB', '24312']}
In [47]: mydict["5"]
Out[47]: ['Akron', '875']

You could just open it normally:

mydict = { }
with open('team.csv') as f:
    next(f) # skip header
    for line in  f:
        k,v,z = line.rstrip().split(",") 
        mydict[k] = [v,z]
print(mydict)
{'9': ['UAB', '24312'], '8': ['Alabama', '911'], '5': ['Akron', '875']}

Or using the csv module:

import csv
mydict = { }
with open('team.csv') as f:
    next(f)
    reader = csv.reader(f,delimiter=",")
    for row in reader:
        k,v,z = row
        mydict[k] =  [v,z]
print(mydict)

If you want the numbers as ints, use:

mydict[int(k)] = [v, int(z)]
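
In context, that looks like this (a sketch assuming the same team.csv):

import csv

mydict = {}
with open('team.csv') as f:
    next(f)  # skip the header row
    for k, v, z in csv.reader(f, delimiter=","):
        mydict[int(k)] = [v, int(z)]  # TeamCode and ConferenceCode as ints
print(mydict)
# {5: ['Akron', 875], 8: ['Alabama', 911], 9: ['UAB', 24312]}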

Some timings:

In [39]: %%timeit
   ....: key_value = np.loadtxt(
   ....:     'team.csv',
   ....:     delimiter=",",
   ....:     skiprows = 1,
   ....:     dtype = bytes
   ....: ).astype(str)
   ....: mydict = { k:[v, z] for k,v,z in key_value}
   ....: 

10000 loops, best of 3: 123 µs per loop

In [40]: 

In [40]: %%timeit
   ....: mydict = { }
   ....: with open('team.csv') as f:
   ....:     next(f)
   ....:     reader = csv.reader(f,delimiter=",")
   ....:     for line in reader:
   ....:         k,v,z = line
   ....:         mydict[k]=  [v,z]
   ....: 
10000 loops, best of 3: 42.9 µs per loop

In [42]: %%timeit
   ....: mydict = { }
   ....: with open('team.csv') as f:
   ....:     next(f) # skip header
   ....:     for line in f:
   ....:         k,v,z = line.rstrip().split(",")
   ....:         mydict[k] = [v,z]
   ....: 
10000 loops, best of 3: 37.6 µs per loop

Using a file with 150 lines, numpy is much less efficient:

In [12]: %%timeit
   ....:  key_value = np.loadtxt(
   ....:  'team.csv',
   ....:  delimiter=",",
   ....:  skiprows = 1,
   ....:  dtype = bytes
   ....:  ).astype(str)
   ....:  mydict = { k:[v, z] for k,v,z in key_value}
   ....: 
100 loops, best of 3: 2.01 ms per loop

In [13]: %%timeit
   ....: mydict = { }
   ....: with open('team.csv')as f:
   ....:     next(f) # skip header
   ....:     for line in  f:
   ....:         k,v,z = line.rstrip().split(",") 
   ....:         mydict[k] = [v,z]
   ....: 
10000 loops, best of 3: 165 µs per loop
Padraic Cunningham

I believe you don't need numpy for this kind of task, because you're converting the data to a Python dict anyway. You can easily implement the parsing yourself:

with open('team.csv') as f:
    mydict = {k : [v, z] for k, v, z in (line.rstrip().split(',') for line in f)}
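
For the sample team.csv this produces the same mapping as the other answers (key order may differ depending on the Python version):

{'5': ['Akron', '875'], '8': ['Alabama', '911'], '9': ['UAB', '24312']}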
Anton Savin