Sorry, I'm new here, so if this has been covered, please share the link; I couldn't find it.

I have a CSV that looks like the following in a text viewer:

TeamCode,Name,ConferenceCode
5,Akron,875
8,Alabama,911
9,UAB,24312

I was trying to import it into a dictionary with the following:

import numpy as np
key_value = np.loadtxt('team.csv', delimiter=",",skiprows = 1, dtype = 'str')
for i in key_value:
    print(i)  

mydict = { k:[v, z] for k,v,z in key_value}

Which returns

["b'5'" "b'Akron'" "b'875'"]
["b'8'" "b'Alabama'" "b'911'"]
["b'9'" "b'UAB'" "b'24312'"]

I don't know why I am getting the b's. If there is a better way of doing this, let me know, but I'm trying to create a dictionary from the CSV and thought this should work. Since I am getting the b's, things aren't working as planned when I use the dictionary.

Edit: I appreciate all the help, but does anyone know why I'm getting the b? It only shows up when I import with the numpy module; everything else is okay. I'm running on a Mac; there's no reason that would cause it, right?

3 Answers

This is a bug in np.loadtxt, related to Python 3's distinction between byte strings and text strings (see numpy issue #2715): it processes the CSV as byte strings, while CSV should be treated as text.
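
That is also where the stray b comes from: in Python 3, coercing a byte string to str keeps the b'...' repr as literal text, which is exactly what ends up in the loaded array. A minimal illustration of the symptom (not a fix):

    row = b'5'           # a byte string, as loadtxt reads the file
    print(str(row))      # -> "b'5'"  (the stray b and quotes from the question)
    print(row.decode())  # -> "5"     (decoding gives the actual text)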

Of course, as rightly pointed out by Padraic Cunningham, you do not need numpy for this task and could use the csv module instead. If you want to stick with numpy, you have two options until that bug is fixed:

  1. Specify dtype=bytes instead, which will correctly interpret the values as byte strings, then convert them to real strings. If all fields are strings, this can be done very concisely as follows:

    key_value = np.loadtxt(
        'team.csv',
        delimiter=",",
        skiprows = 1,
        dtype = bytes
    ).astype(str)
    
  2. Specify the right converter in the call to np.loadtxt manually:

    key_value = np.loadtxt(
        'team.csv',
        delimiter=",",
        skiprows = 1,
        dtype = str,
        converters = {k:np.compat.asstr for k in range(3)}
    )
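
With either option, the dict comprehension from the question then works on plain text strings and the b prefixes go away. A minimal sketch, assuming the same team.csv as in the question:

    import numpy as np

    key_value = np.loadtxt(
        'team.csv',
        delimiter=",",
        skiprows=1,
        dtype=bytes
    ).astype(str)
    mydict = {k: [v, z] for k, v, z in key_value}
    print(mydict)
    # {'5': ['Akron', '875'], '8': ['Alabama', '911'], '9': ['UAB', '24312']}
    # (key order may differ on Python versions before 3.7)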
    
burnpanck

You could decode the byte strings; then you can access the keys without using b"5" etc.:

mydict = { k.decode("utf-8"):[v.decode("utf-8"), z.decode("utf-8")] for k,v,z in key_value}


In [46]: mydict
Out[46]: {'5': ['Akron', '875'], '8': ['Alabama', '911'], '9': ['UAB', '24312']}
In [47]: mydict["5"]
Out[47]: ['Akron', '875']

You could just open it normally:

mydict = { }
with open('team.csv') as f:
    next(f) # skip header
    for line in  f:
        k,v,z = line.rstrip().split(",") 
        mydict[k] = [v,z]
print(mydict)
{'9': ['UAB', '24312'], '8': ['Alabama', '911'], '5': ['Akron', '875']}

Or using the csv module:

import csv
mydict = { }
with open('team.csv') as f:
    next(f)
    reader = csv.reader(f,delimiter=",")
    for row in reader:
        k,v,z = row
        mydict[k] =  [v,z]
print(mydict)

If you want the numbers as ints, use:

mydict[int(k)] = [v, int(z)]
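
In context, that looks like this (a sketch assuming the same team.csv):

import csv

mydict = {}
with open('team.csv') as f:
    next(f)  # skip the header row
    for k, v, z in csv.reader(f, delimiter=","):
        mydict[int(k)] = [v, int(z)]  # TeamCode and ConferenceCode as ints
print(mydict)
# {5: ['Akron', 875], 8: ['Alabama', 911], 9: ['UAB', 24312]}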

Some timings:

In [39]: %%timeit
   ....: key_value = np.loadtxt(
   ....:     'team.csv',
   ....:     delimiter=",",
   ....:     skiprows = 1,
   ....:     dtype = bytes
   ....: ).astype(str)
   ....: mydict = { k:[v, z] for k,v,z in key_value}
   ....: 

10000 loops, best of 3: 123 µs per loop

In [40]: 

In [40]: %%timeit
   ....: mydict = { }
   ....: with open('team.csv') as f:
   ....:     next(f)
   ....:     reader = csv.reader(f,delimiter=",")
   ....:     for line in reader:
   ....:         k,v,z = line
   ....:         mydict[k]=  [v,z]
   ....: 
10000 loops, best of 3: 42.9 µs per loop

In [42]: %%timeit
   ....: mydict = { }
   ....: with open('team.csv') as f:
   ....:     next(f) # skip header
   ....:     for line in f:
   ....:         k,v,z = line.rstrip().split(",")
   ....:         mydict[k] = [v,z]
   ....: 
10000 loops, best of 3: 37.6 µs per loop

Using a file with 150 lines, numpy is much less efficient:

In [12]: %%timeit
   ....:  key_value = np.loadtxt(
   ....:  'team.csv',
   ....:  delimiter=",",
   ....:  skiprows = 1,
   ....:  dtype = bytes
   ....:  ).astype(str)
   ....:  mydict = { k:[v, z] for k,v,z in key_value}
   ....: 
100 loops, best of 3: 2.01 ms per loop

In [13]: %%timeit
   ....: mydict = { }
   ....: with open('team.csv')as f:
   ....:     next(f) # skip header
   ....:     for line in  f:
   ....:         k,v,z = line.rstrip().split(",") 
   ....:         mydict[k] = [v,z]
   ....: 
10000 loops, best of 3: 165 µs per loop
Padraic Cunningham

I believe you don't need numpy for this kind of task, because you're converting the data to a Python dict anyway. You can easily implement the parsing yourself:

with open('team.csv') as f:
    mydict = {k : [v, z] for k, v, z in (line.rstrip().split(',') for line in f)}
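
For the sample team.csv this produces the same mapping as the other answers (key order may differ depending on the Python version):

{'5': ['Akron', '875'], '8': ['Alabama', '911'], '9': ['UAB', '24312']}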
Anton Savin