Python CSV problem

Question

For the first time I am having a problem loading a CSV into Python.

I am trying to do this. My csv file is identical to his, but longer and with different values.

When I run this,

import collections
path='../data/struc.csv'
answer = collections.defaultdict(list)
with open(path, 'r+') as istream:
    for line in istream:
        line = line.strip()
        try:
            k, v = line.split(',', 1)
            answer[k.strip()].append(v.strip())
        except ValueError:
            print('Ignoring: malformed line: "{}"'.format(line))

print(answer)

Everything runs fine. I get exactly what you would expect.

Without copying and pasting the code from the link, in both instances I get an error.

In the accepted answer, the terminal spits back ValueError: need more than 1 value to unpack

In the second answer, I get AttributeError: 'file' object has no attribute 'split'. It also does not work if you adjust it to take a list.

I feel like the problem is the csv file itself. The head of it is

_id,parent,name,\n
Section,none,America's,\n
Section,none,Europe,\n
Section,none,Asia,\n
Section,none,Africa,\n
Country,America's,United States,\n
Country,America's,Argentina,\n
Country,America's,Bahamas,\n
Country,America's,Bolivia,\n
Country,America's,Brazil,\n
Country,America's,Colombia,\n
Country,America's,Canada,\n
Country,America's,Cayman Islands,\n
Country,America's,Chile,\n
Country,America's,Costa Rica,\n
Country,America's,Dominican Republic,\n

I have read a lot of stuff about CSVs, tried the import csv stuff, and still no luck. Please someone help. Having this kind of problem is the worst.

import re
from collections import defaultdict

parents=defaultdict(list)
path='../data/struc.csv'

with open(path, 'r+') as istream:
    for i, line in enumerate(istream.split(',')):
        if i != 0 and line.strip():
            id_, parent, name = re.findall(r"[\d\w-]+", line)
            parents[parent].append((id_, name))



Traceback (most recent call last):

  File "<ipython-input-29-2b2fd98946b3>", line 1, in <module>
runfile('/home/bob/Documents/mega/tree/python/structure.py',       wdir='/home/bob/Documents/mega/tree/python')

  File "/home/bob/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)

   File "/home/bob/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 78, in execfile
    builtins.execfile(filename, *where)

  File "/home/bob/Documents/mega/tree/python/structure.py", line 15, in <module>
    for i, line in enumerate(istream.split(',')):

AttributeError: 'file' object has no attribute 'split'

Can you show us exactly the code you are running, and the full stack trace of the error? — Tom Dalton, Nov 07 '15 at 15:24
For one thing, you're splitting on `,` but there aren't any in the CSV file. — Kenney, Nov 07 '15 at 15:26
@Kenney, thanks for your response. It is comma-separated; it is just that I copied and pasted from LibreOffice — lost, Nov 07 '15 at 15:31
@TomDalton Thanks for your response. I just posted the second from the link, I have tried a number of combinations of getting the file in, this is just the most recent. All the errors come back on the same line. — lost, Nov 07 '15 at 15:32
Also, in the code above, `','` was tried out of desperation. I used `'\n'` as well. — lost, Nov 07 '15 at 15:35
This is probably the TSV (tab-separated) flavour of CSV. BTW, what is wrong with using the csv module for the task? — Roman Susi, Nov 07 '15 at 15:37
@RomanSusi thanks for responding. It is `'/n'`. I tried import csv but it failed, can you tell me how you would use it? — lost, Nov 07 '15 at 15:40
@RomanSusi you are right, it is the TSV version. I got the /n by adding another row with /n values. Do you think I should add /n to the name column string so it has /n in it, or is there a way to load it using TSV? — lost, Nov 07 '15 at 15:43

Roman Susi · Accepted Answer · 2015-11-07T15:51:21.817

First of all, Python has a special module in its standard library for dealing with CSV of different flavours. Refer to the documentation.

When the CSV file has headers, csv.DictReader is probably the more intuitive way to parse the file:

import collections
import csv

filepath = '../data/struc.csv'
answer = collections.defaultdict(list)

with open(filepath) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        answer[row["_id"].strip()].append(row["parent"].strip())

print(answer)

You can refer to the fields in each row by their names in the header. Here I assumed you would like to use _id and parent, but you get the idea.
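As a rough sketch of that idea, the same field names can be used to group each row under its parent, which is the structure the second snippet in the question was trying to build. The sample data below is a hypothetical, cleaned-up copy of the first few rows from the question (no trailing column), and the code assumes Python 3:

```python
import collections
import csv
import io

# Hypothetical in-memory sample standing in for ../data/struc.csv.
sample = io.StringIO(
    "_id,parent,name\n"
    "Section,none,America's\n"
    "Section,none,Europe\n"
    "Country,America's,United States\n"
    "Country,America's,Argentina\n"
)

# Map each parent to the list of (_id, name) pairs found under it.
children = collections.defaultdict(list)
for row in csv.DictReader(sample):
    parent = row["parent"].strip()
    children[parent].append((row["_id"].strip(), row["name"].strip()))

print(dict(children))
```

With a real file, `sample` would be replaced by the object returned from `open(filepath, newline='')`.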

Also, dialect=csv.excel_tab can be added as a parameter to DictReader to parse tab-separated files.
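A minimal sketch of the tab-separated case, assuming Python 3 and a small made-up sample in place of the real file:

```python
import csv
import io

# Hypothetical tab-separated sample; the real data would come from open(filepath, newline='').
sample = io.StringIO(
    "_id\tparent\tname\n"
    "Section\tnone\tEurope\n"
    "Country\tAmerica's\tBrazil\n"
)

# csv.excel_tab is the built-in dialect that uses a tab as the delimiter.
reader = csv.DictReader(sample, dialect=csv.excel_tab)
rows = list(reader)

for row in rows:
    print(row["_id"], row["parent"], row["name"])
```

If the rows come back with a single long key containing the whole header line, the delimiter is probably not what was assumed, and it is worth opening the file in a plain text editor to check.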

Thank you I will read all the documentation for csv. The output is what I needed. It is people like you that make programming possible for mourns like me. — lost, Nov 07 '15 at 15:52

score 0 · Answer 2 · answered Nov 07 '15 at 15:43

If you plan on doing any analysis on this data, then I would suggest learning the pandas library. It takes care of all the details that seem to be tripping you up, making opening a CSV file a one-liner.

import pandas as pd
file_path = '../data/struc.csv'  # path taken from the question
csv_file = pd.read_csv(file_path)

jeff_carter

That is how I usually load it, which makes everything easier. I am not doing analysis on it though, I am trying to make a json tree – lost Nov 07 '15 at 15:45
