6

I am new to Python and have tried everything I could think of without finding a solution. I have a list of tuples, each of which has a dictionary as its last item. The dictionaries have different numbers of keys. The list looks like this:

l = [('Apple', 1, 2, {'gala': (2, 1.0)}), 
('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}), 
('Pear', 4, 5, {'anjou': (5, 0.2), 'bartlet': (5, 0.4), 'seckel': (5, 0.2)}), 
('Berry', 5, 5, {'blueberry': (5, 0.2), 'blackberry': (5, 0.2), 'straw': (5, 0.2)})]

To write a .csv file from this list, I used:

test_file = ()
length = len(l[0])

with open('test1.csv', 'w', encoding = 'utf-8') as test_file:
    csv_writer = csv.writer(test_file, delimiter=',')
    for y in range(length):
        csv_writer.writerow([x[y] for x in l])

This writes the last element of each tuple, the dictionary, as a single string in the output file:

Apple   1   2   {'gala': (2, 1.0)}
Grape   2   4   {'malbec': (4, 0.25), 'merlot': (4, 0.75)}
Pear    4   5   {'anjou': (5, 0.2), 'bartlet': (5, 0.4), 'seckel': (5, 0.2), 'bosc': (5, 0.2)}
Berry   5   5   {'blueberry': (5, 0.2), 'blackberry': (5, 0.2), 'straw': (5, 0.2)}

This makes it impossible to do any operations with the values inside the last item.

I tried to flatten the nested dictionary so I would get just a plain list, but the outcome does not preserve the relationship between items. What I need is to split the dictionary and have an output that would look somewhat like this:

Apple   1   2   gala        2   1.0
Grape   2   4   malbec      4   0.25
                merlot      4   0.75
Pear    4   5   anjou       5   0.2
                bartlet     5   0.4
                seckel      5   0.2
                bosc        5   0.2
Berry   5   5   blueberry   5   0.2
                blackberry  5   0.2
                straw       5   0.2

I say "somewhat" because I am not committed to this exact format, only to the idea that the hierarchical relationship in the dictionary is not lost in the output file. Is there a way to do this? I am really new to Python and appreciate any help. Thanks!

user2962024
  • 1
    You're mixing apples and oranges here. There are ways to represent a tree in a flat format like csv, but unless you have a very good reason to stick to csv, you should be saving it in a format that allows you to save the dicts. A JSON file might be an option. – Pedro Werneck Nov 11 '13 at 15:29
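
A minimal sketch of the JSON route suggested in the comment above, assuming Python 3 and a shortened copy of the list `l` from the question. The file name `test1.json` is a placeholder, not something from the original post. One caveat: JSON has no tuple type, so tuples come back as lists.

```python
import json

# Shortened copy of the question's data.
l = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

# Write the nested structure as-is; json serializes dicts, lists and tuples.
# 'test1.json' is a placeholder file name.
with open('test1.json', 'w', encoding='utf-8') as fout:
    json.dump(l, fout, indent=2)

# Read it back. Tuples are returned as lists.
with open('test1.json', encoding='utf-8') as fin:
    loaded = json.load(fin)

print(loaded[1][3]['merlot'])
```

The nesting survives the round trip, so `loaded[1][3]` is still a dict keyed by variety name.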

4 Answers

1

Assuming you must store it in a CSV with one row per item in the dict, the following shows how you might write and read it. This is neither efficient nor optimal for a large data set, since it repeats the first three columns in every row; however, that repetition means the file will compress very well.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""csv_dict.py
"""
import csv
import pprint
from collections import namedtuple


Row = namedtuple('Row', [
    'name',
    'value_1',
    'value_2',
    'extra_name',
    'extra_value_1',
    'extra_value_2'
])


l = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
    ('Pear', 4, 5, {
        'anjou': (5, 0.2),
        'bartlet': (5, 0.4),
        'seckel': (5, 0.2)}
    ),
    ('Berry', 5, 5, {
        'blueberry': (5, 0.2),
        'blackberry': (5, 0.2),
        'straw': (5, 0.2)
    })
]

print('List before writing: ')
pprint.pprint(l)

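# Writing the data. This runs on Python 2.7; on Python 3, open the file with
# open('test1.csv', 'w', newline='') and call items() instead of iteritems().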
with open('test1.csv', 'wb') as fout:
    writer = csv.writer(fout)

    for row in l:
        for k, v in row[3].iteritems():
            writer.writerow(row[0:3] + (k,) + v)

# Reading the data.
format_extra = lambda row: (int(row.extra_value_1), float(row.extra_value_2))

with open('test1.csv', 'rU') as fin:
    reader = csv.reader(fin)

    ll = []
    hl = {}

    for row in (Row(*r) for r in reader):
        if row.name in hl:
            ll[hl[row.name]][3][row.extra_name] = format_extra(row)
            continue

        ll.append(row[0:3] + ({
            row.extra_name: format_extra(row)
        },))
        hl[row.name] = len(ll) - 1

    pprint.pprint(ll)
TkTech
  • Thank you for taking the time to help me out. I like your solution, but it is not working for me: I get an AttributeError: 'dict' object has no attribute 'iterates' after: for row in l: for k, v in row[3].iteritems(): What do you think could be causing this? – user2962024 Nov 11 '13 at 21:04
  • @user2962024 Did you typo "iteritems"? Are you running on Python 3? In py3k, `iteritems()` has been replaced by `items()`. I can confirm my example above runs on py2.7. – TkTech Nov 11 '13 at 21:13
  • I am using Python 3.2 and I replaced `iteritems()` with `items()`; I should have paid more attention. But now I get `TypeError: 'str' does not support the buffer interface` at the same point. – user2962024 Nov 11 '13 at 21:24
  • 1
    @user2962024 Changing `with open('test1.csv', 'wb') as fout` to `with open('test1.csv', 'w', newline='') as fout` should fix that. If you can, please mention the version of python you're using in your post, or as a tag in the future. Makes it easier. – TkTech Nov 11 '13 at 21:33
  • I understand that the problem has to do with Python 3 not allowing a string to be serialized to bytes without explicit conversion to some encoding. I tried using `encode = 'utf-8'`, but it is not applicable to a list. Any suggestions on how to proceed? – user2962024 Nov 11 '13 at 21:40
  • @user2962024 If you make the change I mentioned a comment above yours, it will work on Python 3. – TkTech Nov 11 '13 at 21:41
  • Thanks, I hadn't seen your comment; maybe we commented at the same time. I have made the change you suggested and now I get an error saying that `Row` in the line `for row in (Row(*r) for r in reader):` is not defined. I thought it might be a typo and tried `row`. With this I get an error message saying that `tuples are not callable`, although, as far as I know, `l` is a list. – user2962024 Nov 11 '13 at 22:19
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/40984/discussion-between-user2962024-and-tktech) – user2962024 Nov 11 '13 at 22:24
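
Pulling the fixes from the comments above together, here is a sketch of how the same write/read round trip might look on Python 3. It assumes Python 3.7+ (so dicts keep insertion order) and uses a shortened copy of `l`; the `rebuilt` dict and grouping by the first three columns are my own illustration rather than the answer's exact logic. Note that `Row` has to be defined before the reading loop, which is the likely cause of the "not defined" error mentioned above.

```python
import csv
from collections import namedtuple

Row = namedtuple('Row', [
    'name', 'value_1', 'value_2',
    'extra_name', 'extra_value_1', 'extra_value_2',
])

# Shortened copy of the question's data.
l = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

# Writing: on Python 3 the csv module wants a text-mode file with newline=''.
with open('test1.csv', 'w', newline='', encoding='utf-8') as fout:
    writer = csv.writer(fout)
    for name, value_1, value_2, extras in l:
        for extra_name, extra_values in extras.items():
            writer.writerow((name, value_1, value_2, extra_name) + extra_values)

# Reading: rows that share the first three columns are merged into one dict.
# 'rebuilt' is a hypothetical helper name, not part of the original answer.
rebuilt = {}
with open('test1.csv', newline='', encoding='utf-8') as fin:
    for row in map(Row._make, csv.reader(fin)):
        key = (row.name, int(row.value_1), int(row.value_2))
        rebuilt.setdefault(key, {})[row.extra_name] = (
            int(row.extra_value_1), float(row.extra_value_2))

ll = [key + (extras,) for key, extras in rebuilt.items()]
print(ll == l)
```

Everything read from a CSV file is a string, so the numeric columns have to be converted back with `int()` and `float()` by hand, as in the original answer's `format_extra`.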
0

Seems like you're pretty close. A few points -- you don't need to initialize test_file, and you can put length in the iterator.

If I was writing this to csv, I would probably use

with open('test1.csv', 'w', encoding='utf-8') as test_file:
  for row in l:
    # Convert the tuple slice to a list so it can be concatenated below.
    species_data = list(row[:3])
    # The dict is the fourth item (index 3) of each tuple.
    for subspecies, subspecies_data in row[3].items():
      write_row = species_data + [subspecies] + list(subspecies_data)
      test_file.write(','.join(str(j) for j in write_row) + '\n')

Certainly there are optimizations you could make if it was a big list, or if you were very concerned about repeating information.
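
If the repeated information is a concern, one possible variation (a sketch only, assuming Python 3.7+ and a shortened copy of `l`) is to write the species columns on the first row of each group and leave them empty on the following rows, which is close to the layout asked for in the question. The `prefix` variable is a name introduced here for illustration.

```python
import csv

# Shortened copy of the question's data.
l = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

with open('test1.csv', 'w', newline='', encoding='utf-8') as test_file:
    writer = csv.writer(test_file)
    for row in l:
        species_data = list(row[:3])
        for i, (subspecies, subspecies_data) in enumerate(row[3].items()):
            # Only the first row of each group carries the species columns;
            # later rows get empty strings in those positions.
            prefix = species_data if i == 0 else [''] * len(species_data)
            writer.writerow(prefix + [subspecies] + list(subspecies_data))
```

The trade-off is that a row no longer stands on its own: anything reading the file back has to carry the last non-empty species values forward.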

colcarroll
0

Here is a quick function that I modified to take a list, tuple or dict and flatten it. It will flatten all nested parts.

I modified your code and tested in python 2.7. This should generate the output you are looking for:

import csv

def flatten(l):
    '''
    flattens a list, dict or tuple
    '''
    ret = []
    for i in l:
        if isinstance(i, list) or isinstance(i, tuple):
            ret.extend(flatten(i))
        elif isinstance(i, dict):
            ret.extend(flatten(i.items()))
        else:
            ret.append(i)
    return ret

l = [('Apple', 1, 2, {'gala': (2, 1.0)}), 
('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}), 
('Pear', 4, 5, {'anjou': (5, 0.2), 'bartlet': (5, 0.4), 'seckel': (5, 0.2)}), 
('Berry', 5, 5, {'blueberry': (5, 0.2), 'blackberry': (5, 0.2), 'straw': (5, 0.2)})]

# Written for Python 2.7; on Python 3, open the file with
# open('test1.csv', 'w', newline='') instead of 'wb'.
with open('test1.csv', 'wb') as test_file:
    csv_writer = csv.writer(test_file, delimiter=',')
    for row in l:
        # Each tuple becomes one flat CSV row.
        csv_writer.writerow(flatten(row))
e h
  • Thank you for your suggestion and time. I keep getting an error: TypeError: 'str' does not support the buffer interface. Any idea why? – user2962024 Nov 11 '13 at 21:14
  • In the very last line: `csv_writer.writerow([x for x in line])`, maybe this has to do with encoding 'utf-8' in Python 3. But I could not solve it by myself. Any ideas? Thanks again. – user2962024 Nov 11 '13 at 21:47
  • I wrote the ints as ints, not as strings. If you change the last line to `csv_writer.writerow([str(x) for x in line])` does it work? If you want the string in UTF-8 you should be able to change it to `[str(x).encode('utf-8')` – e h Nov 11 '13 at 21:54
  • @emf. Unfortunately not. Same error: `TypeError: 'str' does not support the buffer interface`. – user2962024 Nov 11 '13 at 21:59
  • Sounds like a Python 3 issue. Check this question, it might be of help: http://stackoverflow.com/questions/5471158/typeerror-str-does-not-support-the-buffer-interface – e h Nov 11 '13 at 22:03
0

If you insist on CSV/TSV, keep in mind that it represents a table, whereas what you expect looks like a structured file (XML/JSON/YAML). I'd recommend using CSV/TSV to store the data as relational tables; otherwise the output can get messy. In your case, one option would be output like this:

headers:

SuperSpecieName,SpecieName,Value1,Value2

data:

"",Apple,1,2
Apple,gala,2,1.0
"",Grape,2,4
Grape,malbec,4,0.25
Grape,merlot,4,0.75
...
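
A rough sketch of how this parent/child layout could be generated from the question's list, assuming Python 3.7+ and a shortened copy of `l`. The file name `species.csv` is a placeholder of mine; the header names are the ones listed above.

```python
import csv

# Shortened copy of the question's data.
l = [
    ('Apple', 1, 2, {'gala': (2, 1.0)}),
    ('Grape ', 2, 4, {'malbec': (4, 0.25), 'merlot': (4, 0.75)}),
]

# 'species.csv' is a placeholder file name.
with open('species.csv', 'w', newline='', encoding='utf-8') as fout:
    writer = csv.writer(fout)
    writer.writerow(['SuperSpecieName', 'SpecieName', 'Value1', 'Value2'])
    for name, value_1, value_2, children in l:
        # Top-level rows have an empty parent column.
        writer.writerow(['', name, value_1, value_2])
        # Child rows point back to their parent by name.
        for child_name, (child_value_1, child_value_2) in children.items():
            writer.writerow([name, child_name, child_value_1, child_value_2])
```

With the default dialect, `csv.writer` leaves the empty parent field unquoted rather than writing `""`; a CSV reader treats both forms as an empty string.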
Lubos