Using Python 3.3.0, I created a "dictionary" from a CSV file (header: ID;Col1;Col2;Col3;Col4;Col5):

ID;Col1;Col2;Col3;Col4;Col5
15345;1;1;nnngngn;vhrhtnz;latest
12345;12;8;gnrghrtthr;tznhltrnhklr;latest
90834;3;4;something;nonsens;latest
12345;34;235;dontcare;muhaha;oldone

with the code

import csv  # needed for csv.DictReader
file = "test.csv"
csv_file = csv.DictReader(open(file, 'r'), delimiter=';', quotechar='"')

and I wanted to copy the lines with ID = 12345 into a new dictionary, NOT into a file. I really needed to copy into a dictionary, NOT a list, because I wanted to be able to address the column names directly. I tried this by doing:

cewl = {}
for row in csv_file:
    if row['ID'] == '12345':
        cewl.update(row)
print(cewl)

Output is:

{'ID': '12345', 'Col1': '34', 'Col2': '235', 'Col3': 'dontcare', 'Col4': 'muhaha', 'Col5': 'oldone'}

My problem: only the second line with ID=12345 gets copied; the first one is omitted, and I don't know why.

If I try this by copying into a new list (just for testing purposes), everything works fine:

cewl = []
for row in csv_file1:
    if row['ID'] == '12345':
        cewl.append(row)
print(cewl)

Output is :

[{'Col3': 'gnrghrtthr', 'Col2': '8', 'Col1': '12', 'Col5': 'latest', 'Col4': 'tznhltrnhklr', 'ID': '12345'}, 
{'Col3': 'dontcare', 'Col2': '235', 'Col1': '34', 'Col5': 'oldone', 'Col4': 'muhaha', 'ID': '12345'}]

I don't know why this doesn't work when copying into the new dictionary. There doesn't seem to be a method like .add or .append for the dicts that DictReader returns.

How can I copy my data into a new dictionary without missing any lines?

Lev Levitsky
dacoda
  • A dictionary is a mapping; decide if you want an ID ('12345') mapped to two or more different pieces of data as in your example, in which case you can map an ID to a list of dictionaries containing distinct mappings of values for keys `Col1`, `Col2`, etc. OR something like a list of tuples `(ID, Col1, Col2, etc)`. Think about your data structures before you write any code. – Michael Foukarakis Feb 12 '13 at 11:42
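
The first option from the comment above (one ID mapped to a list of row dictionaries) could look roughly like the sketch below. It is only an illustration: `SAMPLE` and `group_rows_by_id` are names made up here, and `io.StringIO` stands in for the `test.csv` file from the question.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question; io.StringIO stands in for test.csv.
SAMPLE = """ID;Col1;Col2;Col3;Col4;Col5
15345;1;1;nnngngn;vhrhtnz;latest
12345;12;8;gnrghrtthr;tznhltrnhklr;latest
90834;3;4;something;nonsens;latest
12345;34;235;dontcare;muhaha;oldone
"""


def group_rows_by_id(lines):
    """Map each ID to the list of row dicts that carry it (hypothetical helper)."""
    rows_by_id = defaultdict(list)
    for row in csv.DictReader(lines, delimiter=';', quotechar='"'):
        # dict(row) makes a plain copy, independent of the reader's row type.
        rows_by_id[row['ID']].append(dict(row))
    return rows_by_id


rows_by_id = group_rows_by_id(io.StringIO(SAMPLE))

# Both rows with ID 12345 survive, and column names still work on each one.
for row in rows_by_id['12345']:
    print(row['Col1'], row['Col5'])
```

The outer dictionary has one key per ID, so nothing is overwritten; the duplicate IDs end up side by side in the list stored under that key.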

1 Answer

What is the expected output? The behaviour is perfectly normal for a dict; you are replacing the values for each key with a new value.

If you wanted the values to be lists of the values for each matching row, it's easier to use a defaultdict with a list factory:

from collections import defaultdict

cewl = defaultdict(list)

for row in csv_file:
   if row['ID'] == '12345':
       for k, v in row.items():
           cewl[k].append(v)

print(cewl)

This outputs:

defaultdict(<class 'list'>, {'Col1': ['12', '34'], 'ID': ['12345', '12345'], 'Col2': ['8', '235'], 'Col5': ['latest', 'oldone'], 'Col4': ['tznhltrnhklr', 'muhaha'], 'Col3': ['gnrghrtthr', 'dontcare']})

A defaultdict is a subclass of dict, so print(cewl['Col1']) will print ['12', '34'].

When you use .update() you effectively do this:

for k, v in row.items():
    cewl[k] = v

i.e. you set each key in cewl to the value found in the row being processed. When the last row is processed, its values overwrite the values of the previous rows.
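
A small sketch of that overwrite, using two hand-written dicts that mimic the two matching rows (only three of the columns are included, and the variable names are illustrative):

```python
# Two dicts with identical keys, shaped like the rows DictReader yields.
first = {'ID': '12345', 'Col1': '12', 'Col5': 'latest'}
second = {'ID': '12345', 'Col1': '34', 'Col5': 'oldone'}

cewl = {}
cewl.update(first)
snapshot_after_first = dict(cewl)  # copy, so we can compare later

cewl.update(second)  # same keys, so every value is replaced

print(snapshot_after_first)
print(cewl)
```

After the second call, `cewl` holds only the values of `second`, because a dict keeps exactly one value per key.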

If you only want the rows that match a certain ID, then adding them to a list is perfectly fine. You then loop over the matched results to process them:

for row in cewl:
    print(row)  # do something with the matched row

or you can build a generator filter that you wrap around your DictReader() to do the filtering for you, so you don't need to build the list in memory:

def rowfilter(reader, id):
    for row in reader:
        if row['ID'] == id:
            yield row

for row in rowfilter(csv_file, '12345'):
    print(row)  # do something with the matched row
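
To try the generator approach without a file on disk, here is a self-contained sketch. It assumes the sample data from the question and feeds it through `io.StringIO`; the parameter is renamed to `wanted_id` only to avoid shadowing the built-in `id`.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for test.csv.
SAMPLE = """ID;Col1;Col2;Col3;Col4;Col5
15345;1;1;nnngngn;vhrhtnz;latest
12345;12;8;gnrghrtthr;tznhltrnhklr;latest
90834;3;4;something;nonsens;latest
12345;34;235;dontcare;muhaha;oldone
"""


def rowfilter(reader, wanted_id):
    """Yield only the rows whose ID column equals wanted_id."""
    for row in reader:
        if row['ID'] == wanted_id:
            yield row


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=';', quotechar='"')

# Rows are produced one at a time; each is still addressable by column name.
matched_col3 = [row['Col3'] for row in rowfilter(reader, '12345')]
print(matched_col3)
```

Note that a reader can only be consumed once, so a second pass over the same data needs a fresh `DictReader`.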
Martijn Pieters
  • @dacoda: A python mapping object maps a key to *one* value. So `somedict['a'] = 1` followed by `somedict['a'] = 2` means that you *replaced* the value for the key `'a'`. My proposed solution gives you a list value, and we add items to that list as we find them. I am not certain that you understand how python mappings work though. – Martijn Pieters Feb 12 '13 at 10:52
  • If you expect a list of dictionaries, then use your list and append rows to that instead. – Martijn Pieters Feb 12 '13 at 10:53
  • @dacoda: I was able to read your comment just fine, please do not edit answers when a comment will do. I want you to give me the *exact* expected output, not what you already posted (because 'something like this but different' is not clear). – Martijn Pieters Feb 12 '13 at 11:01
  • Thanks, I want to create a copy of my dictionary including only the lines with ID=12345, so the expected output of cewl in this case should be the header and two lines, each with ID 12345 and the corresponding data. – dacoda Feb 12 '13 at 11:03
  • @dacoda: What are you trying to do? Write out a filtered `csv` file? The `DictReader` gives you a *sequence* of `dict` objects, like your list output, each with the same keys for each row. You need to be much clearer in what you are trying to do, I am still just guessing here. – Martijn Pieters Feb 12 '13 at 11:07
  • I want to select certain rows from a csv-file and copy them into a "temporary construct" I can work with (NOT a file!). Since dictreader can address the column names directly, it seems better to create a dictreader object and work with that. I found lots of tutorials about how to select certain lines of a csv using dictreader and writing them into a new csv-file using dictwriter. BUT: I just wanted to create a new dict, which contains exactly the same data I would get by writing them into a file, WITHOUT writing them into a new file, just create a dict I can continue to work with. – dacoda Feb 12 '13 at 11:15
  • So my basic question is: How can I copy the lines with ID=12345 from one dict into a new one ? How can I "add" each line to get two separate lines/entries? – dacoda Feb 12 '13 at 11:22
  • @dacoda: What does separate lines or entries *mean*? My solution gives you a list per key, and we add the values to each key. That way you get, for each column, a list of values as found in the CSV. What are you trying to *do* after you loaded your CSV file? – Martijn Pieters Feb 12 '13 at 11:24
  • @dacoda: If you want just those lines to work with, then your list is exactly what you are looking for (`[firstmatchinglinedict, secondmatchinglinedict, etc.]`). – Martijn Pieters Feb 12 '13 at 11:26
  • Look: If I print out the rows of my first dict with ID=12345 I get: {'Col4': 'tznhltrnhklr', 'Col5': 'latest', 'Col2': '8', 'Col3': 'gnrghrtthr', 'Col1': '12', 'ID': '12345'} and {'Col4': 'muhaha', 'Col5': 'oldone', 'Col2': '235', 'Col3': 'dontcare', 'Col1': '34', 'ID': '12345'}. And that's exactly what I want in my new dict...sorry, I don't want to be rude, but is this so hard to get ? Maybe I'm explaining it the wrong way. :) I just want a copy of certain rows of one dict copied into another one. :) – dacoda Feb 12 '13 at 11:35
  • @dacoda: You *can't* copy a dict into another dict and expect the old values to remain. I've given you two alternatives; either *combine* the values into lists, or do not copy the dicts in the first place, just add them to a list instead (which is what you were doing to start with). I cannot help you do things that are impossible. – Martijn Pieters Feb 12 '13 at 11:36
  • Well, I was thinking of a similar solution to copy.deepcopy(), but that one copies the whole dict. Somehow it should be possible to achieve this only for certain rows/lines, but as I follow our discussion, I start thinking that there isn't a way to do that. :) – dacoda Feb 12 '13 at 11:44
  • It is not the copying that is the problem here. It is what you expect the result of the copy to *be*. – Martijn Pieters Feb 12 '13 at 11:45
  • OK, another approach:) : imagine I would just copy my dict and delete all entries in the new one except those with ID=12345. The result would be the same, just another way to get there. My problem is that I can't address column names in lists, but I need those in my new dict to continue working with it, nevermind what I want to do with the new dict. ;) I just want a "partial" copy of my dict, that's it. :) – dacoda Feb 12 '13 at 11:50
  • I think you misunderstand what `DictReader()` *gives* you. It doesn't give you *one* `dict`. It gives you a *series* of `dict` objects. Each one has the exact same keys, but different values. Each one is thus different from the previous one. When you add those to a list, you can still access the keys in that new series of `dict` objects; just loop over that list. – Martijn Pieters Feb 12 '13 at 12:15
  • I think implementing the + operator for dicts might describe more precisely the thing I want to achieve, as mentioned in http://stackoverflow.com/questions/6416131/python-add-new-item-to-dictionary :) ...sounds nice, doesn't help very much. :) – dacoda Feb 12 '13 at 14:38
  • @dacoda: The problem is that the second dict *has the exact same keys*. A dict can only contain *one* copy of each unique key. – Martijn Pieters Feb 12 '13 at 14:44
  • But what about newdict = olddict.copy()? Both dicts have the same keys, works like a charm. I would need to have a row.add- or row.append-method for my dict. – dacoda Feb 12 '13 at 15:04
  • As I said, it's not the copying that's the problem. It's when you `update()` another dict with the same keys that you misunderstand what happens. – Martijn Pieters Feb 12 '13 at 15:06
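
As a rough illustration of the point made in the comments (a list of matched rows still allows access by column name), the sketch below assumes the sample data from the question, with `io.StringIO` standing in for `test.csv`; `matches` is an illustrative name.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for test.csv.
SAMPLE = """ID;Col1;Col2;Col3;Col4;Col5
15345;1;1;nnngngn;vhrhtnz;latest
12345;12;8;gnrghrtthr;tznhltrnhklr;latest
90834;3;4;something;nonsens;latest
12345;34;235;dontcare;muhaha;oldone
"""

reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=';', quotechar='"')

# Keep every matching row as its own dict inside a list.
matches = [dict(row) for row in reader if row['ID'] == '12345']

# Each element is a dict, so columns are addressed by name, not by position.
print(matches[0]['Col3'])
print([row['Col2'] for row in matches])
```

The list provides the "several entries" part and each dict inside it provides the column names, which together cover what a single flat dict cannot hold.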