read csv as dict but keep duplicate keys python

Question

Hi, I'm currently trying to read in a very simple CSV file that looks like this:

I don't need the header, but I want every row as a key-value pair in a dictionary. The problem is that each new row overwrites the previous value for the same key, so only the last value survives. How do I keep these duplicates so that all rows end up in my dict?

A dictionary can only store one value per key. How do you imagine the result to look like? You could stay with a list of pairs, or a dict from keys to list of values. — phipsgabler, Nov 30 '21 at 12:08
The ideal solution would look like: {'data': ['plaza', 'plazo', 'plozo'], 'doto': ["plaza"]} — Fisqkuz, Nov 30 '21 at 12:23

score 1 · Accepted Answer · answered Nov 30 '21 at 12:35

A convenient way to do it is to use a defaultdict from the collections module:

# create some test data
with open("data.txt","w") as d:
    d.write("test,test_one\ndata,plaza\ndata,plazo\ndata,plozo\ndoto,plaza")

then use

from collections import defaultdict

data = defaultdict(list)

with open("data.txt") as f:
    f.readline()  # skip header

    # process remainder
    for line in f:
        line = line.strip() # remove \n
        if line:
            # extract key + value by splitting
            key,value = line.split(',',1)  # maxsplit=1 always yields exactly two parts
            # and add it
            data[key].append(value)

print(data)

# print converted to a normal dict
print(dict(data))

Output:

defaultdict(<class 'list'>, {'data': ['plaza', 'plazo', 'plozo'],
                             'doto': ['plaza']})

# converted to normal dict
{'data': ['plaza', 'plazo', 'plozo'], 'doto': ['plaza']}

See How does collections.defaultdict work?
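
As a rough illustration of what the linked question covers, here is a minimal sketch of the defaultdict behaviour the answer relies on. The variable name `groups` is made up for this example.

```python
from collections import defaultdict

groups = defaultdict(list)

# Indexing a missing key calls the factory (list) and stores the new empty list,
# so append works without checking whether the key exists first.
groups["data"].append("plaza")
groups["data"].append("plazo")

print(groups["data"])    # ['plaza', 'plazo']
print("doto" in groups)  # False: a membership test does not create the key
print(groups["doto"])    # []: indexing a missing key creates it
print("doto" in groups)  # True
```

Note the side effect in the last two lines: merely reading a missing key inserts it, which is one reason to convert back to a normal dict once the data is loaded.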

Great, made it work this way. Not familiar with the collections lib. — Fisqkuz, Nov 30 '21 at 12:46
@Fisq `defaultdict` is faster than the plain-dict idiom `dict.setdefault(key, []).append("whatever")`; `defaultdict` is implemented specifically for this task. Using `dict.setdefault(key, []).append(...)` would work as well and would not need the import, but for big data files it may be just a tad slower. — Patrick Artner, Nov 30 '21 at 12:50
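
For comparison, the `setdefault` variant mentioned in the comment above might look like the sketch below. It uses the standard `csv` module instead of splitting by hand, which is a swap from the accepted answer rather than part of it. The function name `read_grouped` and the in-memory `sample` (standing in for data.txt) are assumptions for illustration only.

```python
import csv
import io

# Hypothetical in-memory stand-in for the data.txt file created in the answer.
sample = io.StringIO("test,test_one\ndata,plaza\ndata,plazo\ndata,plozo\ndoto,plaza\n")

def read_grouped(fileobj):
    """Group the second column by the first column, skipping the header row."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip header; the default avoids an error on empty input
    grouped = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete rows
        key, value = row[0], row[1]
        # setdefault returns the existing list, or stores and returns a new one
        grouped.setdefault(key, []).append(value)
    return grouped

result = read_grouped(sample)
print(result)  # {'data': ['plaza', 'plazo', 'plozo'], 'doto': ['plaza']}
```

With a real file, something like `with open("data.txt", newline="") as f: read_grouped(f)` should behave the same way.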

score 0 · Answer 2 · answered Nov 30 '21 at 12:25

As @phipsgabler pointed out, a dictionary can hold each key only once.

What you could do instead is save the values in a list like so:

my_dict = { 'data' : ['plaza','plazo', 'plozo'] }

py_coffee

Thanks, I didn't realize keys had to be unique; I thought a unique k,v pair would suffice. How would I read in this csv so that it becomes {'data': ['plaza', 'plazo', 'plozo'], 'doto': ["plaza"]}? – Fisqkuz Nov 30 '21 at 12:28
Does your csv-file look like this? `data, plaza` `data, plazo` etc. – py_coffee Nov 30 '21 at 14:11
In that case, you could try something like this: `for line in csv_lines: columns = line.split(',')` `if columns[0] not in my_dict:` `my_dict[columns[0]] = columns[1:]` `else: my_dict[columns[0]].extend(columns[1:])` – py_coffee Nov 30 '21 at 14:15
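
Written out with indentation, the loop from the comment above could look roughly like this. The `csv_lines` list is a hypothetical stand-in for the file's lines with the header already removed, and the whitespace stripping is an addition to handle the spaces after the commas in the sample.

```python
# Hypothetical input: lines of the csv file, header already removed.
csv_lines = ["data, plaza", "data, plazo", "data, plozo", "doto, plaza"]

my_dict = {}
for line in csv_lines:
    # split on commas and strip surrounding whitespace from each field
    columns = [field.strip() for field in line.split(",")]
    if columns[0] not in my_dict:
        # first time this key appears: start its list with the remaining fields
        my_dict[columns[0]] = columns[1:]
    else:
        # key already present: add the remaining fields to its list
        my_dict[columns[0]].extend(columns[1:])

print(my_dict)  # {'data': ['plaza', 'plazo', 'plozo'], 'doto': ['plaza']}
```

Unlike the two-column versions in the other answers, this keeps every field after the first, so a row with three columns contributes two values to its key.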

score 0 · Answer 3 · answered Nov 30 '21 at 12:31

To construct a dict of lists for that setting, you can just append to a default empty list:

In [1]: values
Out[1]: ['plaza', 'plazo', 'plozo', 'plaza']

In [2]: keys
Out[2]: ['data', 'data', 'data', 'doto']

In [4]: d = dict()

In [7]: for (k, v) in zip(keys, values):
   ...:     d.setdefault(k, []).append(v)

In [8]: d
Out[8]: {'data': ['plaza', 'plazo', 'plozo'], 'doto': ['plaza']}

Another possibility of representation would just be the list of pairs:

In [10]: list(zip(keys, values))
Out[10]: [('data', 'plaza'), ('data', 'plazo'), ('data', 'plozo'), ('doto', 'plaza')]
