Hi, I'm currently trying to read in a very simple CSV file that looks like this: [image]

I don't need the header, but I want every row as a key/value pair in a dictionary. The problem is that each row overwrites the previous value stored under the same key, so only the last value survives. How do I keep these duplicates so that all rows end up in my dict?

Fisqkuz
  • A dictionary can only store one value per key. What do you imagine the result looking like? You could stay with a list of pairs, or a dict from keys to lists of values. – phipsgabler Nov 30 '21 at 12:08
  • The ideal solution would look like: {'data': ['plaza', 'plazo', 'plozo'], 'doto': ["plaza"]} – Fisqkuz Nov 30 '21 at 12:23
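
To make the overwriting behaviour concrete, here is a minimal sketch. The `rows` list is a hypothetical stand-in for the CSV rows (header already dropped), not the asker's actual code; it only illustrates why plain assignment loses data while appending to a per-key list does not.

```python
# Hypothetical rows standing in for the CSV content (header already removed).
rows = [("data", "plaza"), ("data", "plazo"), ("data", "plozo"), ("doto", "plaza")]

# Plain assignment: each write replaces whatever was stored under the key.
overwritten = {}
for key, value in rows:
    overwritten[key] = value

# One list per key: every value is kept, in file order.
grouped = {}
for key, value in rows:
    if key not in grouped:
        grouped[key] = []
    grouped[key].append(value)

print(overwritten)  # only the last value per key survives
print(grouped)      # all values per key are kept
```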

3 Answers

The convenient way to do it is using a defaultdict from the collections module:

# create some test data
with open("data.txt","w") as d:
    d.write("test,test_one\ndata,plaza\ndata,plazo\ndata,plozo\ndoto,plaza")

then use

from collections import defaultdict

data = defaultdict(list)

with open("data.txt") as f:
    f.readline()  # skip header

    # process remainder
    for line in f:
        line = line.strip() # remove \n
        if line:
            # extract key + value by splitting on the first comma only
            key, value = line.split(',', 1)
            # and add it
            data[key].append(value)

print(data)

# print converted to a normal dict
print(dict(data))

Output:

defaultdict(<class 'list'>, {'data': ['plaza', 'plazo', 'plozo'],
                             'doto': ['plaza']})

# converted to normal dict
{'data': ['plaza', 'plazo', 'plozo'], 'doto': ['plaza']}

See How does collections.defaultdict work?
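
As a variation, the same grouping can be sketched with the standard `csv` module, which also handles quoted fields. This is an assumption-laden alternative rather than part of the answer above: `group_rows` is a hypothetical helper name, and an in-memory `io.StringIO` stands in for the file so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for data.txt, same content as the test data above.
sample = io.StringIO("test,test_one\ndata,plaza\ndata,plazo\ndata,plozo\ndoto,plaza\n")


def group_rows(handle):
    """Group the second column by the first column (hypothetical helper)."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row, if there is one
    grouped = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        grouped[row[0]].append(row[1])
    return dict(grouped)  # convert to a normal dict


result = group_rows(sample)
print(result)
```

With a real file, `open("data.txt", newline="")` would take the place of the `StringIO` object.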

Patrick Artner
  • Great, made it work this way. Not familiar with the collections lib. – Fisqkuz Nov 30 '21 at 12:46
  • @Fisq `defaultdict` is faster than using the normal dict's `dict.setdefault(key, []).append("whatever")`, because `defaultdict` is implemented specifically for this task. Using `dict.setdefault(key, []).append(...)` would work as well and would not need the import, but for big data files it may be just a tad slower. – Patrick Artner Nov 30 '21 at 12:50

As @phipsgabler pointed out, a dictionary can hold each key only once.

What you could do instead is save the values in a list like so:

my_dict = {'data': ['plaza', 'plazo', 'plozo']}

py_coffee
  • Thanks, I didn't realize keys had to be unique; I thought a unique k,v pair would suffice. How would I read in this CSV so that it becomes {'data': ['plaza', 'plazo', 'plozo'], 'doto': ["plaza"]}? – Fisqkuz Nov 30 '21 at 12:28
  • Does your csv-file look like this? – py_coffee Nov 30 '21 at 14:11
  • `data, plaza`
    `data, plazo` etc.
    – py_coffee Nov 30 '21 at 14:12
  • In that case, you could try something like this: `for line in csv_lines: columns = line.split(',')` `if columns[0] not in my_dict:` `my_dict[columns[0]] = columns[1:]` `else: my_dict[columns[0]].extend(columns[1:])` – py_coffee Nov 30 '21 at 14:15
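
Laid out as a full block, the loop from the comment above might look like the sketch below. The `csv_lines` content is assumed from the `data, plaza` format mentioned in the comments, and the `strip()` call is an addition so that the space after each comma does not end up in the stored values.

```python
# Assumed input: lines in the "key, value" format from the comments (no header).
csv_lines = ["data, plaza", "data, plazo", "data, plozo", "doto, plaza"]

my_dict = {}
for line in csv_lines:
    # split into columns and trim the whitespace around each one
    columns = [part.strip() for part in line.split(",")]
    key, values = columns[0], columns[1:]
    if key not in my_dict:
        my_dict[key] = values         # first occurrence: start the list
    else:
        my_dict[key].extend(values)   # later occurrences: add to it

print(my_dict)
```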

To construct a dict of lists for that setting, you can just append to a default empty list:

In [1]: values
Out[1]: ['plaza', 'plazo', 'plozo', 'plaza']

In [2]: keys
Out[2]: ['data', 'data', 'data', 'doto']

In [4]: d = dict()

In [7]: for (k, v) in zip(keys, values):
   ...:     d.setdefault(k, []).append(v)

In [8]: d
Out[8]: {'data': ['plaza', 'plazo', 'plozo'], 'doto': ['plaza']}

Another possibility of representation would just be the list of pairs:

In [10]: list(zip(keys, values))
Out[10]: [('data', 'plaza'), ('data', 'plazo'), ('data', 'plozo'), ('doto', 'plaza')]
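
The two representations can be converted into each other. The following is a minimal sketch using the same `keys` and `values` as above; note that flattening the dict back into pairs reproduces the original order only because equal keys are adjacent in this data.

```python
keys = ["data", "data", "data", "doto"]
values = ["plaza", "plazo", "plozo", "plaza"]

# list of pairs: keeps every row, including repeated keys
pairs = list(zip(keys, values))

# pairs -> dict of lists
grouped = {}
for k, v in pairs:
    grouped.setdefault(k, []).append(v)

# dict of lists -> pairs again (ordered by first appearance of each key)
flattened = [(k, v) for k, vs in grouped.items() for v in vs]

print(grouped)
print(flattened)
```
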
phipsgabler