Creating a dictionary from a csv file?

Question

I am trying to create a dictionary from a csv file. The first column of the csv file contains unique keys and the second column contains values. Each row of the csv file represents a unique key, value pair within the dictionary. I tried to use the csv.DictReader and csv.DictWriter classes, but I could only figure out how to generate a new dictionary for each row. I want one dictionary. Here is the code I am trying to use:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        for rows in reader:
            k = rows[0]
            v = rows[1]
            mydict = {k:v for k, v in rows}
    print(mydict)

When I run the above code I get a ValueError: too many values to unpack (expected 2). How do I create one dictionary from a csv file? Thanks.

Can you give an example of an input file and the resulting data structure? — robert, Jul 19 '11 at 00:13
When you iterate over csv.reader, you get single row, not rows. So, valid form is mydict = {k:v for k,v in reader} but if you are sure, that there are only two columns in the csv file, then mydict = dict(reader) is much faster. — Alex Laskin, Jul 19 '11 at 00:47
Please be aware that storing dictionary / key-value data in CSV files is not without issues (such as dealing with mixed-type columns). **JSON format** could represent this type of data much better IMO. — mirekphd, Aug 12 '22 at 08:38

score 227 · Accepted Answer · edited May 08 '20 at 12:45

I believe the syntax you were looking for is as follows:

import csv

with open('coors.csv', mode='r', newline='') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w', newline='') as outfile:
        writer = csv.writer(outfile)  # not needed to build the dict; kept from the question
        mydict = {rows[0]:rows[1] for rows in reader}

Alternatively, for Python versions older than 2.7 (which lack dict comprehensions), you want:

mydict = dict((rows[0],rows[1]) for rows in reader)
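For a quick self-contained check, here is a minimal sketch that runs the same dict comprehension over an in-memory CSV. `io.StringIO` stands in for coors.csv, the sample keys and values are made up, and the unused writer is dropped:

```python
import csv
import io

# Hypothetical sample data standing in for coors.csv: key in column 0, value in column 1
sample = io.StringIO("a,1\nb,2\nc,3\n")

reader = csv.reader(sample)
# One dict for the whole file: each row contributes one key/value pair
mydict = {rows[0]: rows[1] for rows in reader}
print(mydict)  # {'a': '1', 'b': '2', 'c': '3'}
```

Note that csv.reader always yields strings, so convert the values (e.g. `int(rows[1])`) if you need numbers.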

edited May 08 '20 at 12:45

michaelbahr

answered Jul 19 '11 at 00:16

Nate

Good to account for rows longer than expected; but shouldn't he be raising his own exception if there are too many items in a row? I would think that would mean there's an error with his input data. – machine yearning Jul 19 '11 at 01:22
And then he'd at least be able to narrow the exception down to faulty input – machine yearning Jul 19 '11 at 01:24
That has some merit, but I'm a firm believer that exceptions are there to tell you that you programmed something incorrectly - not for when the world gives you lemons. That's when you print a pretty error message and fail, or - more appropriate for this case - a pretty warning message and succeed. – Nate Jul 19 '11 at 01:25
Sorry, looked at op's code, hard to tell if he wanted only 2 items per line. I was wrong! – machine yearning Jul 19 '11 at 01:30
I had multiple lines in csv but it gave only 1 key:value pair – Abhilash Mishra Jul 31 '19 at 07:02
I've been looking for this all night. I have an API that is using flask/SQLAlchemy and I wanted to mock it with a text file and just use jsonify on the reader output - and this is the magic code, thank you! – Nick.Mc Jan 13 '22 at 13:17

score 141 · Answer 2 · edited Jan 27 '21 at 11:59

Open the file by calling open and then using csv.DictReader.

import csv

input_file = csv.DictReader(open("coors.csv"))

You can then iterate over the rows of the DictReader object by iterating over input_file:

for row in input_file:
    print(row)

Or, to access only the first row:

dictobj = csv.DictReader(open('coors.csv')).next()

Update: in Python 3, the code changes slightly:

reader = csv.DictReader(open('coors.csv'))
dictobj = next(reader)
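DictReader treats the first line as the header, so `next(reader)` returns the first data row, not the header. If the file does have a header row, a single key-to-value dict can be built from the DictReader rows. A minimal sketch, assuming hypothetical header names `key` and `value`:

```python
import csv
import io

DATA = "key,value\nx,10\ny,20\n"  # hypothetical CSV with a header row

# next() on a DictReader yields the first *data* row as a dict keyed by the header
# (an OrderedDict on Python 3.6-3.7, a plain dict on 3.8+)
first_row = next(csv.DictReader(io.StringIO(DATA)))

# One dict for the whole file, using the header names to pick the columns
mydict = {row["key"]: row["value"] for row in csv.DictReader(io.StringIO(DATA))}
print(mydict)  # {'x': '10', 'y': '20'}
```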

edited Jan 27 '21 at 11:59

grrrrrr

answered Jul 14 '16 at 09:31

Laxmikant Ratnaparkhi

This makes a DictReader object, not a dictionary (and, yes, not a key-value pair) – HN Singh Nov 10 '18 at 17:52
@HN Singh - Yeah, I know - intention was it will help some one else as well – Laxmikant Ratnaparkhi Nov 14 '18 at 06:34
'DictReader' object has no attribute 'next' – Palak Bansal May 28 '19 at 20:36
@Palak - it was answered for Python 2.7, try `next(dictobj)` instead of `dictobj.next()` in Python 3+ versions. – Laxmikant Ratnaparkhi May 29 '19 at 21:52
In Python 3+ this also works: `dictobj = reader.__next__()` – Jose R Jun 08 '22 at 03:27
The csv.DictReader Object can be transformed into a full list of all entries with list() – Meiswjn Sep 01 '22 at 14:47
@LaxmikantRatnaparkhi it gives me error, Check my question: https://stackoverflow.com/questions/73624249/using-csv-dicreader-gives-error-valueerror-dictionary-update-sequence-element – Volatil3 Sep 06 '22 at 15:14

score 76 · Answer 3 · robert · 2011-07-19T00:53:44.847

import csv

d = {}
with open('filename.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        k, v = row
        d[k] = v
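The comments on the accepted answer suggest failing loudly when a row does not have exactly two columns. A minimal sketch of that idea built on the loop above; the function name `load_pairs` and the error message are hypothetical:

```python
import csv
import io

def load_pairs(f):
    """Build one dict from a two-column CSV, rejecting malformed rows."""
    d = {}
    # start=1 so the reported number is the 1-based row count
    for rownum, row in enumerate(csv.reader(f), start=1):
        if not row:  # csv.reader yields [] for a blank line; skip it
            continue
        if len(row) != 2:
            raise ValueError(f"row {rownum}: expected 2 columns, got {len(row)}: {row!r}")
        k, v = row
        d[k] = v
    return d

print(load_pairs(io.StringIO("a,1\nb,2\n")))  # {'a': '1', 'b': '2'}
```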

edited Jul 19 '11 at 00:53

answered Jul 19 '11 at 00:15

robert

@Alex Laskin: Really? It looks like some pretty readable python to me. What's your principle to back this statement up? You basically just called him "poopy head"... – machine yearning Jul 19 '11 at 01:17
@machine-yearning, no, I didn't say that his code is 'bad'. But there is not a single reason to write `for row in reader: k, v = row` if you can simply write `for k, v in reader`, for example. And if you expect that reader is an iterable producing two-element items, then you can simply pass it directly to dict for conversion. `d = dict(reader)` is much shorter and significantly faster on huge datasets. – Alex Laskin Jul 19 '11 at 01:44
@Alex Laskin: Thanks for the clarification. I personally agreed with you but I think that if you're gonna call someone's code "non-pythonic" you should accompany that comment with a justification. I'd say that "shorter" and "faster" are not necessarily equivalent to "more pythonic". Readability/reliability is a huge concern as well. If it's easier to work in some of our constraints into the above `for row in reader` paradigm, then it might (after longer-term development) be more practical. I agree with you short-term, but beware of premature optimization. – machine yearning Jul 19 '11 at 05:32
@robert : Thanks dude! Really helped. Other codes are too difficult to read. – Ash Oct 22 '20 at 17:13

score 49 · Answer 4 · mudassirkhan19 · 2018-02-02T04:43:31.823

This isn't elegant, but it is a one-line solution using pandas:

import pandas as pd
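# read_csv's squeeze=True was removed in pandas 2.0; DataFrame.squeeze("columns") does the same job
pd.read_csv('coors.csv', header=None, index_col=0).squeeze("columns").to_dict()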

If you want to specify dtype for your index (it can't be specified in read_csv if you use the index_col argument because of a bug):

import pandas as pd
pd.read_csv('coors.csv', header=None, dtype={0: str}).set_index(0).squeeze("columns").to_dict()

edited Feb 02 '18 at 04:43

answered Dec 06 '17 at 06:31

mudassirkhan19

in my book this is the best answer – boardtc Apr 12 '19 at 22:09
And if there is a header...? – ndtreviv May 30 '19 at 09:59
@ndtreviv you can use skiprows for ignoring headers. – mudassirkhan19 Jun 12 '19 at 07:30

score 22 · Answer 5 · Alex Laskin · 2011-07-19T01:54:38.183

You just have to convert the csv.reader to a dict:

~ >> cat > 1.csv
key1, value1
key2, value2
key2, value22
key3, value3

~ >> cat > d.py
import csv
with open('1.csv') as f:
    d = dict(filter(None, csv.reader(f)))

print(d)

~ >> python d.py
{'key3': ' value3', 'key2': ' value22', 'key1': ' value1'}
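The values in the output above keep the space that follows each comma, and the duplicate key2 keeps only its last value (value22). A sketch, assuming the same 1.csv layout held in memory, that uses the csv module's `skipinitialspace` option to drop those spaces:

```python
import csv
import io

# Same layout as 1.csv above, held in memory for the example
sample = io.StringIO("key1, value1\nkey2, value2\nkey2, value22\nkey3, value3\n")

# skipinitialspace=True ignores whitespace immediately following the delimiter;
# filter(None, ...) drops empty rows produced by blank lines
d = dict(filter(None, csv.reader(sample, skipinitialspace=True)))
print(d)  # {'key1': 'value1', 'key2': 'value22', 'key3': 'value3'}
```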

edited Jul 19 '11 at 01:54

answered Jul 19 '11 at 00:41

Alex Laskin

that solution is tidy, and will work great if he can be **sure** that his inputs will never have three or more columns in some row. However, if that is ever encountered, an exception somewhat like this will be raised: `ValueError: dictionary update sequence element #2 has length 3; 2 is required`. – Nate Jul 19 '11 at 01:17
@machine, judging from the error in the question, the csv file has more than 2 columns – John La Rooy Jul 19 '11 at 01:22
@gnibbler, no, the error in the question is due to double unpacking of row. First he tries to iterate over reader, obtaining *rows*, which is actually a single *row*. And when he tries to iterate over this single row, he gets the individual field strings, which can't be unpacked into k, v. – Alex Laskin Jul 19 '11 at 01:51
A general comment: making objects held in memory from iterables can cause a memory problem. Suggest checking your memory space and the size of the iterable source file. A main advantage (the whole point?) of iterables is to not hold large things in memory. – travelingbones Mar 04 '16 at 19:29
@Nate: That can be fixed if necessary by wrapping the `filter` call with `map(operator.itemgetter(slice(2)), ...)`, so it will only pull the first two items, making it: `dict(map(operator.itemgetter(slice(2)), filter(None, csv.reader(f))))`. If it's Python 2, make sure to do `from future_builtins import map, filter`, so the `dict` reads a generator directly, instead of producing multiple unnecessary temporary `list`s first. – ShadowRanger Jun 08 '16 at 19:55
This is very crisp! Thank you @Alex Laskin – amc Jul 23 '20 at 16:25

score 16 · Answer 6 · conmak · 2021-06-15T16:44:04.710

Assuming you have a CSV of this structure:

"a","b"
1,2
3,4
5,6

And you want the output to be:

[{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}, {'a': '5', 'b': '6'}]

A zip function (not yet mentioned) is simple and quite helpful.

import csv

def read_csv(filename):
    with open(filename, newline='') as f:
        file_data = csv.reader(f)
        headers = next(file_data)
        return [dict(zip(headers, i)) for i in file_data]
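csv.DictReader does the same header zipping internally, so for rows that match the header length the function above should give the same list as `list(csv.DictReader(f))`. A small sketch comparing the two on the sample data, inlined via `io.StringIO` (the helper name `read_csv_zip` is hypothetical):

```python
import csv
import io

DATA = '"a","b"\n1,2\n3,4\n5,6\n'

def read_csv_zip(f):
    rows = csv.reader(f)
    headers = next(rows)
    # Pair each header with the matching cell of every data row
    return [dict(zip(headers, row)) for row in rows]

by_zip = read_csv_zip(io.StringIO(DATA))
by_dictreader = [dict(r) for r in csv.DictReader(io.StringIO(DATA))]
print(by_zip)  # [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}, {'a': '5', 'b': '6'}]
```

They differ on ragged rows: zip silently truncates, while DictReader fills missing fields with `restval` (None by default) and collects extra fields under the key None.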

If you prefer pandas, it can also do this quite nicely:

import pandas as pd
def read_csv(filename):
    return pd.read_csv(filename).to_dict('records')

edited Jun 15 '21 at 16:44

answered Sep 30 '20 at 14:39

conmak

It worked for my use-case. – user3928562 Nov 28 '22 at 18:17

score 14 · Answer 7 · answered Sep 23 '13 at 10:33

You can also use numpy for this.

from numpy import loadtxt
key_value = loadtxt("filename.csv", delimiter=",")
mydict = { k:v for k,v in key_value }

answered Sep 23 '13 at 10:33

Thiru

Note this would work only for numerical columns. For non-numerical you get `ValueError: could not convert string to float: 'Name'`. – mirekphd Mar 18 '22 at 09:47

score 11 · Answer 8 · answered Jan 04 '18 at 19:42

One-liner solution

import pandas as pd

# header=None: the file's first line is a key/value pair, not column names
dict = {row[0] : row[1] for _, row in pd.read_csv("file.csv", header=None).iterrows()}

answered Jan 04 '18 at 19:42

Trideep Rath

Caution: this overshadows the built-in `dict` object (you won't be able to use it anymore:) – mirekphd Mar 18 '22 at 09:53

score 8 · Answer 9 · answered Jul 17 '19 at 06:21

For simple CSV files such as the following:

id,col1,col2,col3
row1,r1c1,r1c2,r1c3
row2,r2c1,r2c2,r2c3
row3,r3c1,r3c2,r3c3
row4,r4c1,r4c2,r4c3

you can convert them to a nested Python dictionary using only built-ins:

with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]

(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row
    csv_dict[key] = {col: value for col, value in zip(header, values)}

This should yield the following dictionary

{'row1': {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'},
 'row2': {'col1': 'r2c1', 'col2': 'r2c2', 'col3': 'r2c3'},
 'row3': {'col1': 'r3c1', 'col2': 'r3c2', 'col3': 'r3c3'},
 'row4': {'col1': 'r4c1', 'col2': 'r4c2', 'col3': 'r4c3'}}
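Splitting on "," breaks as soon as a field contains a quoted comma. A sketch of the same nested dict built with csv.reader instead, assuming the first column holds the row id; the sample data, including the quoted field, is made up:

```python
import csv
import io

# Hypothetical data: first column is the row id; one field contains a quoted comma
sample = io.StringIO('id,col1,col2\nrow1,"a, b",r1c2\nrow2,r2c1,r2c2\n')

reader = csv.reader(sample)
_, *header = next(reader)  # drop the id column's name, keep the other column names
csv_dict = {}
for row in reader:
    if not row:  # skip blank lines
        continue
    key, *values = row
    csv_dict[key] = dict(zip(header, values))

print(csv_dict)
# {'row1': {'col1': 'a, b', 'col2': 'r1c2'}, 'row2': {'col1': 'r2c1', 'col2': 'r2c2'}}
```

Unlike the split-based version, this does not strip whitespace around values; pass `skipinitialspace=True` to csv.reader if spaces follow the commas.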

Note: Python dictionaries have unique keys, so if your csv file has duplicate ids you should append each row to a list.

csv_dict = {}  # start fresh: csv_dict from the previous snippet holds dicts, not lists
for row in data:
    key, *values = row

    if key not in csv_dict:
        csv_dict[key] = []

    csv_dict[key].append({col: value for col, value in zip(header, values)})
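The `if key not in csv_dict` check can be folded into `dict.setdefault`. A minimal sketch, using made-up parsed rows in the same shape as `header` and `data` from the snippets above:

```python
# Hypothetical parsed rows: header without the id column, data rows with a duplicate id
header = ["col1", "col2"]
data = [["row1", "a", "b"], ["row1", "c", "d"], ["row2", "e", "f"]]

csv_dict = {}
for key, *values in data:
    # setdefault inserts an empty list the first time a key is seen, then returns it
    csv_dict.setdefault(key, []).append(dict(zip(header, values)))

print(csv_dict)
# {'row1': [{'col1': 'a', 'col2': 'b'}, {'col1': 'c', 'col2': 'd'}], 'row2': [{'col1': 'e', 'col2': 'f'}]}
```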

answered Jul 17 '19 at 06:21

fabda01

n.b. this can all be shortened to using `set_default`: csv_dict.set_default(key, []).append({key: value for key, value in zip(header, values)})) – mdmjsh Nov 29 '19 at 13:46
The ({key: value}) syntax in your `.append` command was very useful. I ended up using the same syntax in a `row.update` when iterating over and adding to a `DictReader`object that was made from a CSV file. – Shrout1 Jun 12 '20 at 12:53
@mdmjsh What is _this_? Also, no such command as `set_default`. – flywire Aug 19 '23 at 22:26
that was a typo, it should have been [setdefault](https://docs.python.org/3/library/stdtypes.html#dict.setdefault) - it doesn't change the above correct answer, it just means that the 'if key not in csv_dict...' logic can be excluded. I use setdefault a lot when dynamically building dicts. – mdmjsh Aug 24 '23 at 16:00

score 5 · Answer 10 · John La Rooy · 2011-07-19T01:32:45.940

I'd suggest adding `if row` in case there is an empty line at the end of the file:

import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)
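The comments suggest at least printing a warning rather than silently dropping extra columns. Here is a sketch that still keeps the first two columns but reports longer rows with `warnings.warn`; the function name `first_two_columns` is hypothetical:

```python
import csv
import io
import warnings

def first_two_columns(f):
    d = {}
    for rownum, row in enumerate(csv.reader(f), start=1):
        if not row:  # skip blank lines, e.g. at the end of the file
            continue
        if len(row) > 2:
            warnings.warn(f"row {rownum} has {len(row)} columns; ignoring all but the first two")
        k, v = row[:2]
        d[k] = v
    return d

print(first_two_columns(io.StringIO("a,1\nb,2,extra\n\n")))  # {'a': '1', 'b': '2'} plus one warning
```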

edited Jul 19 '11 at 01:32

answered Jul 19 '11 at 01:21

John La Rooy

Both well-done and well-thought-out. But like I said above, should he really be ignoring the fact that his input line is longer than he expected? I'd say he should raise his own exception (with a custom message) if he gets a line with more than two items. – machine yearning Jul 19 '11 at 01:27
Or rather, as stated above by @Nate, at least print a warning message. This just doesn't seem like something you'd want to ignore. – machine yearning Jul 19 '11 at 01:29
your answer (vs. mine) made ponder something - is there an efficiency difference between slicing and indexing in this case? – Nate Jul 19 '11 at 01:29
@machine, no idea. Perhaps it's a dump of a user table from a database, and he just wants a dict of userid:username or something for example – John La Rooy Jul 19 '11 at 01:30
@Nate, I'd expect the tuple (your way) to be slightly faster – John La Rooy Jul 19 '11 at 01:35
Hey guys, thanks for the comments. Your discussion really helped me out with my problem. I like the idea about raising a flag if the input is longer than expected. My data is a database dump and I do have more than two columns of data. – drbunsen Jul 19 '11 at 01:48

score 3 · Answer 11 · answered Mar 06 '14 at 06:11

If you are OK with using the numpy package, then you can do something like the following:

import numpy as np

lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
   my_dict[lines[i][0]] = lines[i][1]

answered Mar 06 '14 at 06:11

cloudyBlues

I think you should change `dtype=str` because for `None` one gets bytes in as both keys and values. – mirekphd Mar 18 '22 at 09:51

score 3 · Answer 12 · answered Sep 14 '19 at 19:12

With pandas, it is much easier. For example, assume you have the following data as CSV and call it test.csv (CSV is just a kind of text file):

a,b,c,d
1,2,3,4
5,6,7,8

Now, using pandas:

import pandas as pd
df = pd.read_csv("./test.csv")
df_to_dict = df.to_dict()

To get one dict per row instead, use:

df.to_dict(orient='records')

and that's it.

score 2 · Answer 13 · answered Feb 10 '15 at 18:43

You can use the dataconverters package for this; it is pretty cool:

import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
    records, metadata = commas.parse(f)
    for row in records:
        print('this is row in dictionary:', row)

score 2 · Answer 14 · answered Jul 08 '19 at 10:59

Many solutions have been posted, and I'd like to contribute mine, which works for any number of columns in the CSV file. It creates a dictionary with one key per column, and the value for each key is a list of the elements in that column.

import csv

with open(path_to_csv_file, newline='') as f:
    input_file = csv.DictReader(f)
    csv_dict = {elem: [] for elem in input_file.fieldnames}
    for row in input_file:
        for key in csv_dict.keys():
            csv_dict[key].append(row[key])
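For files that fit in memory, the same column-oriented dict can be built by transposing the rows with `zip(*...)`. A sketch with inline sample data (the column names are made up):

```python
import csv
import io

sample = io.StringIO("name,age\nann,31\nbob,42\n")

reader = csv.reader(sample)
header = next(reader)
# zip(*reader) turns the remaining rows into one tuple per column
columns = zip(*reader)
csv_dict = {name: list(values) for name, values in zip(header, columns)}
print(csv_dict)  # {'name': ['ann', 'bob'], 'age': ['31', '42']}
```

Unlike the DictReader loop, a file with a header but no data rows gives an empty dict here, and ragged rows are truncated to the shortest one.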

You transposed the dictionaries, see https://stackoverflow.com/a/57069644/4539999 — flywire, Aug 19 '23 at 14:03

score 1 · Answer 15 · answered Jun 18 '19 at 01:09

Try using a defaultdict together with DictReader.

import csv
from collections import defaultdict
my_dict = defaultdict(list)

with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)

The result maps each column name to a list of that column's values:

{'key1':[value_1, value_2, value_3], 'key2': [value_a, value_b, value_c], 'Key3':[value_x, Value_y, Value_z]}

score 1 · Answer 16 · answered Jul 15 '22 at 07:41

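Here is an approach for converting a CSV file to a dict with pandas. It assumes the CSV has a header row in which the key column is named k and the value column is named v: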

import pandas

data = pandas.read_csv('coors.csv')

the_dictionary_name = {row.k: row.v for (index, row) in data.iterrows()}

answered Jul 15 '22 at 07:41

Ion Harin

score 0 · Answer 17 · answered Sep 22 '20 at 21:40

If you have:

Only 1 key and 1 value as key,value in your csv
Do not want to import other packages
Want to create a dict in one shot

Do this:

mydict = {y[0]: y[1] for y in [x.split(",") for x in open('file.csv').read().split('\n') if x]}

What does it do?

It uses a list comprehension to split the file into lines and each line on commas; the trailing "if x" skips blank lines (usually the one at the end). A dictionary comprehension then turns each [key, value] pair into an entry.
