
I am trying to create a dictionary from a csv file. The first column of the csv file contains unique keys and the second column contains values. Each row of the csv file represents a unique key, value pair within the dictionary. I tried to use the csv.DictReader and csv.DictWriter classes, but I could only figure out how to generate a new dictionary for each row. I want one dictionary. Here is the code I am trying to use:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        for rows in reader:
            k = rows[0]
            v = rows[1]
            mydict = {k:v for k, v in rows}
        print(mydict)

When I run the above code I get a ValueError: too many values to unpack (expected 2). How do I create one dictionary from a csv file? Thanks.

Martin Thoma
drbunsen
  • Can you give an example of an input file and the resulting data structure? – robert Jul 19 '11 at 00:13
  • When you iterate over csv.reader, you get a single row, not rows. So the valid form is mydict = {k:v for k,v in reader}, but if you are sure that there are only two columns in the csv file, then mydict = dict(reader) is much faster. – Alex Laskin Jul 19 '11 at 00:47
  • Please be aware that storing dictionary / key-value data in CSV files is not without issues (such as dealing with mixed-type columns). **JSON format** could represent this type of data much better IMO. – mirekphd Aug 12 '22 at 08:38
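The two forms suggested in Alex Laskin's comment can be checked with a small self-contained sketch. It assumes hypothetical two-column contents held in an io.StringIO object in place of coors.csv, and it also reproduces the ValueError from the question:

```python
import csv
import io

# Hypothetical two-column contents standing in for coors.csv
data = "key1,value1\nkey2,value2\n"

# One row from csv.reader is a list of strings, e.g. ['key1', 'value1'].
row = next(csv.reader(io.StringIO(data)))
try:
    # Iterating over the row yields single strings; unpacking the
    # 4-character string 'key1' into k, v fails, as in the question.
    {k: v for k, v in row}
except ValueError as exc:
    print("ValueError:", exc)

# Iterating over the reader itself yields one [key, value] list per row.
mydict = {k: v for k, v in csv.reader(io.StringIO(data))}

# When every row has exactly two fields, dict() can consume the reader directly.
same = dict(csv.reader(io.StringIO(data)))

print(mydict)          # {'key1': 'value1', 'key2': 'value2'}
print(mydict == same)  # True
```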

17 Answers


I believe the syntax you were looking for is as follows:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {rows[0]:rows[1] for rows in reader}

Alternately, for Python versions older than 2.7 (which lack dict comprehensions), you want:

mydict = dict((rows[0],rows[1]) for rows in reader)
michaelbahr
Nate
  • Good to account for rows longer than expected; but shouldn't he be raising his own exception if there are too many items in a row? I would think that would mean there's an error with his input data. – machine yearning Jul 19 '11 at 01:22
  • And then he'd at least be able to narrow the exception down to faulty input – machine yearning Jul 19 '11 at 01:24
  • That has some merit, but I'm a firm believer that exceptions are there to tell you that you programmed something incorrectly - not for when the world gives you lemons. That's when you print a pretty error message and fail, or - more appropriate for this case - a pretty warning message and succeed. – Nate Jul 19 '11 at 01:25
  • Sorry, looked at op's code, hard to tell if he wanted only 2 items per line. I was wrong! – machine yearning Jul 19 '11 at 01:30
  • I had multiple lines in csv but it gave only 1 key:value pair – Abhilash Mishra Jul 31 '19 at 07:02
  • I've been looking for this all night. I have an API that is using flask/SQLAlchemy and I wanted to mock it with a text file and just use jsonify on the reader output - and this is the magic code, thank you! – Nick.Mc Jan 13 '22 at 13:17
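The trade-off discussed in the comments above (whether a row with the wrong number of fields should be an error) can be sketched as a small helper that raises a ValueError naming the offending line. The function name load_pairs is hypothetical, and skipping blank lines is an assumption:

```python
import csv
import io


def load_pairs(f):
    """Build one dict from a two-column CSV file object.

    Hypothetical helper: blank lines are skipped, and any other row that
    does not have exactly two fields raises ValueError.
    """
    reader = csv.reader(f)
    result = {}
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for a blank line
        if len(row) != 2:
            # reader.line_num is the number of source lines read so far
            raise ValueError(
                f"line {reader.line_num}: expected 2 fields, got {len(row)}"
            )
        key, value = row
        result[key] = value
    return result


print(load_pairs(io.StringIO("a,1\nb,2\n")))  # {'a': '1', 'b': '2'}
```

Replacing the raise with a warnings.warn(...) call followed by continue would give the "pretty warning message and succeed" behavior Nate prefers instead.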

Open the file by calling open and then using csv.DictReader.

import csv

input_file = csv.DictReader(open("coors.csv"))

You can iterate over the rows of the CSV file by iterating over the DictReader object input_file; each row is a dict keyed by the column names from the header line.

for row in input_file:
    print(row)

Or, to access only the first row (Python 2 syntax):

dictobj = csv.DictReader(open('coors.csv')).next() 

Update: in Python 3, the .next() method no longer exists; use the built-in next() instead:

reader = csv.DictReader(open('coors.csv'))
dictobj = next(reader) 
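Because DictReader takes its keys from the first line, building a single dict that maps one column to another needs a header row. A minimal sketch, assuming hypothetical contents whose header names the columns key and value, shows both next() and collapsing the remaining rows into one dict:

```python
import csv
import io

# Hypothetical file contents with a header row; "key"/"value" are assumed names
data = "key,value\nx,1\ny,2\nz,3\n"

with io.StringIO(data) as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)   # ['key', 'value']
    first = next(reader)       # first data row, e.g. {'key': 'x', 'value': '1'}
    # Collapse the first row and all remaining rows into a single dict
    mydict = {first["key"]: first["value"]}
    mydict.update((row["key"], row["value"]) for row in reader)

print(mydict)  # {'x': '1', 'y': '2', 'z': '3'}
```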
grrrrrr
Laxmikant Ratnaparkhi
import csv

d = {}
with open('filename.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        k, v = row  # raises ValueError if a row does not have exactly two fields
        d[k] = v
robert
  • @Alex Laskin: Really? It looks like some pretty readable python to me. What's your principle to back this statement up? You basically just called him "poopy head"... – machine yearning Jul 19 '11 at 01:17
  • @machine-yearning, no, I didn't say that his code is 'bad'. But there is not a single reason to write `for row in reader: k, v = row` if you can simply write `for k, v in reader`, for example. And if you expect that reader is an iterable producing two-element items, then you can simply pass it directly to dict for conversion. `d = dict(reader)` is much shorter and significantly faster on huge datasets. – Alex Laskin Jul 19 '11 at 01:44
  • @Alex Laskin: Thanks for the clarification. I personally agreed with you but I think that if you're gonna call someone's code "non-pythonic" you should accompany that comment with a justification. I'd say that "shorter" and "faster" are not necessarily equivalent to "more pythonic". Readability/reliability is a huge concern as well. If it's easier to work in some of our constraints into the above `for row in reader` paradigm, then it might (after longer-term development) be more practical. I agree with you short-term, but beware of premature optimization. – machine yearning Jul 19 '11 at 05:32
  • @robert : Thanks dude! Really helped. Other codes are too difficult to read. – Ash Oct 22 '20 at 17:13

This isn't elegant, but it is a one-line solution using pandas:

import pandas as pd
# read_csv's squeeze=True argument was removed in pandas 2.0, so squeeze the resulting DataFrame instead
pd.read_csv('coors.csv', header=None, index_col=0).squeeze('columns').to_dict()

If you want to specify a dtype for your index (it can't be specified in read_csv if you use the index_col argument, because of a bug):

import pandas as pd
pd.read_csv('coors.csv', header=None, dtype={0: str}).set_index(0).squeeze('columns').to_dict()
mudassirkhan19

You just have to convert the csv.reader to a dict:

~ >> cat > 1.csv
key1, value1
key2, value2
key2, value22
key3, value3

~ >> cat > d.py
import csv
with open('1.csv') as f:
    d = dict(filter(None, csv.reader(f)))

print(d)

~ >> python d.py
{'key3': ' value3', 'key2': ' value22', 'key1': ' value1'}
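In that output each value keeps the space that follows its comma, and the duplicate key2 keeps only its last value. A sketch of the same one-liner with the csv module's skipinitialspace option, with the contents of 1.csv inlined through io.StringIO so it is self-contained:

```python
import csv
import io

# Same contents as 1.csv above, inlined for a self-contained example
data = "key1, value1\nkey2, value2\nkey2, value22\nkey3, value3\n"

with io.StringIO(data) as f:
    # skipinitialspace=True drops whitespace immediately after each delimiter
    reader = csv.reader(f, skipinitialspace=True)
    # filter(None, ...) discards empty rows produced by blank lines
    d = dict(filter(None, reader))

print(d)  # {'key1': 'value1', 'key2': 'value22', 'key3': 'value3'}
```

Later rows overwrite earlier ones with the same key, which is why key2 maps to 'value22'.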
Alex Laskin
  • that solution is tidy, and will work great if he can be **sure** that his inputs will never have three or more columns in some row. However, if that is ever encountered, an exception somewhat like this will be raised: `ValueError: dictionary update sequence element #2 has length 3; 2 is required`. – Nate Jul 19 '11 at 01:17
  • @machine, judging from the error in the question, the csv file has more than 2 columns – John La Rooy Jul 19 '11 at 01:22
  • @gnibbler, no, the error in the question is due to double unpacking of the row. First he tries to iterate over reader, obtaining *rows*, which is actually a single *row*. Then, when he tries to iterate over this single row, he gets two items, which can't be unpacked correctly. – Alex Laskin Jul 19 '11 at 01:51
  • A general comment: making objects held in memory from iterables can cause a memory problem. Suggest checking your memory space and the size of the iterable source file. A main advantage (the whole point?) of iterables is to not hold large things in memory. – travelingbones Mar 04 '16 at 19:29
  • @Nate: That can be fixed if necessary by wrapping the `filter` call with `map(operator.itemgetter(slice(2)), ...)`, so it will only pull the first two items, making it: `dict(map(operator.itemgetter(slice(2)), filter(None, csv.reader(f))))`. If it's Python 2, make sure to do `from future_builtins import map, filter`, so the `dict` reads a generator directly instead of producing multiple unnecessary temporary `list`s first. – ShadowRanger Jun 08 '16 at 19:55
  • This is very crisp! Thank you @Alex Laskin – amc Jul 23 '20 at 16:25
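ShadowRanger's itemgetter(slice(2)) suggestion, sketched in Python 3 form on hypothetical inline data that contains a blank line and rows with extra columns:

```python
import csv
import io
from operator import itemgetter

# Hypothetical data: some rows have more than two columns, one line is blank
data = "a,1,extra\nb,2\n\nc,3,x,y\n"

with io.StringIO(data) as f:
    rows = filter(None, csv.reader(f))           # drop empty rows
    first_two = map(itemgetter(slice(2)), rows)  # row[:2] for each row
    d = dict(first_two)

print(d)  # {'a': '1', 'b': '2', 'c': '3'}
```

In Python 3, map and filter are already lazy, so no temporary lists are built. A row with only one field would still make dict() raise ValueError.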

Assuming you have a CSV of this structure:

"a","b"
1,2
3,4
5,6

And you want the output to be:

[{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}, {'a': '5', 'b': '6'}]

The built-in zip function (not yet mentioned) makes this simple:

import csv

def read_csv(filename):
    with open(filename) as f:
        file_data = csv.reader(f)
        headers = next(file_data)
        return [dict(zip(headers, i)) for i in file_data]

If you prefer pandas, it can also do this quite nicely:

import pandas as pd
def read_csv(filename):
    return pd.read_csv(filename).to_dict('records')
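csv.DictReader pairs the header row with each data row in essentially the same way. A sketch comparing the two on the sample data above, inlined with io.StringIO; the helper name read_with_zip is hypothetical:

```python
import csv
import io

data = '"a","b"\n1,2\n3,4\n5,6\n'


def read_with_zip(f):
    # Hypothetical helper mirroring the zip-based read_csv above
    rows = csv.reader(f)
    headers = next(rows)
    return [dict(zip(headers, r)) for r in rows]


zipped = read_with_zip(io.StringIO(data))
# DictReader uses the first row as field names for every following row
via_dictreader = [dict(r) for r in csv.DictReader(io.StringIO(data))]

print(zipped == via_dictreader)  # True
print(zipped)  # [{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}, {'a': '5', 'b': '6'}]
```

The two differ when a row's length does not match the header: zip silently truncates, while DictReader stores extra values in a list under restkey and fills missing fields with restval (both None by default).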
conmak

You can also use numpy for this.

from numpy import loadtxt
key_value = loadtxt("filename.csv", delimiter=",")
mydict = { k:v for k,v in key_value }
Thiru
  • Note this would work only for numerical columns. For non-numerical you get `ValueError: could not convert string to float: 'Name'`. – mirekphd Mar 18 '22 at 09:47

One-liner solution

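import pandas as pd

# header=None: a key,value file has no header row, so its first line is kept as data.
# The result is named mydict rather than dict to avoid shadowing the built-in.
mydict = {row[0]: row[1] for _, row in pd.read_csv("file.csv", header=None).iterrows()}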
Trideep Rath

For simple csv files, such as the following:

id,col1,col2,col3
row1,r1c1,r1c2,r1c3
row2,r2c1,r2c2,r2c3
row3,r3c1,r3c2,r3c3
row4,r4c1,r4c2,r4c3

You can convert it to a Python dictionary using only built-ins:

csv_file = "data.csv"  # hypothetical path to a file like the one above

with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]

(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row
    csv_dict[key] = {key: value for key, value in zip(header, values)}

This should yield the following dictionary

{'row1': {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'},
 'row2': {'col1': 'r2c1', 'col2': 'r2c2', 'col3': 'r2c3'},
 'row3': {'col1': 'r3c1', 'col2': 'r3c2', 'col3': 'r3c3'},
 'row4': {'col1': 'r4c1', 'col2': 'r4c2', 'col3': 'r4c3'}}
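A similar nested dict can be built with csv.DictReader, which also handles quoted fields that contain commas. A sketch on the sample data above, inlined with io.StringIO and assuming, as in the sample, that the key column is named id:

```python
import csv
import io

data = (
    "id,col1,col2,col3\n"
    "row1,r1c1,r1c2,r1c3\n"
    "row2,r2c1,r2c2,r2c3\n"
    "row3,r3c1,r3c2,r3c3\n"
    "row4,r4c1,r4c2,r4c3\n"
)

csv_dict = {}
with io.StringIO(data) as f:
    for row in csv.DictReader(f):
        key = row.pop("id")        # remove the id column and use it as the outer key
        csv_dict[key] = dict(row)  # the remaining columns become the inner dict

print(csv_dict["row1"])  # {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'}
```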

Note: Python dictionaries have unique keys, so if your csv file has duplicate ids, later rows overwrite earlier ones. To keep every row, append each row to a list instead:

csv_dict = {}  # start from an empty dict; this loop replaces the one above
for row in data:
    key, *values = row

    if key not in csv_dict:
        csv_dict[key] = []

    csv_dict[key].append({key: value for key, value in zip(header, values)})
fabda01
  • n.b. this can all be shortened to using `set_default`: csv_dict.set_default(key, []).append({key: value for key, value in zip(header, values)})) – mdmjsh Nov 29 '19 at 13:46
  • The ({key: value}) syntax in your `.append` command was very useful. I ended up using the same syntax in a `row.update` when iterating over and adding to a `DictReader` object that was made from a CSV file. – Shrout1 Jun 12 '20 at 12:53
  • @mdmjsh What is _this_? Also, no such command as `set_default`. – flywire Aug 19 '23 at 22:26
  • that was a typo, it should have been [setdefault](https://docs.python.org/3/library/stdtypes.html#dict.setdefault) - it doesn't change the above correct answer, it just means that the 'if key not in csv_dict...' logic can be excluded. I use setdefault a lot when dynamically building dicts. – mdmjsh Aug 24 '23 at 16:00
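With the setdefault correction from the comments, the duplicate-id loop needs no explicit membership check. A sketch on hypothetical sample lines that repeat an id:

```python
# Hypothetical CSV lines with a repeated id ("row1")
lines = ["id,col1,col2", "row1,a,b", "row1,c,d", "row2,e,f"]

csv_list = [[val.strip() for val in line.split(",")] for line in lines]
(_, *header), *data = csv_list

csv_dict = {}
for key, *values in data:
    # setdefault returns the existing list for key, or stores and returns a new []
    csv_dict.setdefault(key, []).append(dict(zip(header, values)))

print(csv_dict["row1"])  # [{'col1': 'a', 'col2': 'b'}, {'col1': 'c', 'col2': 'd'}]
print(csv_dict["row2"])  # [{'col1': 'e', 'col2': 'f'}]
```

collections.defaultdict(list) is an alternative that creates the empty list automatically on first access.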

I'd suggest adding "if row" in case there is an empty line at the end of the file; slicing with row[:2] also stops rows with extra columns from raising an error:

import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)
John La Rooy
  • Both well-done and well-thought-out. But like I said above, should he really be ignoring the fact that his input line is longer than he expected? I'd say he should raise his own exception (with a custom message) if he gets a line with more than two items. – machine yearning Jul 19 '11 at 01:27
  • Or rather, as stated above by @Nate, at least print a warning message. This just doesn't seem like something you'd want to ignore. – machine yearning Jul 19 '11 at 01:29
  • your answer (vs. mine) made ponder something - is there an efficiency difference between slicing and indexing in this case? – Nate Jul 19 '11 at 01:29
  • @machine, no idea. Perhaps it's a dump of a user table from a database, and he just wants a dict of userid:username or something for example – John La Rooy Jul 19 '11 at 01:30
  • @Nate, I'd expect the tuple (your way) to be slightly faster – John La Rooy Jul 19 '11 at 01:35
  • Hey guys, thanks for the comments. Your discussion really helped me out with my problem. I like the idea about raising a flag if the input is longer than expected. My data is a database dump and I do have more than two columns of data. – drbunsen Jul 19 '11 at 01:48

If you are OK with using the numpy package, then you can do something like the following:

import numpy as np

lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
   my_dict[lines[i][0]] = lines[i][1]
cloudyBlues
  • I think you should change to `dtype=str`, because with `None` one gets bytes as both keys and values. – mirekphd Mar 18 '22 at 09:51

With pandas it is much easier. For example, assume you have the following data as CSV, and call it test.txt or test.csv (a CSV file is just a kind of text file):

a,b,c,d
1,2,3,4
5,6,7,8

Now, using pandas:

import pandas as pd
df = pd.read_csv("./test.txt")
df_to_dict = df.to_dict()  # one dict per column: {column: {row_index: value}}

To get one dict per row instead, use:

df.to_dict(orient='records')

And that's it.

TheTechGuy

You can use this; it is pretty cool:

import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
    records, metadata = commas.parse(f)
    for row in records:
        print('this is row in dictionary:', row)
hamed

Many solutions have been posted, and I'd like to contribute mine, which works for any number of columns in the CSV file. It creates a dictionary with one key per column, and the value for each key is a list of the elements in that column.

import csv

with open(path_to_csv_file) as f:
    input_file = csv.DictReader(f)
    # one empty list per column name taken from the header row
    csv_dict = {elem: [] for elem in input_file.fieldnames}
    for row in input_file:
        for key in csv_dict:
            csv_dict[key].append(row[key])

Try to use a defaultdict and DictReader.

import csv
from collections import defaultdict
my_dict = defaultdict(list)

with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)

It returns a mapping from each column header to the list of that column's values, schematically:

{'key1': [value_1, value_2, value_3], 'key2': [value_a, value_b, value_c], 'key3': [value_x, value_y, value_z]}
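A concrete run of the defaultdict approach, on hypothetical contents with a header row (inlined via io.StringIO), shows that the keys are the column headers and every value stays a string:

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents; the header row supplies the keys
data = "name,age\nann,31\nbob,42\n"

my_dict = defaultdict(list)
with io.StringIO(data) as csv_file:
    for line in csv.DictReader(csv_file):
        for key, value in line.items():
            my_dict[key].append(value)

# Convert to a plain dict for display; printing a defaultdict also shows its factory
print(dict(my_dict))  # {'name': ['ann', 'bob'], 'age': ['31', '42']}
```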
Paulo Henrique Zen

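Here is an approach for converting a CSV file to a dict:

import pandas

# row.k and row.v assume the CSV has a header row naming its two columns k and v
data = pandas.read_csv('coors.csv')

the_dictionary_name = {row.k: row.v for (index, row) in data.iterrows()}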
Ion Harin

If you have:

  1. Only one key and one value per line (key,value) in your csv
  2. Do not want to import other packages
  3. Want to create a dict in one shot

Do this:

mydict = {y[0]: y[1] for y in [x.split(",") for x in open('file.csv').read().split('\n') if x]}

What does it do?

It uses a list comprehension to split the file into lines and each line on commas; the trailing "if x" skips blank lines (usually the last one). A dictionary comprehension then turns each [key, value] pair into a dict entry.
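One caveat: splitting on commas by hand does not understand CSV quoting, so a quoted field that contains a comma is split in the wrong place. A sketch contrasting the one-liner's logic with csv.reader on hypothetical inline data:

```python
import csv
import io

# Hypothetical contents: the second key is a quoted field containing a comma
data = 'k1,v1\n"k,2",v2\n\n'

# The one-liner's logic, applied to a string instead of open('file.csv').read()
naive = {y[0]: y[1] for y in [x.split(",") for x in data.split("\n") if x]}

# csv.reader honours the quotes; "if row" skips the blank line
with io.StringIO(data) as f:
    parsed = {row[0]: row[1] for row in csv.reader(f) if row}

print(naive)   # {'k1': 'v1', '"k': '2"'}
print(parsed)  # {'k1': 'v1', 'k,2': 'v2'}
```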

Canute S