PYTHON: Read a file into a dictionary with column n as key and column m as value

Question

I know how to do it if it's just two columns, but what if the file is like:

01 asd 023 green
01 dff 343 blue
02 fdf 342 yellow
02 fff 232 brown
02 kjf 092 green
03 kja 878 blue

and say I would like column 2 to be the key of my dictionary and column 4 to be the value for that key? One workaround I was considering is to delete the other, unneeded columns so that only the two I need remain, and then use a script I also saw on this website to build the dictionary:

Python - file to dictionary?

Of course, this is a way around the problem, any tip is greatly appreciated.

How do you know your keys are in column 2 and the values in column 4? How are the columns delimited? Always by spaces like in your example? – Lukas Graf, Dec 06 '12 at 19:16
Yes, sorry about that. They are separated by spaces. This is just an example; let's say that for this example my keys are always in column 2 and my values always in column 4. – Dergyll, Dec 06 '12 at 19:18

NPE · Accepted Answer · 2012-12-06T19:33:21.773

5

d = {}
with open('data.txt') as f:
    for line in f:
        tok = line.split()    # split the line on whitespace
        d[tok[1]] = tok[3]    # column 2 is the key, column 4 the value
print(d)

This produces (the order of the keys may differ depending on the Python version)

{'kja': 'blue', 'kjf': 'green', 'fdf': 'yellow', 'asd': 'green', 'fff': 'brown', 'dff': 'blue'}

split() (without an argument) splits each line on runs of whitespace into a list of strings, dropping the trailing newline. tok[1] and tok[3] then use list indexing (which starts at 0) to address the second and fourth values in that list, and d[key] = value stores them as a key and its value in the dictionary.
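As a minimal sketch, the same loop can be generalised to "column n as key, column m as value" as in the title. The function name columns_to_dict and its parameters are hypothetical, not part of the answer above; column numbers are assumed to be 1-based as in the question, and skipping short lines is a design choice made here for illustration.

```python
def columns_to_dict(lines, key_col, val_col):
    """Build a dict from whitespace-separated lines.

    key_col and val_col are 1-based column numbers (hypothetical helper).
    """
    result = {}
    for line in lines:
        tok = line.split()
        # Skip blank or too-short lines instead of raising IndexError.
        if len(tok) < max(key_col, val_col):
            continue
        result[tok[key_col - 1]] = tok[val_col - 1]
    return result


# Any iterable of lines works, including an open file object.
sample = [
    "01 asd 023 green\n",
    "01 dff 343 blue\n",
    "\n",
    "02 fdf 342 yellow\n",
]
print(columns_to_dict(sample, 2, 4))
```

With a real file this would be called as columns_to_dict(f, 2, 4) inside a with open(...) as f: block. If a key appears more than once, the last line containing it wins, just as in the loop above.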

edited Dec 06 '12 at 19:33

answered Dec 06 '12 at 19:16

NPE


Thank you for the help. What you did was read the file in (line by line) and split the columns up. The d[tok[1]] = tok[3] part means you use tok[1] as column 1 and tok[3] as column 4, right? Thanks a lot for the simple-to-understand code, I hope my beginner's vocabulary didn't annoy anyone. Thanks for the rapid response as well! – Dergyll Dec 06 '12 at 19:22
@Dergyll: You're welcome. Yes, `tok[1]` (column 2) is the key and `tok[3]` (column 4) is the value. – NPE Dec 06 '12 at 19:23
@NPE Just to note that in `line.strip().split()` the `strip()` is redundant because of the behaviour of `split()` / `split(None)` – Jon Clements Dec 06 '12 at 19:29
Hello NPE, I get a funky error: "AttributeError: 'list' object has no attribute 'split'". What does that mean? Is my original file in a weird format or something? It's a txt file... – Dergyll Dec 06 '12 at 19:50
@Dergyll: This exact code works for me. Without seeing the exact code that you're running and the exception stack trace, it's hard to say what's going wrong in your case. – NPE Dec 06 '12 at 19:52
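The thread never shows the code that raised this error, so its actual cause here is unknown. Purely as an illustration, one common way to get this message is to call split() on a list of lines (for example the result of f.readlines()) rather than on a single line. The sample data below is an assumption standing in for the file contents.

```python
# Stand-in for what f.readlines() would return: a list of strings.
lines = ["01 asd 023 green\n", "01 dff 343 blue\n"]

try:
    lines.split()  # wrong: lines is a list, not a string
except AttributeError as exc:
    message = str(exc)
    print(message)

# Splitting each string in the list works as expected.
tokens = [line.split() for line in lines]
print(tokens[0][1], tokens[0][3])
```

The error message names the type that split() was called on, which is usually the quickest hint as to which variable holds a list where a string was expected.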

Jon Clements · Answer 2 · 2012-12-06T19:39:56.490

2

Something like

from operator import itemgetter
keyval = itemgetter(1, 3)
with open('file') as fin:
    keyvals = (keyval(line.split()) for line in fin)
    my_dict = dict(keyvals)

Notes:

This differs from @NPE's answer in that it builds the dictionary with the builtin dict constructor instead of declaring an empty dict before the loop and filling it key by key. itemgetter(1, 3) acts as the retrieval function: it takes the 2nd and 4th values from each line (once split on whitespace), and a generator expression applies it to every line in the file.

There's also a slight advantage (usually not that important): if my_dict = dict(keyvals) fails, the name my_dict is never bound, whereas if an error occurs while assigning key by key, a dict declared outside the with statement can end up "dirty", that is, partially filled.
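A small sketch of both notes, using in-memory lines instead of a file. The sample data, including the deliberately short second line in bad, and the names partial, whole and bound are assumptions made for illustration.

```python
from operator import itemgetter

keyval = itemgetter(1, 3)  # picks indexes 1 and 3, i.e. columns 2 and 4

lines = ["01 asd 023 green\n", "02 fdf 342 yellow\n"]
pairs = [keyval(line.split()) for line in lines]
print(pairs)  # a list of (key, value) tuples that dict() accepts

# A malformed second line makes itemgetter raise IndexError.
bad = ["01 asd 023 green\n", "too short\n"]

# Filling key by key leaves a partially filled dict behind.
partial = {}
try:
    for line in bad:
        key, value = keyval(line.split())
        partial[key] = value
except IndexError:
    pass
print(partial)

# Building with dict() in one go never binds the name on failure.
try:
    whole = dict(keyval(line.split()) for line in bad)
except IndexError:
    pass

try:
    whole
    bound = True
except NameError:
    bound = False
print(bound)
```

Whether the all-or-nothing behaviour matters depends on how the caller handles the exception; for a quick script either approach is fine.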

edited Dec 06 '12 at 19:39

answered Dec 06 '12 at 19:17

Jon Clements


using itemgetter is unnecessary – Jon Martin Dec 06 '12 at 19:19
@JonMartin and splitting `line` twice as per your example is? – Jon Clements Dec 06 '12 at 19:21
I only did that for a quick two-line solution. It's certainly better to split it once. – Jon Martin Dec 06 '12 at 19:22

score 0 · Answer 3 · answered Dec 06 '12 at 19:16

d = {}
with open("file.txt") as f:
    for line in f:
        # Treat each line as two key/value pairs: columns 1-2 and columns 3-4.
        (key1, val1, key2, val2) = line.split()
        d[int(key1)] = val1
        d[int(key2)] = val2

This will get you all of them. Otherwise, you can do something along the lines of NPE's answer.

Emil Ivanov

score 0 · Answer 4 · answered Dec 06 '12 at 19:17

my_dic = {}
with open('file.txt') as f:
    for line in f.readlines():
        my_dic[line.split()[1]] = line.split()[3]  # split() also drops the trailing newline

Jon Martin

`for line in f.readlines()` is just `for line in f` ... (unless you intend to read the entire file into memory) – Jon Clements Dec 06 '12 at 19:58
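A short sketch of the two points raised about this answer. io.StringIO is used here only as a stand-in for an open file, and the sample text is an assumption.

```python
import io

text = "01 asd 023 green\n01 dff 343 blue\n"

# readlines() builds the full list of lines in memory first.
all_lines = io.StringIO(text).readlines()

# Iterating over the file object yields the same lines one at a time.
streamed = [line for line in io.StringIO(text)]
print(all_lines == streamed)

# split(' ') keeps the trailing newline on the last field...
with_space = "01 asd 023 green\n".split(' ')[3]
# ...while split() with no argument strips surrounding whitespace.
no_arg = "01 asd 023 green\n".split()[3]
print(repr(with_space), repr(no_arg))
```

For a small file the memory difference is negligible, but the trailing newline from split(' ') would end up inside the dictionary values.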
