I am trying to read a .txt file that has the format

"Field1:Field2:Field3:Field4"
"a:b:c:d"
"e:f:g:h"

into a dictionary with the format

{Field1: [a, e], Field2: [b, f], Field3: [c, g], Field4: [d, h]}

and my current code looks like

with open("data.txt",'r') as filestream:
    lines = [line.strip().split(":") for line in filestream] 
    fields = lines[0] 
    d = dict.fromkeys(fields, []) 
    for i in range(1, len(lines)):
        for j in range(len(fields)):
            d[fields[j]].append(lines[i][j])

What I'm trying to do is convert each line in the file into a split and cleaned list, store that in a bigger list of lists, and then use a double for loop to match the key of the dictionary with the correct value in the smaller/current list. However, what the code ends up doing is adding to the dictionary in a way that looks like:

{Field1: [a], Field2: [a], Field3: [a], Field4: [a]}
{Field1: [a,b], Field2: [a,b], Field3: [a,b], Field4: [a,b]}

I want to add to the dictionary in the following manner:

{Field1: [a], Field2: [], Field3: [], Field4: []}
{Field1: [a], Field2: [b], Field3: [], Field4: []}

and so forth.

Can anyone help me figure out where my code is going wrong?

arnavlohe15
  • Does this answer your question? [How do I initialize a dictionary of empty lists in Python?](https://stackoverflow.com/questions/11509721/how-do-i-initialize-a-dictionary-of-empty-lists-in-python) – shriakhilc Apr 06 '22 at 23:48
  • The issue is that `fromkeys` doesn't make a new list for each key, it uses the same list object you pass to it. The docs suggest to use a dict comprehension instead, and the question linked above suggests some other methods. – shriakhilc Apr 06 '22 at 23:49

2 Answers

Try:

out = {}

with open("data.txt", "r") as f_in:
    i = (line.strip().split(":") for line in f_in)
    fields = next(i)
    for line in i:
        for k, v in zip(fields, line):
            out.setdefault(k, []).append(v)

print(out)

Prints:

{
    "Field1": ["a", "e"],
    "Field2": ["b", "f"],
    "Field3": ["c", "g"],
    "Field4": ["d", "h"],
}
Andrej Kesely
  • Can you explain what the "next" keyword does? – arnavlohe15 Apr 07 '22 at 00:02
  • @arnavlohe15 With `i = ...` I'm creating an iterator. With [`next(i)`](https://docs.python.org/3.8/library/functions.html#next) I'm advancing the iterator by one step (getting first line from the file - the fields row). – Andrej Kesely Apr 07 '22 at 00:03
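
To make the `next` behaviour concrete, here is a minimal sketch. It assumes a short hard-coded list of lines in place of the file (the `sample` list and the two-field layout are illustrative only, not from the question):

```python
# Sample lines stand in for the file; the list is an assumption for illustration.
sample = ["Field1:Field2\n", "a:b\n", "e:f\n"]

# A generator expression is lazy: no line is split until a value is requested.
rows = (line.strip().split(":") for line in sample)

header = next(rows)     # consumes only the first item (the fields row)
remaining = list(rows)  # iteration resumes right after the header

print(header)     # ['Field1', 'Field2']
print(remaining)  # [['a', 'b'], ['e', 'f']]
```

Because `next` advances the same iterator that the later `for` loop uses, the header row is never seen by the loop.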

The issue that you're having comes from the line:

d = dict.fromkeys(fields, []) 

More specifically, the problem is the `[]` argument. `dict.fromkeys(fields, [])` creates a new dictionary with the fields as keys, but it uses the SAME empty list object as the value for every key. Field1, Field2, Field3 and Field4 therefore all store their contents in one shared list, so an append made through any key shows up under every key. That is why each value ends up in all four lists.
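
The sharing can be checked directly. This is a small sketch, with the field names hard-coded purely for illustration (in the question they come from the first line of the file):

```python
# Field names are hard-coded here as an assumption; the question reads them from data.txt.
fields = ["Field1", "Field2", "Field3", "Field4"]

# The [] argument is evaluated once, and that single list is stored under every key.
shared = dict.fromkeys(fields, [])
shared["Field1"].append("a")

print(shared["Field2"])                      # ['a'], although only Field1 was appended to
print(shared["Field1"] is shared["Field2"])  # True: both keys refer to one list object
```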

Your issue can be fixed through a single line change, from:

d = dict.fromkeys(fields, []) 

to:

d = {field: [] for field in fields} 

Meaning that your source code would become:

with open("data.txt",'r') as filestream:
    lines = [line.strip().split(":") for line in filestream] 
    fields = lines[0] 
    d = {field: [] for field in fields} 
    for i in range(1, len(lines)):
        for j in range(len(fields)):
            d[fields[j]].append(lines[i][j])
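
If you want to check the result without a file on disk, the following sketch may help. It assumes the file holds the three sample lines from the question without the surrounding quotes, uses `io.StringIO` as a stand-in for `open("data.txt")`, and swaps the index-based loops for `zip`, which is a stylistic choice rather than part of the fix:

```python
import io

# io.StringIO stands in for the real file; the contents are assumed from the question's sample.
filestream = io.StringIO("Field1:Field2:Field3:Field4\na:b:c:d\ne:f:g:h\n")

lines = [line.strip().split(":") for line in filestream]
fields = lines[0]

# The dict comprehension builds one fresh list per key.
d = {field: [] for field in fields}

# Pair each field name with the value in the same column of every data row.
for row in lines[1:]:
    for field, value in zip(fields, row):
        d[field].append(value)

print(d)
# {'Field1': ['a', 'e'], 'Field2': ['b', 'f'], 'Field3': ['c', 'g'], 'Field4': ['d', 'h']}
```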
0xOmarA