
I have a two-column, tab-separated input file that I would like to use to populate a dictionary in Python. The first column holds the keys (which contain duplicates) and the second column holds the values.

Sample input:

cat tail
cat whisker
cat meow
cat black
dog tail
dog paw
dog bark
bird    beak

I have written the following code. Its output is wrong, but it does contain the dictionary format I am looking for, associating each key from column 1 with all of its values from column 2.

The code that I have been using is:

#!/usr/bin/python
# -*- coding: utf-8 -*-

keys = []
values = []

with open('animal-trial', "rU") as f:
    for line in f:
        line = line.split()
        keys.append(line[0])
        values.append(line[1])
    d = {}
    for k,v in zip(keys, values):
        d.setdefault(k, []).append(v)
    print d

I have looked at other references [HERE], [HERE] and [HERE]; however, all of the suggestions, including those using defaultdict, give me the same output rather than the desired output.

The actual output is:

{'cat': ['tail']}
{'cat': ['tail', 'whisker']}
{'cat': ['tail', 'whisker', 'meow']}
{'cat': ['tail', 'whisker', 'meow', 'black']}
{'dog': ['tail'], 'cat': ['tail', 'whisker', 'meow', 'black']}
{'dog': ['tail', 'paw'], 'cat': ['tail', 'whisker', 'meow', 'black']}
{'dog': ['tail', 'paw', 'bark'], 'cat': ['tail', 'whisker', 'meow', 'black']}
{'bird': ['beak'], 'dog': ['tail', 'paw', 'bark'], 'cat': ['tail', 'whisker', 'meow', 'black']}

The desired output is:

{'bird': ['beak'], 'dog': ['tail', 'paw', 'bark'], 'cat': ['tail', 'whisker', 'meow', 'black']} 

Can anyone point out where I am making an error, or suggest a more comprehensive solution so that the final result is a single dictionary?
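The growing sequence of dictionaries in the actual output is what you would see if the print ran once per input line instead of once after the loop. In the code as shown, print d sits at the with level, so a likely explanation (an assumption, since the indentation of the script that was actually run is not visible here) is that it was indented inside the for line in f: loop. This Python 3 sketch uses a hypothetical in-memory SAMPLE list in place of the file 'animal-trial' and contrasts the two placements:

```python
# Hypothetical stand-in for a few lines of the 'animal-trial' file
SAMPLE = ["cat\ttail", "cat\twhisker", "dog\ttail", "bird\tbeak"]


def build_printing_each_line(lines):
    """Mimic a print inside the loop: record a snapshot after every line."""
    d = {}
    snapshots = []
    for line in lines:
        key, value = line.split()
        d.setdefault(key, []).append(value)
        # Copy each list so later appends do not change earlier snapshots
        snapshots.append({k: list(v) for k, v in d.items()})
    return snapshots


def build_once(lines):
    """Print placed after the loop: the finished dictionary is reported once."""
    d = {}
    for line in lines:
        key, value = line.split()
        d.setdefault(key, []).append(value)
    return d


# One line of output per input line, like the "actual output"
for snapshot in build_printing_each_line(SAMPLE):
    print(snapshot)

# A single line of output, like the "desired output"
print(build_once(SAMPLE))
```

The dictionary-building logic is identical in both functions; only where the result is reported differs.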

owwoow14

4 Answers


You can check whether the key is present: if it is, append to its list; if it isn't, create a list with a single element:

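d = {}
with open('a12', 'r') as f:
    for line in f:
        if line.strip():              # skip blank lines
            a = line.split()          # split on any whitespace, tabs included
            if a[0] not in d:
                d[a[0]] = [a[1]]      # first time this key is seen
            else:
                d[a[0]].append(a[1])  # key already present: extend its list
print(d)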

Output:

{'cat': ['tail', 'whisker', 'meow', 'black'], 'bird': ['beak'], 'dog': ['tail', 'paw', 'bark']}

With pandas:

import pandas as pd

# Raw string avoids an invalid-escape warning for \s in newer Pythons
df = pd.read_csv('file_name', header=None, sep=r'\s+')
print(df.groupby(0)[1].apply(list).to_dict())

Output:

{'dog': ['tail', 'paw', 'bark'], 'bird': ['beak'], 'cat': ['tail', 'whisker', 'meow', 'black']}
Mohammad Yusuf

I assume you have an input file called f_input.txt.

You can also use groupby from the itertools module, as in this example:

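from itertools import groupby

# Read each non-blank line and split it into [key, value]
with open("f_input.txt", 'r') as f:
    data = [line.split() for line in f if line.strip()]

final = {}
# groupby merges *consecutive* rows that share the same first column
for key, rows in groupby(data, lambda x: x[0]):
    final[key] = [row[1] for row in rows]

print(final)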

Output:

{'bird': ['beak'], 'dog': ['tail', 'paw', 'bark'], 'cat': ['tail', 'whisker', 'meow', 'black']}
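One caveat with itertools.groupby: it only merges runs of consecutive rows with the same key, so if a key such as cat appeared again later in the file, the second group would overwrite the first when assigned into the dictionary. A minimal sketch of the usual remedy, assuming the lines are already split into [key, value] pairs (the hypothetical rows list below), is to sort by key first. Python's sort is stable, so each key's values keep their original order:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows in which "cat" is not contiguous
rows = [["cat", "tail"], ["dog", "paw"], ["cat", "meow"]]


def group_consecutive(pairs):
    """groupby on unsorted input: only adjacent equal keys are merged."""
    return {k: [v for _, v in grp] for k, grp in groupby(pairs, key=itemgetter(0))}


def group_sorted(pairs):
    """Sort by key first (stable sort), so every key forms one single run."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {k: [v for _, v in grp] for k, grp in groupby(ordered, key=itemgetter(0))}


print(group_consecutive(rows))  # the later "cat" run replaces the earlier one
print(group_sorted(rows))       # both "cat" values are kept
```

Sorting costs O(n log n); for the plain "collect values per key" task, a single pass with setdefault or defaultdict avoids the sort entirely.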
Chiheb Nexus

Let's suppose you have already split your input on "\n":

d = {}
tab = ['cat tail', 'cat whisker', 'cat meow', 'cat black', 'dog tail', 'dog paw', 'dog bark', 'bird beak']
for i in tab:
    key, value = i.split()  # split on any whitespace (spaces or tabs)
    try:
        d[key] += [value]   # key already present: extend its list
    except KeyError:
        d[key] = [value]    # first occurrence of this key
print(d)

Output:

{'bird': ['beak'], 'dog': ['tail', 'paw', 'bark'], 'cat': ['tail', 'whisker', 'meow', 'black']}
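Because the question's file is tab-separated, the choice of separator matters when splitting each line. A small illustration: str.split(" ") splits only on single space characters, so a tab-separated line comes back as one field, whereas str.split() with no argument splits on any run of whitespace, tabs included. The variable names below are illustrative only:

```python
# A line as it would appear in the tab-separated input file
tab_line = "cat\ttail"

# Splitting on a single space: the tab is not a separator, so nothing is split
single_space = tab_line.split(" ")

# Splitting on any whitespace run: tabs and spaces both separate fields
any_whitespace = tab_line.split()

print(single_space)
print(any_whitespace)
```

With single_space there is no second element, so indexing [1] on it would raise IndexError.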

bobtt

This can be solved with collections.defaultdict:

Code:

from collections import defaultdict

def main():
    # defaultdict(list) creates an empty list the first time a key is used
    d = defaultdict(list)

    # In Python 3, "r" already uses universal newlines ("rU" was removed in 3.11)
    with open('animal-trial', "r") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:          # skip blank or malformed lines
                d[parts[0]].append(parts[1])

    print(dict(d))

if __name__ == "__main__":
    main()

Output:

{'cat': ['tail', 'whisker', 'meow', 'black'], 'bird': ['beak'], 'dog': ['tail', 'paw', 'bark']}
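If the columns really are separated by tabs, the standard csv module can do the splitting, which also keeps values that contain spaces intact. This is a sketch under that assumption: load_pairs is a hypothetical helper, and an io.StringIO sample stands in for the file 'animal-trial'. Note that a line like the question's bird    beak, separated by spaces rather than a tab, would be read as a single field and skipped here.

```python
import csv
import io
from collections import defaultdict


def load_pairs(f):
    """Build {key: [values...]} from a two-column, tab-separated file object."""
    d = defaultdict(list)
    for row in csv.reader(f, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed rows
            d[row[0]].append(row[1])
    return dict(d)


# Hypothetical in-memory sample standing in for 'animal-trial'
sample = io.StringIO("cat\ttail\ncat\twhisker\ndog\ttail\nbird\tbeak\nbird\tlong beak\n")
result = load_pairs(sample)
print(result)
```

With a real file, the csv documentation recommends opening it as open('animal-trial', newline='') and passing that file object to load_pairs.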
Swapnil