My text file format is:

apple      very healthy
orange     tangy and juicy
banana     yellow in color and yummy

I need to create either two lists:

l1 = ['apple','orange','banana']
l2=['very healthy','tangy and juicy','yellow in color and yummy']

or convert the values into a dictionary:

d1={'apple':'very healthy','orange':'tangy and juicy','banana':'yellow in color and yummy'}

The two columns in the file are separated by a tab.

I tried the following code to change it to two lists and then convert it into a dictionary:

l1=[]
l2=[]
d={}
read_file=open('edges.txt','r')
split= [line.strip() for line in read_file]
for line in split:
    l1.append(line.split('\t')[0])
    l2.append(line.split('\t')[1:])
d=dict(zip(l1,l2))
print d

I am getting some incorrect values. I am a newbie to Python.

Brent Washburne
Pavithra K C

5 Answers

1

Make sure your text file contains tabs between those values; the sample I copied from your post has spaces instead.

Textfile:

apple   very healthy
orange  tangy and juicy
banana  yellow in color and yummy

Output of your script:

{'orange': ['tangy and juicy'], 'apple': ['very healthy'], 'banana': ['yellow in color and yummy']}
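
One way to check which separator a file really uses is to look at the repr() of each line, which shows a tab as \t instead of blank space. This is only a minimal sketch: the helper name describe_separator and the sample strings are made up for illustration and are not part of the original script.

```python
def describe_separator(line):
    """Report whether a line contains a tab or a run of two or more spaces."""
    if '\t' in line:
        return 'tab'
    if '  ' in line:
        return 'spaces'
    return 'unknown'

# Hypothetical sample lines: the first uses a real tab, the second uses spaces.
samples = ['apple\tvery healthy\n', 'orange     tangy and juicy\n']
for line in samples:
    # repr() makes invisible characters such as '\t' and '\n' visible
    print(repr(line), '->', describe_separator(line))
```

Running the same loop over the lines of edges.txt should show quickly whether split('\t') can work on that file.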

MohitC
0

The problem could be that the file's columns aren't actually separated by tabs, but instead by multiple spaces (and, in fact, the "text file format" you posted does not use tabs). One way to fix this is:

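l1=[]
l2=[]
d={}
read_file=open('edges.txt','r')
split= [line.strip() for line in read_file]
read_file.close()
for line in split:
    # Split on two consecutive spaces; the first piece is the key
    l1.append(line.split('  ')[0].strip())
    # Rejoin the remaining pieces so the value stays in one string
    l2.append('  '.join(line.split('  ')[1:]).strip())
d=dict(zip(l1,l2))
print(d)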

This will instead separate the two columns if at least two spaces are used. However, this will not work if you are actually using tabs, in which case you should use your original code. And, if none of the values (e.g. tangy and juicy, very healthy) have two spaces in a row in them, you can replace

'  '.join(line.split('  ')[1:]).strip()

with

line.split('  ')[1].strip()
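
If it is unclear whether the file uses tabs or spaces, a regular expression can accept either. The sketch below is an alternative to the code above, not a correction of it; the names SEPARATOR and split_columns are hypothetical, and it assumes the key never contains a tab or two spaces in a row.

```python
import re

# Assumed separator: one or more tabs, or a run of two or more spaces.
SEPARATOR = re.compile(r'\t+| {2,}')

def split_columns(line):
    """Split a line into (key, value) at the first separator found."""
    parts = SEPARATOR.split(line.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError('no column separator found in %r' % line)
    key, value = parts
    return key.strip(), value.strip()

print(split_columns('apple      very healthy\n'))
print(split_columns('orange\ttangy and juicy\n'))
```

Because maxsplit=1 stops after the first separator, any later double space inside the value is left untouched.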
pommicket
  • No, here the text file has two columns and they are separated by a tab. But the second column is not a single word; it is a statement or multiple words. – Pavithra K C Sep 26 '15 at 00:40
  • This code isn't for separating the columns using one space, it's for separating the columns with 2 or more spaces. Also, you should try this code because in the sample of your text file, the columns are separated by 5-6 spaces, not tabs. – pommicket Sep 26 '15 at 01:55
0

line.split('\t') returns a list, and line.split('\t')[0] returns the first element of that list ('apple', 'orange', 'banana').

l2.append(line.split('\t')[1:]) appends a list, because [1:] is a slice. Maybe you want l2.append(line.split('\t')[1]) instead?
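
To make the difference between indexing and slicing concrete, here is a small sketch on a made-up line that contains a real tab; the variable names are only illustrative.

```python
line = 'apple\tvery healthy'   # sample line with a real tab
fields = line.split('\t')

first = fields[0]        # indexing gives a single string
rest_slice = fields[1:]  # slicing gives a list, even with one element
rest_item = fields[1]    # indexing gives the string itself

print(repr(first))
print(repr(rest_slice))
print(repr(rest_item))
```

This is why the dictionary in the question ends up with one-element lists as values rather than plain strings.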

I couldn't resist rewriting the code:

d = {}
with open('edges.txt', 'r') as read_file:
    for line in read_file:
        # Split on the first tab only, so the value stays in one piece
        split = line.strip().split('\t', 1)
        d[split[0]] = split[1]
print(d)
Brent Washburne
0

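import re

d = {}
with open('data') as f:
    for line in f:
        # Key is the first word; value is everything after the whitespace run
        mobj = re.match(r'(\w+)\s+(.*)', line)
        if mobj is None:
            continue  # skip blank or malformed lines
        key, value = mobj.groups()
        d[key] = value

for k, v in d.items():
    print(k, "   ", v)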

output

banana yellow in color and yummy

apple very healthy

orange tangy and juicy

LetzerWille
0

If your text file is in fact fixed width (i.e. it pads the columns with spaces instead of using tab characters), you can parse it simply by slicing each line: the first 10 characters are the key in your dictionary, and everything from the 11th character onwards is the value.

fruits = {line[:10].strip(): line[10:].strip() for line in read_file}

This question has some answers on parsing more complicated fixed-width text files; and you could also use pandas.read_fwf.
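
The slicing approach can be tried without a file on disk. The sketch below uses io.StringIO as a stand-in for the opened file; the 10-character key width (KEY_WIDTH) is an assumption read off the sample data in the question and would need adjusting for other files.

```python
import io

# Stand-in for the real file; contents copied from the sample in the question.
sample = io.StringIO(
    'apple      very healthy\n'
    'orange     tangy and juicy\n'
    'banana     yellow in color and yummy\n'
)

KEY_WIDTH = 10  # assumed width of the first column

fruits = {}
for line in sample:
    if not line.strip():
        continue  # skip blank lines
    # Characters before KEY_WIDTH form the key, the rest form the value
    fruits[line[:KEY_WIDTH].strip()] = line[KEY_WIDTH:].strip()

print(fruits)
```

A key longer than KEY_WIDTH would be cut off by this approach, so it only suits files whose first column is padded to a known width.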

Stuart