
I have a text file with n lines. I want to collect the first word of each line into list1, the second word of each line into list2, the third into list3, and so on.

Suppose the input text file contains:

a1#a2#a3
b1#b2#b3#b4
c1#c2

After reading the file, the output should be:

List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}

The code:

f = open('path','r')
for line in f:
    List=line.split('#')
    List1 = List[0]
    print '{0},'.format(List1),
    List2 = List[1]
    print '{0},'.format(List2),
    List3 = List[2]
    print '{0},'.format(List3),
    List4 = List[3]
    print '{0},'.format(List4),

OUTPUT

a1,b1,c1,a2,b2,c2,a3,b3,b4
Martijn Pieters
jashu

1 Answer

You really don't want separate list variables here; use a list of lists instead. The csv module makes the splitting a little easier:

import csv

columns = [[] for _ in range(4)]  # 4 columns expected

with open('path', 'rb') as f:  # Python 2's csv module expects binary mode
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        for i, col in enumerate(row):
            columns[i].append(col)

or, if the number of columns needs to grow dynamically:

import csv

columns = []

with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    for row in reader:
        while len(row) > len(columns):
            columns.append([])
        for i, col in enumerate(row):
            columns[i].append(col)
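
For readers on Python 3, the same growing-columns loop can be sketched roughly as follows. The function name `read_columns` and the in-memory `sample` are illustrative assumptions, not part of the original answer; with a real file on Python 3 you would open it in text mode, e.g. `open('path', newline='')`, rather than `'rb'`.

```python
import csv
import io


def read_columns(lines, delimiter='#'):
    """Collect the i-th field of every row into columns[i] (hypothetical helper)."""
    columns = []
    for row in csv.reader(lines, delimiter=delimiter):
        # Add empty column lists until every field in this row has a slot.
        while len(row) > len(columns):
            columns.append([])
        for i, field in enumerate(row):
            columns[i].append(field)
    return columns


# In-memory stand-in for the input file from the question.
sample = io.StringIO("a1#a2#a3\nb1#b2#b3#b4\nc1#c2\n")
columns = read_columns(sample)
print(columns)
```

Because the list of columns only grows when a longer row appears, short rows simply contribute nothing to the later columns.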

Or you can use itertools.izip_longest() to transpose the CSV rows:

import csv
from itertools import izip_longest

with open('path', 'rb') as f:
    reader = csv.reader(f, delimiter='#')
    columns = [filter(None, column) for column in izip_longest(*reader)]
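
On Python 3, `izip_longest` was renamed `zip_longest` and `filter()` returns an iterator rather than a sequence, so a rough equivalent of the transpose might look like the sketch below. The `data` list stands in for the file and is an assumption for illustration. Unlike `filter(None, ...)`, this version removes only the `None` padding and keeps genuinely empty fields.

```python
import csv
from itertools import zip_longest

# Stand-in for the file contents; csv.reader accepts any iterable of strings.
data = ["a1#a2#a3", "b1#b2#b3#b4", "c1#c2"]

reader = csv.reader(data, delimiter='#')
# zip_longest(*rows) transposes the rows and pads short ones with None;
# the inner comprehension strips that padding from each column.
columns = [
    [value for value in column if value is not None]
    for column in zip_longest(*reader)
]
print(columns)
```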

Finally, you can print your columns with:

for i, col in enumerate(columns, 1):
    print 'List{}: {{{}}}'.format(i, ','.join(col))

Demo:

>>> import csv
>>> from itertools import izip_longest
>>> data = '''\
... a1#a2#a3
... b1#b2#b3#b4
... c1#c2
... '''.splitlines(True)
>>> reader = csv.reader(data, delimiter='#')
>>> columns = [filter(None, column) for column in izip_longest(*reader)]
>>> for i, col in enumerate(columns, 1):
...     print 'List{}: {{{}}}'.format(i, ','.join(col))
... 
List1: {a1,b1,c1}
List2: {a2,b2,c2}
List3: {a3,b3}
List4: {b4}
Martijn Pieters
  • yeah its working!! moreover, if I want to name each list as Name, city, state, zip instead of List1, List2,...how? – jashu Feb 17 '14 at 14:06
  • @jashu: If you know up front that there are 4 columns, just use `name, city, state, zip = columns`. – Martijn Pieters Feb 17 '14 at 14:07
  • I meant,some thing like Name: {a1,b1,c1} City: {a2,b2,c2} – jashu Feb 17 '14 at 14:11
  • @jashu: when printing, you mean? `for name, col in zip(('name', 'city', 'state', 'zip'), columns): print '{}: {{{}}}'.format(name, ','.join(col))`. – Martijn Pieters Feb 17 '14 at 14:13
  • I got the perfect answer, but is it possible to get the same output using no modules like csv / itertools @Martijn Pieters – jashu Feb 18 '14 at 05:51
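
The last comment asks for a version without csv or itertools. One possible sketch, not taken from the answer above, uses only `str.split()` and is written for Python 3. The helper name `split_columns`, the sample `lines`, and the label tuple `names` are assumptions for illustration. Note that plain `split()` does not handle quoted fields the way the csv module does.

```python
def split_columns(lines, delimiter='#'):
    """Group the i-th field of each line into columns[i], using no imports."""
    columns = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # skip blank lines
        for i, field in enumerate(line.split(delimiter)):
            if i == len(columns):
                columns.append([])  # first time this column index is seen
            columns[i].append(field)
    return columns


# Stand-in for iterating over an open file object.
lines = ["a1#a2#a3\n", "b1#b2#b3#b4\n", "c1#c2\n"]
columns = split_columns(lines)

# Hypothetical labels, following the names mentioned in the comments.
names = ('Name', 'City', 'State', 'Zip')
report = ['{}: {{{}}}'.format(name, ','.join(col))
          for name, col in zip(names, columns)]
print('\n'.join(report))
```

Since `zip()` stops at the shorter input, any columns beyond the four labels would be left out of the report.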