Parsing a tab-delimited text file to replace columns with one vertical list (Python)

Question

I'm very new to Python and I know this is a pretty basic question. I have a text file with columns of data. I want to remove the columns and make it one long list.

I have the following code:

for line in open('feddocs_2011.txt', 'r'):
    segmentedLine = line.split("/t")
    print segmentedLine

This seems to create a separate string for each line, but I think I may need to loop through each of those new strings to split those next. I thought it would have put everything following a tab on a new line. I tried the following, but got an error message that "list" doesn't have a split function.

while segmentedLine:
    item = segmentedLine.split("\t")
    print item

Thanks very much for any input.

It isn't clear what you want to do. Can you give some (small) sample input and output? — andrewdski, Jun 02 '11 at 03:07

score 5 · Answer 1 · answered Jun 02 '11 at 03:09

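Your first loop has the right idea, but the tab escape is "\t" (backslash), not "/t"; with "/t" nothing gets split and each line comes back as a one-item list. Once that is fixed, what you want is a second for loop that iterates over each tab-separated item. That'll look like this: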

for line in open('feddocs_2011.txt', 'r'):
    segmentedLine = line.split("\t")
    for item in segmentedLine:
        print item

Or more concisely, without the temporary variable:

for line in open('feddocs_2011.txt', 'r'):
    for item in line.split("\t"):
        print item
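In Python 3, where print is a function, the same nested loop can also collect the items into one list instead of printing them. This is a minimal sketch rather than part of the original answer: the function name `flatten_tab_lines` and the sample data are made up for illustration, and it assumes the line ending should be dropped from the last field of each row.

```python
def flatten_tab_lines(lines):
    """Return every tab-separated field from an iterable of lines as one flat list."""
    items = []
    for line in lines:
        # Drop the line ending first so the last field does not keep "\n".
        for item in line.rstrip("\r\n").split("\t"):
            items.append(item)
    return items


# Hypothetical sample standing in for the lines of feddocs_2011.txt.
sample = ["foo\tbar\tbaz\n", "bla\tbla\tbla\n"]
for item in flatten_tab_lines(sample):
    print(item)
```

An open file object can be passed in place of the sample list, since iterating over a file yields its lines.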

John Kugelman

Thank you very much - this worked! Thanks to everyone who responded so quickly! – Kate D. Jun 02 '11 at 04:13

score 2 · Answer 2 · JBernardo · 2011-06-02T03:43:49.837

What about:

x = [line.split('\t') for line in open('file.txt')]

and you can join the lists, if you want:

sum(x, [])
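As a sketch of what joining the lists does, and of one common alternative: `sum(x, [])` concatenates the sublists one after another, building a new list at each step, while `itertools.chain.from_iterable` walks all of them in a single pass. The sample rows below are invented for illustration.

```python
from itertools import chain

# Hypothetical rows, as produced by splitting each line on tabs.
x = [["foo", "bar", "baz"], ["bla", "bla", "bla"]]

flat_with_sum = sum(x, [])                      # builds a new list at every step
flat_with_chain = list(chain.from_iterable(x))  # single pass over all rows

# Both approaches give the same flat list.
print(flat_with_sum == flat_with_chain)
```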

[Edit]

If the only whitespace in your file is tabs and line breaks (no spaces inside fields), you can simply do:

x = open('file.txt').read().split()
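The "no spaces" condition matters because `split()` with no argument splits on any run of whitespace. A small sketch with made-up contents shows the difference from splitting on tabs only:

```python
# Hypothetical file contents: the first field of the first row contains a space.
text = "New York\tNY\nBoston\tMA\n"

# With no argument, split() also breaks on the space inside "New York".
print(text.split())
# ['New', 'York', 'NY', 'Boston', 'MA']

# Splitting each line on tabs only keeps "New York" together.
fields = [item for line in text.splitlines() for item in line.split("\t")]
print(fields)
# ['New York', 'NY', 'Boston', 'MA']
```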

edited Jun 02 '11 at 03:43

answered Jun 02 '11 at 03:07

JBernardo

I don't know for sure how the text is formatted... If they also want to convert the data to int or float, they don't need to remove the trailing \n or \r – JBernardo Jun 02 '11 at 03:13
using `sum()` is very inefficient (O(n^2)) on large lists – simon Jun 02 '11 at 03:24
@simon sometimes simpler is better... I know sum is intended for numbers, but it works nicely for small lists :) – JBernardo Jun 02 '11 at 03:39
@JBernardo -- depends what you're doing, I suppose, but I would reckon 12 times slower is probably to be avoided! [link](http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python/953050#953050) – simon Jun 02 '11 at 03:57
@simon Python may be 12(or more) times slower than C... But we still use it! Sum is good and the most readable for small lists. – JBernardo Jun 02 '11 at 04:04

score 1 · Answer 3 · answered Jun 02 '11 at 04:54

So you have

foo<tab>bar<tab>baz
bla<tab>bla<tab>bla

and you want it to be

foo
bar
baz
bla
bla
bla

Right?

Then you can just do

with open("myfile.txt", "r") as f:
    text = f.read().replace("\t", "\n")

Now text is a single string. If you want a list of all the items instead (["foo", "bar", "baz", "bla", "bla", "bla"]), use

with open("myfile.txt", "r") as f:
    text = f.read().replace("\t", "\n").split("\n")
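One thing to watch for with this approach: if the file ends with a newline, the final split leaves an empty string at the end of the list. The sketch below uses `io.StringIO` with made-up contents in place of a real file, and the cleanup step is only one possible choice, assuming the trailing empty item is unwanted.

```python
import io

# io.StringIO stands in for the opened file; the contents are made up.
f = io.StringIO("foo\tbar\tbaz\nbla\tbla\tbla\n")

raw_items = f.read().replace("\t", "\n").split("\n")
print(raw_items)
# ['foo', 'bar', 'baz', 'bla', 'bla', 'bla', '']

# Drop a single trailing empty string left by the final newline.
items = raw_items[:-1] if raw_items and raw_items[-1] == "" else raw_items
print(items)
# ['foo', 'bar', 'baz', 'bla', 'bla', 'bla']
```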

score 0 · Answer 4 · simon · 2011-06-02T03:22:57.990

If I understand correctly, what you're after is:

import itertools
# Split each line on tabs (the file is tab-delimited) and print one item per line.
print('\n'.join(itertools.chain(*[line.rstrip('\n').split('\t') for line in open('feddocs_2011.txt', 'r')])))
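A different route to the same result, not the one used in this answer, is the standard library's `csv` module with a tab delimiter, which does the per-line splitting itself. This is a sketch under assumptions: `io.StringIO` and its contents stand in for the real file.

```python
import csv
import io
from itertools import chain

# Made-up contents standing in for feddocs_2011.txt.
sample = io.StringIO("foo\tbar\tbaz\nbla\tbla\tbla\n")

# csv.reader yields one list of fields per row; chain flattens the rows.
reader = csv.reader(sample, delimiter="\t")
items = list(chain.from_iterable(reader))

print("\n".join(items))
```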

edited Jun 02 '11 at 03:22

answered Jun 02 '11 at 03:15

simon

score 0 · Answer 5 · answered Jun 02 '11 at 03:27

put everything following a tab on a new line

If this is all you want, why not just use the str.replace function?

for line in open('feddocs_2011.txt', 'r'):
    segmented_line = line.replace('\t', '\n')
    print(segmented_line)

If, for some reason, you want to keep the tabs:

for line in open('feddocs_2011.txt', 'r'):
    segmented_line = line.replace('\t', '\t\n')
    print(segmented_line)
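One detail worth noting: each line read from a file already ends with a newline, so printing it adds a second one and leaves a blank line between rows. A minimal Python 3 sketch that avoids this is below; the helper name `tabs_to_newlines` and the sample data are hypothetical.

```python
def tabs_to_newlines(lines):
    """Join lines into one string with every tab turned into a newline."""
    # Each line from a file already ends with "\n", so nothing extra is added.
    return "".join(line.replace("\t", "\n") for line in lines)


# Hypothetical sample standing in for the lines of feddocs_2011.txt.
sample = ["foo\tbar\tbaz\n", "bla\tbla\tbla\n"]

# end="" stops print from appending another newline.
print(tabs_to_newlines(sample), end="")
```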
