
I'm very new to Python and I know this is a pretty basic question. I have a text file with columns of data. I want to remove the columns and make it one long list.

I have the following code:

for line in open('feddocs_2011.txt', 'r'):
    segmentedLine = line.split("/t")
    print(segmentedLine)

This seems to create a separate string for each line, but I think I may need to loop through each of those new strings to split them further. I expected it to put everything following a tab on a new line. I tried the following, but got an error message saying that "list" doesn't have a split function.

while segmentedLine:
    item = segmentedLine.split("\t")
    print(item)
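Both snippets above go wrong for small, separate reasons, which a quick self-contained check makes visible. The sample line below is made up for illustration: `"/t"` is a literal slash followed by `t` rather than a tab, so `split("/t")` leaves each line whole, and `str.split` returns a list, which has no `split` method of its own.

```python
# Hypothetical sample line with three tab-separated fields.
line = "foo\tbar\tbaz\n"

# "/t" is a slash followed by "t", so nothing is split.
print(line.split("/t"))   # ['foo\tbar\tbaz\n']

# "\t" is the escape for a tab, so the line splits into fields.
fields = line.split("\t")
print(fields)             # ['foo', 'bar', 'baz\n']

# split() returns a list; calling split() on that list fails.
try:
    fields.split("\t")
except AttributeError as err:
    print(err)            # 'list' object has no attribute 'split'
```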

Thanks very much for any input.

Kate D.

5 Answers


Your first loop reads the lines correctly, but the separator should be "\t" (a tab), not "/t". What you want to do then is add a second for loop to iterate over each tab-separated item. That'll look like this:

with open('feddocs_2011.txt', 'r') as f:
    for line in f:
        # Strip the newline so the last item on each line prints cleanly.
        segmentedLine = line.rstrip("\n").split("\t")
        for item in segmentedLine:
            print(item)

Or more concisely, without the temporary variable:

with open('feddocs_2011.txt', 'r') as f:
    for line in f:
        for item in line.rstrip("\n").split("\t"):
            print(item)
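The loops above print each item; since the question asks for one long list, here is a minimal sketch that collects the items instead. The helper name `flatten_tsv` and the sample lines are hypothetical, and the sketch assumes the file is tab-separated with one record per line.

```python
def flatten_tsv(lines):
    """Return every tab-separated field from `lines` as one flat list."""
    items = []
    for line in lines:
        # Drop the line ending so the last field on each line is clean.
        items.extend(line.rstrip("\n").split("\t"))
    return items


# Made-up sample standing in for the lines of feddocs_2011.txt.
sample = ["foo\tbar\tbaz\n", "bla\tbla\tbla\n"]
print(flatten_tsv(sample))
# ['foo', 'bar', 'baz', 'bla', 'bla', 'bla']
```

Because a file object iterates over its lines, the same helper could be called as `flatten_tsv(f)` inside a `with open('feddocs_2011.txt') as f:` block.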
John Kugelman

What about:

x = [line.split('\t') for line in open('file.txt')]

and you can join the lists, if you want:

sum(x, [])

[Edit]

If your file only has tabs (no spaces), you can simply do:

x = open('file.txt').read().split()
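The "(no spaces)" caveat matters because `split()` with no argument breaks on any run of whitespace, including spaces inside a field and newlines. A small comparison on a made-up line:

```python
# Hypothetical line where the first field contains a space.
line = "New York\t2011\tfeddocs\n"

# No argument: splits on spaces, tabs and newlines alike.
print(line.split())                   # ['New', 'York', '2011', 'feddocs']

# Splitting on the tab only keeps "New York" together.
print(line.rstrip("\n").split("\t"))  # ['New York', '2011', 'feddocs']
```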
JBernardo
  • I don't know how the text is formatted for sure... If he also wants to convert the data to int or float, he doesn't need to remove the trailing \n or \r – JBernardo Jun 02 '11 at 03:13
  • using `sum()` is very inefficient (O(n^2)) on large lists – simon Jun 02 '11 at 03:24
  • @simon sometimes simpler is better... I know sum is intended for numbers, but it works nicely for small lists :) – JBernardo Jun 02 '11 at 03:39
  • @JBernardo -- depends what you're doing, I suppose, but I would reckon 12 times slower is probably to be avoided! [link](http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python/953050#953050) – simon Jun 02 '11 at 03:57
  • @simon Python may be 12 (or more) times slower than C... But we still use it! Sum is good and the most readable for small lists. – JBernardo Jun 02 '11 at 04:04
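To make the trade-off discussed in these comments concrete, here is a sketch comparing `sum(x, [])` with a nested list comprehension on made-up rows shaped like the output of the first snippet. Both give the same flat list; the comprehension visits each item once, while `sum` builds a new list at every `+`, so its cost grows quadratically with the number of rows.

```python
# Rows as produced by [line.split('\t') for line in ...]; sample data is made up.
x = [["foo", "bar", "baz"], ["bla", "bla", "bla"]]

# sum() concatenates lists pairwise, creating a new list each time.
flat_sum = sum(x, [])

# A nested comprehension appends each item once (linear time).
flat_comp = [item for row in x for item in row]

print(flat_sum == flat_comp)  # True
```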

So you have

foo<tab>bar<tab>baz
bla<tab>bla<tab>bla

and you want it to be

foo
bar
baz
bla
bla
bla

Right?

Then you can just do

with open("myfile.txt", "r") as f:
    text = f.read().replace("\t", "\n")

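Now text is a single string. If you want a list of all the items instead (["foo", "bar", "baz", "bla", "bla", "bla"]), use splitlines(), which, unlike split("\n"), doesn't leave an empty string after the file's final newline:

with open("myfile.txt", "r") as f:
    items = f.read().replace("\t", "\n").splitlines()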
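A quick check of how `split("\n")` and `splitlines()` treat a file that ends in a newline, using a made-up in-memory string in place of myfile.txt:

```python
# In-memory stand-in for the contents of myfile.txt (made-up sample).
text = "foo\tbar\tbaz\nbla\tbla\tbla\n"

one_per_line = text.replace("\t", "\n")

# split("\n") leaves an empty string after the final newline...
print(one_per_line.split("\n"))
# ['foo', 'bar', 'baz', 'bla', 'bla', 'bla', '']

# ...while splitlines() does not.
print(one_per_line.splitlines())
# ['foo', 'bar', 'baz', 'bla', 'bla', 'bla']
```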
Tim Pietzcker

If I understand correctly, what you're after is:

import itertools
with open('feddocs_2011.txt', 'r') as f:
    print('\n'.join(itertools.chain(*[line.strip().split('\t') for line in f])))
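A variant of the same idea, sketched with made-up lines in place of the file: `itertools.chain.from_iterable` accepts an iterable of rows directly, so it can consume a generator one row at a time, whereas `chain(*rows)` has to unpack every row into arguments up front.

```python
import itertools

# Made-up lines standing in for the file's contents.
lines = ["foo\tbar\tbaz\n", "bla\tbla\tbla\n"]

# A generator of rows; from_iterable pulls from it lazily.
rows = (line.strip().split("\t") for line in lines)
joined = "\n".join(itertools.chain.from_iterable(rows))
print(joined)
```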
simon

"put everything following a tab on a new line"

If this is all you want, why not just use the str.replace function?

for line in open('feddocs_2011.txt', 'r'):
    segmented_line = line.replace('\t', '\n')
    # The line already ends with '\n', so don't let print() add another.
    print(segmented_line, end='')

If, for some reason, you want to keep the tabs:

for line in open('feddocs_2011.txt', 'r'):
    segmented_line = line.replace('\t', '\t\n')
    print(segmented_line, end='')
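To see exactly what the two `replace` calls produce, here is a comparison on a single made-up line; `repr` shows the tabs and newlines explicitly.

```python
line = "foo\tbar\tbaz\n"  # made-up sample line

no_tabs = line.replace("\t", "\n")      # each tab becomes a newline
kept_tabs = line.replace("\t", "\t\n")  # each tab is kept, then a newline

print(repr(no_tabs))    # 'foo\nbar\nbaz\n'
print(repr(kept_tabs))  # 'foo\t\nbar\t\nbaz\n'
```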
cledoux