
For my Python class, I am working on opening a .tsv file and taking 15 rows of data, broken into 4 columns, and turning each line into a list. To do this, I need to remove the tabs between the columns.

I've been advised to use a for loop to go through each line. This makes sense, but I can't figure out how to remove the tabs.

Any help?

Chris Bennett

4 Answers


To read lines from a file and split each line on the tab delimiter, you can do this:

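rows = []
with open('file.tsv') as f:  # text mode; 'rb' yields bytes in Python 3
    for line in f:
        # rstrip('\n') removes only the newline, so empty edge columns survive
        rows.append(line.rstrip('\n').split('\t'))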
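
One caveat worth checking: line.strip() removes all surrounding whitespace, tabs included, so a row whose first or last column is empty loses that column. The sketch below uses a made-up two-line TSV held in io.StringIO (standing in for a real file) to compare strip() with rstrip('\n'):

```python
import io

# Hypothetical TSV: row 1 has an empty last column, row 2 an empty first column.
data = io.StringIO("a\tb\tc\t\n\tx\ty\tz\n")

stripped = []
kept = []
for line in data:
    stripped.append(line.strip().split('\t'))    # strip() also removes edge tabs
    kept.append(line.rstrip('\n').split('\t'))   # only the trailing newline goes

print(stripped)  # [['a', 'b', 'c'], ['x', 'y', 'z']]
print(kept)      # [['a', 'b', 'c', ''], ['', 'x', 'y', 'z']]
```

With rstrip('\n'), every row keeps all 4 columns, which matters if you index columns by position later.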
samplebias

Properly, this should be done with Python's csv module (as mentioned in another answer), since it handles escaped separators, quoted values, etc.
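
To see what "handles quoted values" means in practice, here is a small comparison on one hypothetical row whose second field is quoted and contains a tab. A plain split('\t') breaks that field in two, while csv.reader (with its default quotechar '"') keeps it intact:

```python
import csv
import io

# Hypothetical row: the middle field is quoted and contains a tab.
data = 'id\t"hello\tworld"\t42\n'

naive = data.rstrip('\n').split('\t')
parsed = next(csv.reader(io.StringIO(data), delimiter='\t'))

print(naive)   # ['id', '"hello', 'world"', '42']
print(parsed)  # ['id', 'hello\tworld', '42']
```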

More generally, this can be done with a list comprehension:

with open('file.tsv') as f:
    rows = [line.rstrip('\n').split('\t') for line in f]

And, as suggested in the comments, in some cases a generator expression would be a better choice:

f = open('file.tsv')
rows = (line.rstrip('\n').split('\t') for line in f)  # consume before closing f

See Generator Expressions vs. List Comprehensions for some discussion on when to use each.

Blair
    I would actually use a [generator expression](http://www.python.org/dev/peps/pep-0289/) here instead of a list comprehension, so that you're not holding a bunch of lists in memory while you process them. Depends on what you're doing with the results, though. – Sasha Chedygov Dec 12 '12 at 22:10
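
As a sketch of the generator-expression approach from the comment above: rows are split lazily, one at a time, as the consumer asks for them. Here io.StringIO stands in for an open .tsv file, and the two-column name/count layout is a made-up example:

```python
import io

# Hypothetical data standing in for an open .tsv file: name, count.
f = io.StringIO("apple\t3\nbanana\t5\ncherry\t7\n")

# Each row is split only when sum() requests it, so the full list of
# rows is never held in memory at once.
rows = (line.rstrip('\n').split('\t') for line in f)
total = sum(int(row[1]) for row in rows)

print(total)  # 15
```

The trade-off is that the generator can be iterated only once, and the underlying file must stay open until it has been fully consumed.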

You should use Python's stdlib csv module, particularly the csv.reader function.

import csv
rows = list(csv.reader(open('yourfile.tsv', newline=''), delimiter='\t'))

There's also a dialect parameter that can take 'excel-tab' to conform to Microsoft Excel's tab-delimited format.
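
A minimal sketch of the dialect option, using made-up data in io.StringIO in place of a real file. 'excel-tab' is registered by the csv module (it corresponds to csv.excel_tab) and uses Excel's quoting rules with a tab delimiter, so no separate delimiter argument is needed:

```python
import csv
import io

# Hypothetical tab-separated content; with a real file you would pass
# open('yourfile.tsv', newline='') instead of io.StringIO.
data = io.StringIO('name\tqty\n"Smith, J."\t2\n')

# dialect='excel-tab' implies delimiter='\t' and Excel-style quoting.
rows = list(csv.reader(data, dialect='excel-tab'))

print(rows)  # [['name', 'qty'], ['Smith, J.', '2']]
```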

Mahmoud Abdelkader

Check out the built-in string methods; split() should do the job.

>>> line = 'word1\tword2\tword3'
>>> line.split('\t')
['word1', 'word2', 'word3']
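
Passing '\t' explicitly matters here. With no argument, split() breaks on any run of whitespace, which drops empty columns and also splits fields that contain spaces. A quick comparison on two made-up lines:

```python
# Hypothetical lines: one with an empty middle column, one with a space in a field.
empty_col = 'word1\t\tword3'
spaced = 'New York\tNY'

print(empty_col.split('\t'))  # ['word1', '', 'word3']
print(empty_col.split())      # ['word1', 'word3']
print(spaced.split('\t'))     # ['New York', 'NY']
print(spaced.split())         # ['New', 'York', 'NY']
```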
user35147863