parsing a tab-separated file in Python

Question

I'm trying to parse a tab-separated file in Python where a number placed k tabs from the beginning of a row should be placed into the k-th array.

Is there a built-in function to do this, or a better way than reading line by line and doing all the obvious processing a naive solution would perform?

It's sometimes easy to forget, but it's customary to accept an answer to your question. – Scott Prive, Jun 21 '17 at 13:21

Gareth Latty · Answer 1 · 2012-06-16T11:05:46.557

71

You can use the csv module to parse tab-separated value files easily.

import csv

# The csv docs recommend opening files with newline="" when using the csv module.
with open("tab-separated-values", newline="") as tsv:
    # You can also use delimiter="\t" rather than giving a dialect.
    for line in csv.reader(tsv, dialect="excel-tab"):
        ...

On each iteration, line is a list of the values in the current row.
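A minimal, self-contained sketch of the same approach, assuming the data comes from an in-memory string (`io.StringIO` stands in for the open file here) and using `delimiter="\t"` instead of the dialect:

```python
import csv
import io

# io.StringIO stands in for a real file opened with open(path, newline="")
tsv = io.StringIO("a\tb\tc\n1\t2\t3\n")

# delimiter="\t" with the default dialect gives the same result as dialect="excel-tab"
rows = list(csv.reader(tsv, delimiter="\t"))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```

Every field comes back as a string; convert explicitly (e.g. with int() or float()) if you need numbers.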

Edit: As suggested below, if you want to read by column rather than by row, the best approach is to use the zip() builtin:

with open("tab-separated-values", newline="") as tsv:
    for column in zip(*csv.reader(tsv, dialect="excel-tab")):
        ...
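One caveat: zip() stops at the shortest row, so if some rows have fewer fields than others, the trailing values of the longer rows are silently dropped. A sketch using itertools.zip_longest instead, assuming (hypothetically) that missing trailing fields should be filled with an empty string:

```python
import csv
import io
from itertools import zip_longest

# Hypothetical sample data: the second row is shorter than the others
tsv = io.StringIO("1\t2\t3\n4\t5\n7\t8\t9\n")
rows = list(csv.reader(tsv, delimiter="\t"))

# zip() truncates every column to the length of the shortest row
truncated = list(zip(*rows))

# zip_longest() keeps every column, padding short rows with fillvalue
columns = list(zip_longest(*rows, fillvalue=""))

print(truncated)  # [('1', '4', '7'), ('2', '5', '8')]
print(columns)    # [('1', '4', '7'), ('2', '5', '8'), ('3', '', '9')]
```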

edited Jun 16 '12 at 11:05

answered Jun 15 '12 at 23:39

Gareth Latty


Whenever an element is missing, there are two consecutive tabs. Will that work? – Bob Jun 15 '12 at 23:53

@Bob Why don't you try it and see? (But yes, it will). – Gareth Latty Jun 15 '12 at 23:56

@Lattyware: Your use of "file" as a variable name is a no-no... ;) – martineau Jun 16 '12 at 04:31

@martineau: of all the default builtin names to rebind, `file` is the least problematic, esp. because it doesn't even exist in Python 3. Y'all can have `for file in files:` when you pry it from my cold, dead hands! ;^) – DSM Jun 16 '12 at 04:54
@martineau I'm a Python 3.x man, so I sometimes forget this is smashing `file` in 2.x. Good point, however. Edited. – Gareth Latty Jun 16 '12 at 11:05
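A quick check of Bob's question, using hypothetical sample data: csv.reader turns two consecutive tabs into an empty-string field, so each value keeps its column position. The same sketch shows a pitfall of splitting lines manually after line.strip(): stripping also removes a leading tab, which shifts every value one column to the left when the first field is empty.

```python
import csv
import io

# Hypothetical data: row 1 has an empty middle field, row 2 an empty first field
data = "1\t\t3\n\t5\t6\n"

# csv.reader keeps empty fields, so column k always holds the k-th value
rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(rows)  # [['1', '', '3'], ['', '5', '6']]

# strip() removes the leading tab of the second row, shifting its fields left
stripped = [line.strip().split("\t") for line in io.StringIO(data)]
print(stripped)  # [['1', '', '3'], ['5', '6']]

# Removing only the trailing newline preserves the leading empty field
kept = [line.rstrip("\n").split("\t") for line in io.StringIO(data)]
print(kept)  # [['1', '', '3'], ['', '5', '6']]
```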

martineau · Answer 2 · 2018-07-04T01:07:24.580

I don't think any of the current answers really do what you said you want. (Correction: I now see that @Gareth Latty / @Lattyware has incorporated my answer into his own as an "Edit" near the end.)

Anyway, here's my take:

Say these are the tab-separated values in your input file:

1   2   3   4   5
6   7   8   9   10
11  12  13  14  15
16  17  18  19  20

then this:

with open("tab-separated-values.txt") as inp:
    # rstrip('\n') rather than strip(), so an empty first field (a leading tab) is kept
    print(list(zip(*(line.rstrip('\n').split('\t') for line in inp))))

would produce the following:

[('1', '6', '11', '16'), 
 ('2', '7', '12', '17'), 
 ('3', '8', '13', '18'), 
 ('4', '9', '14', '19'), 
 ('5', '10', '15', '20')]

As you can see, it put the k-th element of each row into the k-th array.
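The values come back as strings inside tuples. If you want mutable lists of numbers instead, a small variation converts each column; it assumes every field is a valid integer (int() raises ValueError on an empty field), and io.StringIO again stands in for the open file:

```python
import io

# Hypothetical sample data; io.StringIO stands in for the open file
inp = io.StringIO("1\t2\t3\n4\t5\t6\n")

rows = (line.rstrip("\n").split("\t") for line in inp)
# zip(*rows) transposes; each column is then turned into a list of ints
columns = [list(map(int, column)) for column in zip(*rows)]
print(columns)  # [[1, 4], [2, 5], [3, 6]]
```

Here columns[k] is the k-th array from the question.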

dawg · Answer 3 · 2012-06-16T04:57:40.393

7

Like this:

>>> s='1\t2\t3\t4\t5'
>>> s.split('\t')
['1', '2', '3', '4', '5']

For a file:

# create test file:
>>> with open('tabs.txt','w') as o:
...    s='\n'.join(['\t'.join(map(str,range(i,i+10))) for i in [0,10,20,30]])
...    print(s, file=o)
... 

#read that file:
>>> with open('tabs.txt','r') as f:
...    LoL=[x.rstrip('\n').split('\t') for x in f]
... 
>>> LoL
[['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], 
 ['10', '11', '12', '13', '14', '15', '16', '17', '18', '19'], 
 ['20', '21', '22', '23', '24', '25', '26', '27', '28', '29'], 
 ['30', '31', '32', '33', '34', '35', '36', '37', '38', '39']]
>>> LoL[2][3]
'23'

If you want the input transposed:

>>> with open('tabs.txt','r') as f:
...    LoT=list(zip(*(line.rstrip('\n').split('\t') for line in f)))
... 
>>> LoT[2][3]
'32'

Or (better still) use the csv module from the standard library...

edited Jun 16 '12 at 04:57

answered Jun 15 '12 at 23:37

dawg


In Python, making an empty list and then appending values is an anti-pattern. That's what list comprehensions are for. – Gareth Latty Jun 15 '12 at 23:42
@Lattyware: I personally do not find the first form hard to read, but you are right -- a nested list comprehension is probably more Pythonic. Edited. – dawg Jun 16 '12 at 01:07
@drewk: `[x.split('\t') for f.split('\n')]` makes no sense. There's no `x` and file objects don't have a `split()` method. – martineau Jun 16 '12 at 03:55
@martineau: Perfect example of why to use the csv module, no? Typo fixed. I tested it. – dawg Jun 16 '12 at 04:18
@drewk: Well, not so much...most likely the latter thing IMHO. ;) – martineau Jun 16 '12 at 04:37
@martineau: Interesting that you interpreted that the OP wanted to transpose the row / col of the file being read. Unclear to me one way or the other. If that is what is wanted, use `zip()` as you did. Thanks for your patient coaching... – dawg Jun 16 '12 at 04:47

Alauddin Sabari · Answer 4 · 2021-08-01T03:28:29.090

1

You can easily do this with pandas: pd.read_csv('file_name.tsv', sep='\t')

[Note: you need to install pandas first, e.g. with pip install pandas.]
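A slightly fuller sketch of the pandas approach, assuming the file has no header row (hence header=None, which makes pandas label the columns 0, 1, 2, ...). io.StringIO stands in for the file name here, and the sample data is hypothetical:

```python
import io

import pandas as pd

data = "1\t2\t3\n4\t5\t6\n"

# header=None: treat the first line as data and label columns 0, 1, 2, ...
df = pd.read_csv(io.StringIO(data), sep="\t", header=None)

# df[k] is the k-th column; .tolist() turns it into a plain Python list
columns = [df[k].tolist() for k in df.columns]
print(columns)  # [[1, 4], [2, 5], [3, 6]]
```

Note that empty fields (two consecutive tabs) are read as NaN, so an integer column containing a missing value comes back as floats.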

edited Aug 01 '21 at 03:28

answered Aug 01 '21 at 03:23

Alauddin Sabari

