55

I have a file that contains values separated by tabs ("\t"). I am trying to create a list and store all the values from the file in it, but I have a problem. Here is my code:

line = "abc\tdef\tghi"
values = line.split("\t")

It works fine as long as there is exactly one tab between each pair of values. But if there is more than one tab, the extra tabs end up in values as empty strings. In my case, the extra tab is mostly after the last value in the file.
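For illustration, here is what `split("\t")` returns for a made-up line with a doubled tab and a trailing tab. Each extra separator produces an empty string in the list:

```python
# Made-up sample: two tabs between "abc" and "def", plus one trailing tab.
line = "abc\t\tdef\tghi\t"
values = line.split("\t")
print(values)  # ['abc', '', 'def', 'ghi', '']
```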

flow2k
hjelpmig
    Are there other whitespace characters in your values that should *not* be split on? Because if not, you could just use `line.split()`, and it would take care of this (as well as discarding empty leading/trailing elements). – Tim Pietzcker Jun 11 '13 at 08:32
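A quick sketch of that suggestion with made-up input. With no argument, `str.split()` treats any run of whitespace as a single separator and produces no empty strings at the start or end. The catch is that values containing spaces get split too:

```python
line = "abc\tdef\t\tghi\t\t"
# No separator argument: runs of whitespace (tabs, spaces, newlines) act as one
# separator, and leading/trailing whitespace yields no empty strings.
values = line.split()
print(values)  # ['abc', 'def', 'ghi']

# Caveat: a value that contains a space is split as well.
print("my main man\tok".split())  # ['my', 'main', 'man', 'ok']
```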

5 Answers

88

You can use a regex here:

>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> re.split(r'\t+', strs)
['foo', 'bar', 'spam']

Update:

A trailing tab would still leave an empty string at the end of the result, so you can use str.rstrip to get rid of trailing '\t' characters first and then apply the regex.

>>> yas = "yas\t\tbs\tcda\t\t"
>>> re.split(r'\t+', yas.rstrip('\t'))
['yas', 'bs', 'cda']
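Since the question is about a whole file, here is a sketch applying the same idea line by line. It assumes one flat list of all values is wanted (not one list per line) and uses `io.StringIO` with made-up data in place of a real file; `data.tsv` in the comment is a hypothetical filename:

```python
import io
import re

# Stand-in for open("data.tsv") (hypothetical file); made-up contents.
data = io.StringIO("yas\t\tbs\tcda\t\t\nfoo\tbar\t\tspam\n")

values = []
for line in data:
    # Remove the line ending and any trailing tabs in one call.
    stripped = line.rstrip("\r\n\t")
    if stripped:  # skip blank lines, which would otherwise add ''
        values.extend(re.split(r"\t+", stripped))

print(values)  # ['yas', 'bs', 'cda', 'foo', 'bar', 'spam']
```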
Ashwini Chaudhary
5

Split on tabs, then remove all the empty strings.

text = "hi\tthere\t\t\tmy main man"
print([splits for splits in text.split("\t") if splits])

Output:

['hi', 'there', 'my main man']
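The same filtering can be written with the built-in `filter`. When its function argument is `None`, it keeps only truthy items, so the empty strings are dropped:

```python
text = "hi\tthere\t\t\tmy main man"
# filter(None, ...) removes falsy items; here those are the empty strings.
values = list(filter(None, text.split("\t")))
print(values)  # ['hi', 'there', 'my main man']
```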
CornSmith
4

You can use a regex to do this:

import re
patt = re.compile(r"[^\t]+")

s = "a\t\tbcde\t\tef"
patt.findall(s)
# ['a', 'bcde', 'ef']
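Because `findall` only returns runs of non-tab characters, leading and trailing tabs (the case mentioned in the question) produce no empty strings, so no extra `rstrip` is needed. A quick check with made-up input:

```python
import re

# One or more non-tab characters; tabs never appear inside a match.
patt = re.compile(r"[^\t]+")

s = "\ta\t\tbcde\t\tef\t\t"
print(patt.findall(s))  # ['a', 'bcde', 'ef']
```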
DimmuR
1

Another regex-based solution:

>>> import re
>>> strs = "foo\tbar\t\tspam"
>>> r = re.compile(r'([^\t]*)\t*')
>>> r.findall(strs)[:-1]
['foo', 'bar', 'spam']
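The `[:-1]` is needed because the pattern can match an empty string: after `'spam'` is consumed, `findall` reports one last empty match at the end of the string. Printing the unsliced result makes this visible:

```python
import re

strs = "foo\tbar\t\tspam"
r = re.compile(r"([^\t]*)\t*")
# Without slicing, the empty match at the end of the string shows up as ''.
print(r.findall(strs))       # ['foo', 'bar', 'spam', '']
print(r.findall(strs)[:-1])  # ['foo', 'bar', 'spam']
```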
Sylvain Leroux
0

Python supports CSV files through the eponymous csv module. The name is somewhat misleading, since it handles much more than just comma-separated values.

If you need to go beyond basic word splitting, for example because you have to deal with quoted values, you should take a look.
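A minimal sketch of that approach with made-up data. `csv.reader` accepts `delimiter="\t"`, and a quoted field may itself contain a tab. Note that the reader does not collapse consecutive delimiters: a doubled or trailing tab yields an empty field, so empty strings still have to be filtered out as in the other answers.

```python
import csv
import io

# Made-up sample; io.StringIO stands in for a file opened with newline="".
data = io.StringIO('foo\tbar\t\t"spam\twith tab"\t\n')

values = []
for row in csv.reader(data, delimiter="\t"):
    # Consecutive or trailing tabs produce empty fields, so drop them.
    values.extend(field for field in row if field)

print(values)  # ['foo', 'bar', 'spam\twith tab']
```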

Sylvain Leroux