-7

I have a file in which each line has the following format:

"('-1259656819525938837', 598679497)\t0.036787946"    # "\t" within the string is the tab sign

I need to extract the three components:

-1259656819525938837  # string (the content between the single quotes)
598679497             # long
0.036787946           # float

I am using Python 2.6.

Jin
  • There are many tools for parsing strings. In order of increasing power and complexity, they are: (1) functions such as split, (2) regular expressions, (3) full-fledged parsers. This task seems simple enough for split. – Johan Råde May 22 '14 at 06:47
  • Sorry for asking a dumb question; I was not aware of the findall function. It should be closed now. – Jin May 22 '14 at 07:05

3 Answers

1

You can use regular expressions from the re module:

import re
s = "('-1259656819525938837', 598679497)\t0.036787946"
re.findall(r'[-+]?[0-9]*\.?[0-9]+', s)
# gives: ['-1259656819525938837', '598679497', '0.036787946']
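Note that findall returns every match as a string. If you need the types listed in the question, a conversion step is still required. Here is a minimal sketch, assuming every line contains exactly three numeric tokens; the names NUMBER and parse_line are made up for illustration. In Python 2, int() returns a long automatically when the value does not fit in a plain int.

```python
import re

# Same pattern as above, compiled once so it can be reused for every line.
NUMBER = re.compile(r'[-+]?[0-9]*\.?[0-9]+')

def parse_line(line):
    # Hypothetical helper: assumes exactly three numeric tokens per line.
    # Unpacking raises ValueError if the line has more or fewer matches.
    key, count, weight = NUMBER.findall(line)
    # The first component stays a string, as the question asks.
    return key, int(count), float(weight)

s = "('-1259656819525938837', 598679497)\t0.036787946"
parsed = parse_line(s)
```

Unpacking into three names doubles as a cheap sanity check, because a malformed line fails loudly instead of producing shifted values.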
Marcin
0

"2.7.0_bf4fda703454".split("_") gives a list of strings:

In [1]: "2.7.0_bf4fda703454".split("_")
Out[1]: ['2.7.0', 'bf4fda703454']

This splits the string at every underscore. If you want it to stop after the first split, use "2.7.0_bf4fda703454".split("_", 1).

If you know for a fact that the string contains an underscore, you can even unpack the LHS and RHS into separate variables:

In [8]: lhs, rhs = "2.7.0_bf4fda703454".split("_", 1)

In [9]: lhs
Out[9]: '2.7.0'

In [10]: rhs
Out[10]: 'bf4fda703454'
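The same split technique can be applied to the line format from the question. This is only a sketch, assuming the tab separates the tuple from the float and that the tuple part always looks like ('...', number); the variable names are illustrative.

```python
line = "('-1259656819525938837', 598679497)\t0.036787946"

# Split once at the tab: the tuple text on the left, the float on the right.
pair, weight = line.split("\t", 1)

# Drop the surrounding parentheses, then split the two tuple items.
key, count = pair.strip("()").split(", ", 1)

# Remove the single quotes around the first item; it stays a string.
key = key.strip("'")

result = (key, int(count), float(weight))
```

float() tolerates surrounding whitespace, so a trailing newline from reading the file line by line should not need separate handling.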
Pavan Gupta
0

You can use a regex to extract the integer and float values from the string:

>>> import re
>>> a = "('-1259656819525938837', 598679497)\t0.036787946"
>>> re.findall(r'[-?\d\.\d]+', a)
['-1259656819525938837', '598679497', '0.036787946']
cuonglm
  • Actually, the character set in your regular expression is misleading. You probably meant it in a different way. Otherwise, if you still want to use a character set, maybe a simpler one would suffice: `re.findall(r'[-\d\.]+', a)` – Jarno Lamberg May 26 '14 at 10:26
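To illustrate the point made in the comment above: inside square brackets, each item is a single allowed character, so the ? is a literal question mark rather than an "optional" quantifier, and the second \d is redundant. The short sketch below compares the two character sets on the question's string and on two made-up inputs that are chosen only to show the side effects.

```python
import re

a = "('-1259656819525938837', 598679497)\t0.036787946"

# Both character sets give the same result on the question's input.
original = re.findall(r'[-?\d\.\d]+', a)
simpler = re.findall(r'[-\d\.]+', a)

# The original set also accepts a literal question mark.
stray = re.findall(r'[-?\d\.\d]+', "what?")

# Either set accepts any mix of digits, dots and minus signs,
# so tokens that are not valid numbers are matched as well.
loose = re.findall(r'[-\d\.]+', "1-2..3")
```

Here original and simpler are identical, stray is ['?'], and loose is ['1-2..3'], so a stricter pattern is the safer choice if the input may contain such tokens.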