
I need to extract unformatted numerical data from a text file. In the file, the numbers are separated in some places by a single space, in others by multiple spaces, and in others by tabs; pretty heterogeneous text :( I want Python to ignore all spaces/tabs, identify the whole numerical values, and put them in an array/list. Is it possible to do this using Python?

EDIT: Many of the numbers are written in scientific/exponential notation, e.g. 1.2345E+06, and my pattern does not recognize them as numbers, so a plain \d does not work :(

I don't want to use a normal string search for this purpose (the file contains many strings/words that are of no interest). The regular expression module documentation does not mention this issue.

Mubeen Shahid
  • Please show us a sample of the text file, and the effort you made so far. Where in your code are you having problems? – Selcuk Mar 24 '14 at 08:15
  • possible duplicate of [Python: Extract numbers from a string](http://stackoverflow.com/questions/4289331/python-extract-numbers-from-a-string) – Douglas Leeder Mar 24 '14 at 08:16

2 Answers


If each line looks like " 2.3e4 " or "2.6", try:

^\s*?([+-]?\d+(\.\d+)?([eE][+-]?\d+)?)\s*$

Notice the \s*? (non-greedy, zero or more whitespace characters). Since \s never matches a digit, the greedy form \s* captures the same number here, so the question mark is optional rather than required.

AFAIK Python has no special symbol for capturing numbers, other than \d for digits.
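
A minimal sketch of how a line-anchored pattern like the one above might be applied in Python, assuming each line holds at most one number. It passes `re.IGNORECASE` so the exponent marker is accepted as either `e` or `E`; the helper name `parse_number_line` and the sample lines are made up for illustration.

```python
import re

# Line-anchored pattern in the spirit of the answer above.
# re.IGNORECASE lets the exponent marker match both "e" and "E".
NUMBER_LINE = re.compile(r'^\s*?([+-]?\d+(\.\d+)?(e[+-]?\d+)?)\s*$', re.IGNORECASE)

def parse_number_line(line):
    """Return the line's value as a float, or None if the line is not a lone number."""
    match = NUMBER_LINE.match(line)
    return float(match.group(1)) if match else None

# Hypothetical sample lines, including one that is not a number.
lines = [" 2.3e4 ", "2.6", "1.2345E+06", "some words"]
values = [parse_number_line(line) for line in lines]
print(values)
```

Converting the captured text with `float()` works because `float()` itself accepts scientific notation; the regular expression is only needed to decide which pieces of text are numbers.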

Luis Masuelli

You could use a regular expression like \s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s+ (adapted from here). Take a look at this to see how you can search for a regular expression in a file.
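
A rough sketch of pulling every number out of mixed text with `re.findall`. One caveat about the pattern as written: because `\s+` on both sides consumes the whitespace, a number at the very start or end of the text, or one separated from its neighbour by a single space, can be skipped. The sketch below keeps the same number syntax but swaps the surrounding `\s+` for lookarounds, which is a substitution made here and not part of the answer. The name `extract_numbers` and the sample string are hypothetical.

```python
import re

# Same number syntax as the answer's pattern. The surrounding \s+ is replaced
# by lookarounds (an assumption of this sketch): (?<!\S) and (?!\S) require
# whitespace or a text boundary on each side without consuming it.
NUMBER = re.compile(r'(?<!\S)[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?(?!\S)')

def extract_numbers(text):
    """Return every whitespace-delimited number in text as a float."""
    return [float(token) for token in NUMBER.findall(text)]

# Hypothetical sample mixing tabs, runs of spaces, words, and a newline.
sample = "1.2345E+06\t-3.5   label 42 x7\n.5 6e-2"
numbers = extract_numbers(sample)
print(numbers)
```

For a real file, the text could be read first with something like `open(path).read()`, where `path` stands for whatever file holds the data. Tokens such as `x7` are ignored because the digit is not preceded by whitespace.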

Unknown
  • Is there any regular expression special character (similar to \d) for matching numbers in scientific/exponential format, e.g. `1.23450E+06`? – Mubeen Shahid Mar 26 '14 at 14:59