Extract numbers with EXPONENTS from heterogeneous text file

Question

I need to extract some unformatted numerical data from a text file. In the file, the numbers are separated in some places by a single space, in others by multiple spaces, and in others by tabs; pretty heterogeneous text :( I want Python to ignore all spaces/tabs, identify whole numerical values, and put them in an array/list. Is it possible to do this using Python?

EDIT: Many of the numbers are written in scientific/exponential notation, e.g. 1.2345E+06, and Python does not recognize them as numbers, so simply using \d does not work :(

I don't want to use a normal string search for this purpose (there are many strings/words in the file that are of no interest). The regular expression module documentation does not mention this issue.

Please show us a sample of the text file, and the effort you made so far. Where in your code are you having problems? — Selcuk, Mar 24 '14 at 08:15
possible duplicate of [Python: Extract numbers from a string](http://stackoverflow.com/questions/4289331/python-extract-numbers-from-a-string) — Douglas Leeder, Mar 24 '14 at 08:16

Luis Masuelli · Answer 1 · 2014-03-29T23:03:21.647

2

If lines are like " 2.3e4 " or "2.6" or so, try:

^\s*?([+-]?\d+(\.\d+)?([eE][+-]?\d+)?)\s*$

Notice the \s*? mark (non-greedy, zero or more whitespace characters). The question mark is harmless but not required here: \s only matches whitespace, so a greedy \s* cannot consume any digits of the number, and both forms capture the same value.

AFAIK, Python has no special symbol, other than \d for digits, to capture numbers.
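
As a rough illustration of the line-anchored approach, here is a minimal sketch. It assumes each line holds at most one number, and it uses `[eE]` in the exponent so that the question's `1.2345E+06` form is accepted. The names `LINE_NUMBER`, `parse_line` and `sample_lines` are made up for this example.

```python
import re

# Line-anchored pattern: optional sign, digits, optional fraction,
# optional exponent written with either "e" or "E".
LINE_NUMBER = re.compile(r'^\s*([+-]?\d+(\.\d+)?([eE][+-]?\d+)?)\s*$')

def parse_line(line):
    """Return the line's value as a float, or None if the line is not a single number."""
    m = LINE_NUMBER.match(line)
    # float() itself understands exponent notation such as "1.2345E+06".
    return float(m.group(1)) if m else None

# Hypothetical sample lines, not taken from the asker's file.
sample_lines = [" 2.3e4 ", "2.6", "1.2345E+06", "not a number", "-7"]
values = [parse_line(s) for s in sample_lines]
```

Lines that contain anything besides one number come back as `None`, so this only fits files with one value per line.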

edited Mar 29 '14 at 23:03

answered Mar 26 '14 at 15:04

Luis Masuelli


score 1 · Answer 2 · edited May 23 '17 at 10:26

1

You could use a regular expression like \s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s+ (adapted from here). Take a look at this to see how you can search for a regular expression in a file.
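
A minimal sketch of how such a pattern might be applied to mixed text follows. One assumption differs from the expression above: because `\s+` on both sides consumes the separators, a number at the very start or end of the text, or one that shares a single separator with its neighbour, can be skipped. The sketch therefore swaps in the lookarounds `(?<!\S)` and `(?!\S)`, which check for whitespace without consuming it. The names `NUMBER`, `extract_numbers` and `sample` are hypothetical.

```python
import re

# Number pattern bounded by whitespace (or the start/end of the text).
# The lookarounds test the neighbours without consuming them, so numbers
# separated by a single space or tab are all found.
NUMBER = re.compile(r'(?<!\S)[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?(?!\S)')

def extract_numbers(text):
    """Return every standalone numeric token in text as a float."""
    return [float(tok) for tok in NUMBER.findall(text)]

# Hypothetical sample mixing spaces, tabs, newlines and non-numeric words.
sample = "alpha 1.2345E+06\t-3.5   label 42\n.5e-3 x9 7"
numbers = extract_numbers(sample)
```

For a real file, the same function could be called on the result of reading the file into a string. Tokens such as `x9` are ignored because the digit is attached to a non-numeric character.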

edited May 23 '17 at 10:26

Community


answered Mar 24 '14 at 08:16

Unknown


Is there any regular expression special character (similar to \d) for matching numbers in scientific/exponential formats, e.g. `1.23450E+06`? – Mubeen Shahid Mar 26 '14 at 14:59
