
I have a binary file mixed with ASCII text that contains some floating point numbers I want to find. The file contains some lines like this:

1,1,'11.2','11.3';1,1,'100.4';

In my favorite regex tester I found that the correct regex should be ([0-9]+\.{1}[0-9]+).

Here's the code:

import re

data = open('C:\\Users\\Me\\file.bin', 'rb')
pat = re.compile(b'([0-9]+\.{1}[0-9]+)')
print(pat.match(data.read()))

I don't get a single match. Why is that? I'm on Python 3.5.1.

JohnnyFromBF

2 Answers


You can try it like this:

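import re

with open('C:\\Users\\Me\\file.bin', 'rb') as f:
    data = f.read()

# data is bytes, so the pattern must be a bytes pattern too
print(re.findall(rb"\d+\.\d+", data))

Output:

[b'11.2', b'11.3', b'100.4']

re.findall returns a list of the matched substrings, as bytes objects here because the input is bytes. float() accepts bytes directly, so you can convert them like this:

>>> list(map(float, re.findall(rb"\d+\.\d+", data)))
[11.2, 11.3, 100.4]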
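As a self-contained check that does not need the original file, the same approach can be run on an in-memory bytes object. The sample data is an assumption: the line from the question wrapped in a few arbitrary non-ASCII bytes standing in for the binary content.

```python
import re

# Hypothetical sample: the line from the question surrounded by arbitrary binary bytes
data = b"\x00\xff\x10" + b"1,1,'11.2','11.3';1,1,'100.4';" + b"\x8a\x00"

# Pattern and data are both bytes, so findall returns a list of bytes objects
matches = re.findall(rb"\d+\.\d+", data)
print(matches)  # [b'11.2', b'11.3', b'100.4']

# float() accepts bytes directly, so no explicit decode step is needed
numbers = [float(m) for m in matches]
print(numbers)  # [11.2, 11.3, 100.4]
```

With a bytes pattern, \d matches only the ASCII digits 0-9, so the surrounding binary bytes cannot produce spurious matches.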
Adem Öztaş

How to find floating point numbers in binary file with Python?

float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
for m in generate_tokens(r'C:\Users\Me\file.bin', float_re):
    print(float(m.group()))

where float_re is from this answer and generate_tokens() is defined here.
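The linked generate_tokens() is not reproduced in this answer. As a rough stand-in, assuming the whole file fits in memory, a hypothetical helper with the same call shape (file path, bytes pattern) can simply yield the match objects from re.finditer(); the linked original may work differently, for example by reading the file in chunks.

```python
import os
import re
import tempfile

float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"


def generate_tokens(filename, pattern):
    """Hypothetical stand-in: yield every match of a bytes regex in a file.

    Reads the whole file into memory; the linked original may differ.
    """
    regex = re.compile(pattern)
    with open(filename, 'rb') as f:
        data = f.read()
    # finditer() yields match objects, so callers can use m.group()
    yield from regex.finditer(data)


# Usage: a temporary file stands in for C:\Users\Me\file.bin
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00\xff1,1,'11.2','11.3';1,1,'100.4';\x00")
try:
    values = [float(m.group()) for m in generate_tokens(tmp.name, float_re)]
finally:
    os.remove(tmp.name)
print(values)  # [1.0, 1.0, 11.2, 11.3, 1.0, 1.0, 100.4]
```

Because float_re also accepts plain integers, the 1 fields in the sample line are reported as 1.0 alongside the decimal values.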


pat.match() only tries to match at the very start of the input. Your data does not start with a float, so you "do not get a single match". Use pat.search() to find the first match anywhere in the input, or pat.findall() to get all of them.
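A short demonstration on the sample line from the question, held in memory here rather than read from the file. The {1} quantifier is dropped because \.{1} is equivalent to \.:

```python
import re

pat = re.compile(rb'([0-9]+\.[0-9]+)')
data = b"1,1,'11.2','11.3';1,1,'100.4';"

# match() is anchored at position 0; data starts with "1," so it fails
print(pat.match(data))  # None

# search() scans forward and returns the first match anywhere
print(pat.search(data).group())  # b'11.2'

# findall() returns every non-overlapping match (the captured group here)
print(pat.findall(data))  # [b'11.2', b'11.3', b'100.4']
```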


re.findall("\d+\.\d+", data) raises TypeError because the pattern is Unicode (str) but data is a bytes object in your case. Pass the pattern as bytes (the r prefix also avoids invalid escape sequence warnings for \d and \. in newer Python versions):
re.findall(rb"\d+\.\d+", data)
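A minimal illustration of the str/bytes mismatch, using the sample line from the question as in-memory data:

```python
import re

data = b"1,1,'11.2','11.3';1,1,'100.4';"

# A str pattern applied to bytes input raises TypeError
error = None
try:
    re.findall(r"\d+\.\d+", data)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# A bytes pattern (rb prefix) works and returns bytes matches
print(re.findall(rb"\d+\.\d+", data))  # [b'11.2', b'11.3', b'100.4']
```

The same rule applies in reverse: a bytes pattern cannot be used on a str, so decode the data first if you would rather work with text.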

jfs
  • Your pattern allows for a space after the leading sign character, which I believe is an error. For example, "- 2.2" is not a floating point number, but "-2.2" is. – Steve Hollasch Apr 15 '16 at 00:57
  • Also, if you want to be really pedantic, you could limit the number of allowable exponent digits, if you knew whether this was a 32- or 64-bit floating-point value. – Steve Hollasch Apr 15 '16 at 00:58
  • @SteveHollasch What is or is not a float is a matter of a definition. Follow the link that explains the origin of the regex. – jfs Apr 15 '16 at 05:17
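For comparison with the comments above, here is a hypothetical stricter variant of float_re with the " *" removed, so the sign must be directly followed by the number. This reflects one possible definition of a float, not the linked answer's:

```python
import re

# Pattern from the answer: allows spaces between the sign and the digits
float_re = br"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
# Hypothetical stricter variant: no space allowed after the sign
strict_re = br"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

sample = b"- 2.2 and -2.2"

loose = re.findall(float_re, sample)
strict = re.findall(strict_re, sample)
print(loose)   # [b'- 2.2', b'-2.2']
print(strict)  # [b'2.2', b'-2.2']
```

Note that float() itself rejects a space between the sign and the digits (float(b'- 2.2') raises ValueError), so if the data can contain such text, the loop in the answer would need to remove that space before converting.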