
I'm converting strings to floats using float(x). However, for some reason, one of the strings is "71.2\x0060". I've tried following this answer, but it does not remove the \x00 character:

>>> s = "71.2\x0060"
>>> "".join([x for x in s if ord(x) < 127])
'71.2\x0060'

Other methods I've tried are:

>>> s.split("\\x")
['71.2\x0060']
>>> s.split("\x")
ValueError: invalid \x escape

I'm not sure why this string is malformed, but I'd like to recover as much precision as possible from it and move on.

Patrick Stetz

2 Answers


Going off of wim's comment, the answer might be this:

>>> s.split("\x00")
['71.2', '60']

So I should do:

>>> float(s.split("\x00")[0])
71.2
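
A minimal sketch of the same idea wrapped in a function, in case some strings in the input contain no NUL character at all. The name leading_float is hypothetical, and it assumes the digits before the first \x00 are the part worth keeping:

```python
def leading_float(text):
    """Return the float that precedes the first NUL character in text."""
    # str.partition returns the whole string as the head when the
    # separator is absent, so clean values pass through unchanged.
    head, _sep, _tail = text.partition("\x00")
    return float(head)


print(leading_float("71.2\x0060"))  # 71.2
print(leading_float("71.2394"))     # 71.2394
```

Anything after the NUL character is discarded, so this trades precision for a value that is known to come from the original digits.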
Patrick Stetz

Unfortunately, the POSIX group \p{XDigit} does not exist in the re module. To remove the control characters with regular expressions anyway, you can try the following.

import re
re.sub(r'[\x00-\x1F]', r'', '71.2\x0060')  # or:
re.sub(r'\\x[0-9a-fA-F]{2}', r'', r'71.2\x0060')

Output:

'71.260'
'71.260'

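The r prefix marks a raw string literal, in which Python leaves backslashes as they are instead of interpreting escape sequences. The first call therefore removes an actual control character, while the second removes the literal four-character text \x00 from a raw string. Take a look at the control characters up to hex 1F in the ASCII table: https://www.torsten-horn.de/techdocs/ascii.htm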
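
The difference between the two calls can be checked directly. This is only an illustrative sketch; the variable names escaped, raw, control_chars and literal_escape are made up for the example:

```python
import re

escaped = "71.2\x0060"  # \x00 is interpreted: a single NUL character
raw = r"71.2\x0060"     # raw literal: backslash, 'x', '0', '0' are kept

print(len(escaped), len(raw))  # 7 10

control_chars = re.compile(r"[\x00-\x1F]")         # real control characters
literal_escape = re.compile(r"\\x[0-9a-fA-F]{2}")  # the text "\xNN"

print(control_chars.sub("", escaped))  # 71.260
print(literal_escape.sub("", raw))     # 71.260

# The second pattern finds nothing in the string from the question,
# because that string contains no literal backslash.
print(literal_escape.sub("", escaped) == escaped)  # True
```

For the string in the question, only the first pattern applies.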

qräbnö
  • I'm a tiny bit uncomfortable about this (just because I don't understand where the issue started). This number is from a log file, in between `71.2394` and `71.2727`, and every number has 4 decimal places. It seems wrong to ignore this character and join the rest. If the number were `71.260\x00`, I'd be happier with a result of `71.260` – Patrick Stetz Sep 24 '19 at 21:46
  • Maybe `\x00` means `0`? Then: `s.replace('\x00', '0')`. Otherwise you can use `r'[\x00-\x1F][0-9]*$'` in the first re.sub parameter to remove till the end - or use your split solution. :) – qräbnö Sep 24 '19 at 21:50
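
The options discussed in these comments give three different numbers for the same input. A small sketch to compare them side by side; the variable names are hypothetical, and which result is right depends on how the log was corrupted, which the thread does not establish:

```python
import re

s = "71.2\x0060"

# Drop the NUL character and join what remains.
joined = float(s.replace("\x00", ""))

# Treat the NUL character as a stand-in for the digit 0.
zero_filled = float(s.replace("\x00", "0"))

# Remove the control character and any digits after it.
truncated = float(re.sub(r"[\x00-\x1F][0-9]*$", "", s))

print(joined)       # 71.26
print(zero_filled)  # 71.206
print(truncated)    # 71.2
```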