12

My program looks something like this:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)
# "The\\ quick\\ brown\\ fox\\ jumped"
# Replace escaped space patterns with a generic white space pattern
spaced_pattern = re.sub(r"\\\s+", r"\s+", escaped_str)
# Raises error

The error is this:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/home/swfarnsworth/programs/pycharm-2019.2/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/swfarnsworth/projects/medaCy/medacy/tools/converters/con_to_brat.py", line 255, in <module>
    content = convert_con_to_brat(full_file_path)
  File "/home/swfarnsworth/projects/my_file.py", line 191, in convert_con_to_brat
    start_ind = get_absolute_index(text_lines, d["start_ind"], d["data_item"])
  File "/home/swfarnsworth/projects/my_file.py", line 122, in get_absolute_index
    entity_pattern_spaced = re.sub(r"\\\s+", r"\s+", entity_pattern_escaped)
  File "/usr/local/lib/python3.7/re.py", line 192, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/local/lib/python3.7/re.py", line 309, in _subx
    template = _compile_repl(template, pattern)
  File "/usr/local/lib/python3.7/re.py", line 300, in _compile_repl
    return sre_parse.parse_template(repl, pattern)
  File "/usr/local/lib/python3.7/sre_parse.py", line 1024, in parse_template
    raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \s at position 0

I get this error even if I remove the two backslashes before the '\s+' or if I make the raw string (r"\\\s+") into a regular string. I checked the Python 3.7 documentation, and it appears that \s is still the escape sequence for white space.

Jean-François Fabre
Steele Farnsworth
  • When I use your code, with `entity_pattern_escaped` changed to `escaped_str`, then `print(spaced_pattern)` produces `The\s+quick\s+brown\s+fox\s+jumped` which looks like the desired result. – Barmar Oct 10 '19 at 18:03
  • I couldn't reproduce in 3.6.3, but it fails at ideone.com which is 3.7.3. – Barmar Oct 10 '19 at 18:06
  • 1
    There's apparently been a change in how the replacement string is processed in 3.7. – Barmar Oct 10 '19 at 18:09
  • @SteeleFarnsworth: About your (deleted) question "*How can I implement a new Python class within the cpython interpreter?*": I noticed something missing in *\_collectionmodule.c*. – CristiFati Apr 09 '20 at 20:38

7 Answers

19

Double the backslash in the replacement string so that the regex engine does not try to interpret \s as an escape sequence:

spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)

now

>>> spaced_pattern
'The\\s+quick\\s+brown\\s+fox\\s+jumped'
>>> print(spaced_pattern)
The\s+quick\s+brown\s+fox\s+jumped

But why?

It seems that re parses escape sequences in the replacement string itself, so it tries to interpret \s the same way it interprets r"\n", instead of leaving it alone as Python normally does with unknown escapes. For example:

re.sub(r"\\\s+", r"\n+", escaped_str)

yields:

The
+quick
+brown
+fox
+jumped

even if \n was used in a raw string.
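A small check of this behavior, as a sketch: the sample strings "x" and "axb" are made up for illustration. It suggests that the replacement template is parsed by re regardless of raw-string syntax, and that doubling the backslash keeps it literal.

```python
import re

# r"\n" is two characters (backslash, n), yet re.sub turns it into a newline.
with_newline = re.sub("x", r"\n", "axb")
print(repr(with_newline))  # 'a\nb'

# Doubling the backslash in the template yields a literal backslash followed by n.
with_literal = re.sub("x", r"\\n", "axb")
print(repr(with_literal))  # 'a\\nb'
```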

The change was introduced in Issue #27030: Unknown escapes consisting of '\' and ASCII letter in regular expressions now are errors.

The code that does the replacement is in sre_parse.py (python 3.7):

        else:
            try:
                this = chr(ESCAPES[this][1])
            except KeyError:
                if c in ASCIILETTERS:
                    raise s.error('bad escape %s' % this, len(this))

This code looks at what follows a literal \ and tries to replace the pair with the corresponding character (for example, a newline for \n). Since s is not in the ESCAPES dictionary, the lookup raises KeyError, and because s is an ASCII letter, the handler raises the error you're getting.
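To see that code path in action, here is a minimal sketch, assuming Python 3.7 or later. It catches the exception instead of letting it propagate; the variable name outcome is only illustrative.

```python
import re

escaped_str = re.escape("The quick brown fox jumped")

try:
    # \s is not a known template escape, so parse_template rejects it.
    re.sub(r"\\\s+", r"\s+", escaped_str)
    outcome = "no error"
except re.error as exc:
    outcome = str(exc)

print(outcome)
```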

On previous versions it just issued a warning:

import warnings
warnings.warn('bad escape %s' % this,
              DeprecationWarning, stacklevel=4)

It looks like we're not the only ones affected by the 3.6 to 3.7 upgrade: https://github.com/gi0baro/weppy/issues/227
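Another option, sketched here under the assumption that only the standard re module is available, is to pass a function as the replacement. The return value of a callable is inserted as-is, so no template escapes are parsed. The sample_text value is made up for illustration.

```python
import re

my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)

# A callable's return value is inserted verbatim; template parsing is skipped.
spaced_pattern = re.sub(r"\\\s+", lambda match: r"\s+", escaped_str)
print(spaced_pattern)  # The\s+quick\s+brown\s+fox\s+jumped

# The resulting pattern tolerates any run of whitespace between the words.
sample_text = "The  quick\nbrown\tfox jumped"  # made-up sample input
found = re.search(spaced_pattern, sample_text) is not None
print(found)  # True
```

This avoids having to count backslashes in the replacement string, at the cost of a slightly less obvious call.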

wim
Jean-François Fabre
  • Thank you for the "why" portion... direly missing in the top voted regex module answer, although the top voted answer did solve the problem in a jiffy. – English Rain Mar 22 '22 at 16:41
  • 1
    Thanks. `regex` module is way more powerful, that's true, but that's also not provided with basic python install, so more people will use `re`, because it's standard. I know that regex package can do marvels with nested regexes. As long as I don't need that, I'll stick to `re` – Jean-François Fabre Mar 22 '22 at 21:52
13

Just try import regex as re instead of import re.

NiYanchun
1

Here is my simple code, which uses the python-binance library and pandas. It works in one venv with Python 3.7, but when I created a new venv for another project (also Python 3.7), it threw the same regex errors:

import pandas as pd
from binance import Client

api_key = ''
api_secret = ''

client = Client(api_key, api_secret)

timeframe = '1h'
coin = 'ETHUSDT'


def GetOHLC(coin, timeframe):
    frame = pd.DataFrame(client.get_historical_klines(coin, timeframe, '01.01.2015'))
    frame = frame.loc[:, :5]
    frame.columns = ['date', 'open', 'high', 'low', 'close', 'volume']
    frame.set_index('date', inplace=True)        
    frame.to_csv(path_or_buf=(coin+timeframe))


GetOHLC(coin, timeframe)

I did some research but didn't find a suitable solution. Then I compared the versions of the regex library in the working environment and the new one: the old one was from 2021 and the new one was from 2022. I uninstalled the 2022 version and installed the 2021 one, and it started working without any exceptions. I hope this helps in some particular cases.

0

I guess you might be trying to do:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
escaped_str = re.escape(my_str)
# "The\\\\\\ quick\\\\\\ brown\\\\\\ fox\\\\\\ jumped"
# Replace escaped space patterns with a generic white space pattern
print(re.sub(r"\\\\\\\s+", " ", escaped_str))

Output 1

The quick brown fox jumped

If you want a literal \s+ instead, then try this answer, or maybe:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
escaped_str = re.escape(my_str)
print(re.sub(r"\\\\\\\s+", re.escape(r"\s") + '+', escaped_str))

Output 2

The\s+quick\s+brown\s+fox\s+jumped

Or maybe:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
print(re.sub(r"\s+", "s+", my_str))

Output 3

The\s+quick\s+brown\s+fox\s+jumped

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.


RegEx Circuit

jex.im visualizes regular expressions.

Emma
0

In case you are trying to replace something with a single backslash, in my experience neither the re nor the regex package can do it on its own with Python 3.8.5.

The solution I rely on is to split the task between re.sub and Python's replace:

import re
re.sub(r'([0-9.]+)\*([0-9.]+)',r'\1 XBACKSLASHXcdot \2'," 4*2").replace('XBACKSLASHX','\\')
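For comparison, a minimal sketch using only the standard re module: in a replacement template, a doubled backslash stands for one literal backslash, which appears to give the same result in a single step. The names result and two_step are only illustrative.

```python
import re

# In the template, \\ stands for one literal backslash.
result = re.sub(r'([0-9.]+)\*([0-9.]+)', r'\1 \\cdot \2', " 4*2")
print(result)

# The two-step placeholder approach, for comparison.
two_step = re.sub(
    r'([0-9.]+)\*([0-9.]+)', r'\1 XBACKSLASHXcdot \2', " 4*2"
).replace('XBACKSLASHX', '\\')
print(result == two_step)
```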
Louis Gagnon
-1

Try:

import regex as re

instead of:

import re

This has worked for me recently when I encountered this error.

Nikita Shabankin
-3
pip uninstall regex

pip install regex==2022.3.2
jmoerdyk