
I want to match the following text using regex:

    x-xxxxxx23 w44jrflsdhdsls 2021-10-09 02:46:37,371 - mycode - ERROR - Error in mycode interaction
    Traceback (most recent call last):
      File "mycode.py", line 83, in upload_detection_image
        put_response = s3_object.put(Body=image, ContentType="image/jpeg")
      File "/usr/local/lib/python3.6/dist-packages/boto3/resources/factory.py", line 520, in do_action
        response = action(self, *args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/boto3/resources/action.py", line 83, in __call__
        response = getattr(parent.meta.client, operation_name)(*args, **params)
      File "/usr/local/lib/python3.6/dist-packages/botocore/client.py", line 357, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/local/lib/python3.6/dist-packages/botocore/client.py", line 676, in _make_api_call
        raise error_class(parsed_response, operation_name)
    botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.

I can match everything starting from the second line using:

    query = re.compile(
        rf"^Traceback(.*?){re.escape('botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.')}(.*?)\n",
        flags=re.DOTALL | re.MULTILINE)

But when I also try to match the first line, I don't get any matches:

    query = re.compile(
        rf"^(.*?){re.escape('Error in mycode')}(.*?)Traceback(.*?){re.escape('botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.')}(.*?)\n",
        flags=re.DOTALL | re.MULTILINE)

The text is part of a larger text, and I find the matches using:

    matches = list(query.finditer(text))
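
One thing worth checking with the failing pattern (illustrated here on made-up surrounding lines, since the larger text isn't shown): under `re.DOTALL`, the leading `^(.*?)` is not limited to a single line, so in a larger text a match can begin at the very start of the text rather than at the log header line. A minimal sketch comparing the two flag settings:

```python
import re

# Hypothetical larger text: two unrelated lines before the log header.
text = (
    "unrelated line one\n"
    "unrelated line two\n"
    "x-xxxxxx23 w44jrflsdhdsls 2021-10-09 02:46:37,371 - mycode - ERROR - Error in mycode interaction\n"
    "Traceback (most recent call last):\n"
)

# With re.DOTALL, "." also matches "\n", so ^(.*?) can start at the very first
# line and run across the unrelated lines until it reaches "Error in mycode".
with_dotall = re.compile(r"^(.*?)Error in mycode", flags=re.DOTALL | re.MULTILINE)

# Without re.DOTALL, ^(.*?) cannot cross a newline, so the match starts at the
# beginning of the line that actually contains "Error in mycode".
without_dotall = re.compile(r"^(.*?)Error in mycode", flags=re.MULTILINE)

m_dotall = with_dotall.search(text)
m_line = without_dotall.search(text)

print(repr(m_dotall.group(0)))
print(repr(m_line.group(0)))
```

This only shows where a match would start; it does not by itself explain getting no matches at all.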

Thank you

EDIT: I can confirm that the following matches everything starting from Error in mycode:

    query = re.compile(
        rf"{re.escape('Error in mycode')}(.*?)Traceback(.*?){re.escape('botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.')}(.*?)\n",
        flags=re.DOTALL | re.MULTILINE)

user1315621
  • I suggest trying something simpler. What happens if you do `re.compile('Error in mycode')`? Does this match the first line? Now gradually add more complexity to see what it matches until it doesn't match what you think it should. – Code-Apprentice Oct 11 '21 at 21:58
  • See details above :) – user1315621 Oct 11 '21 at 22:02
  • If you are able to make changes on the code side, I would suggest wrapping that logic in a try/except block and catching the botocore exception. It should be straightforward to extract the data from the error object this way. – rv.kvetch Oct 11 '21 at 22:08
  • This might be helpful: https://stackoverflow.com/questions/587345/regular-expression-matching-a-multiline-block-of-text – Andrew Oct 11 '21 at 22:08

1 Answer


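This should work. Without `re.DOTALL`, `.` does not match newlines, so `^(.*?)` stays on the line that contains `Error in mycode` instead of running across earlier lines; only the traceback body, which genuinely spans several lines, is matched with `(.|\n)*?`: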

    import re

    query = re.compile(
        rf"^(.*?){re.escape('Error in mycode')}(.*?)\nTraceback((.|\n)*?){re.escape('botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.')}(.*?)\n",
        flags=re.MULTILINE)  # no re.DOTALL: "." stops at newlines; (.|\n)*? spans the traceback

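To check the answer's pattern against the log embedded in a larger text, here is a minimal sketch. The surrounding lines and the shortened traceback are made up for illustration. It also shows an alternative (an assumption, not part of the answer) written with `[^\n]*`, which can never cross a line, and `[\s\S]*?`, which always can, so the pattern does not depend on `re.DOTALL` at all:

```python
import re

# Hypothetical larger text: the log entry from the question (traceback shortened)
# surrounded by unrelated lines.
LOG = (
    "some unrelated line\n"
    "x-xxxxxx23 w44jrflsdhdsls 2021-10-09 02:46:37,371 - mycode - ERROR - Error in mycode interaction\n"
    "Traceback (most recent call last):\n"
    '  File "mycode.py", line 83, in upload_detection_image\n'
    '    put_response = s3_object.put(Body=image, ContentType="image/jpeg")\n'
    "botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling "
    "the PutObject operation: The provided token has expired.\n"
    "another unrelated line\n"
)

ERROR_LINE = (
    "botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling "
    "the PutObject operation: The provided token has expired."
)

# The answer's pattern: without re.DOTALL, "." stops at newlines, and (.|\n)*?
# is used only for the multi-line traceback body.
answer_query = re.compile(
    rf"^(.*?){re.escape('Error in mycode')}(.*?)\nTraceback((.|\n)*?){re.escape(ERROR_LINE)}(.*?)\n",
    flags=re.MULTILINE,
)

# Alternative style (assumption): explicit classes say which parts may span lines.
explicit_query = re.compile(
    rf"^[^\n]*{re.escape('Error in mycode')}[^\n]*\n"  # the log header line
    r"Traceback[\s\S]*?"                                # traceback body, spans lines
    rf"{re.escape(ERROR_LINE)}[^\n]*\n",                # the final exception line
    flags=re.MULTILINE,
)

answer_matches = list(answer_query.finditer(LOG))
explicit_matches = list(explicit_query.finditer(LOG))

for m in explicit_matches:
    print(m.span())
    print(m.group(0))
```

Both patterns should report the same single match, running from the log header line through the `ExpiredToken` line, with the unrelated lines left out.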
Andrew