
This is a spin-off from "In Python, how do I split a string and keep the separators?"

rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'

How can I split this rawByteString into parts using "\\!" as the delimiter without dropping the delimiters, so that I get:

[b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']

I do not want to use [b'\\!' + x for x in rawByteString.split(b'\\!')][1:], since that relies on split() and is just a workaround. That is why this question is tagged with the "re" module.

Wiktor Stribiżew
questionto42
  • @WiktorStribiżew `import re rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00' [x for x in re.split(b'(\\\\!)', rawByteString)][1:]`: `[b'\\!', b'\x00\x00\x00\x00\x00\x00', b'\\!', b'\x00\x00\x00\x00\x00\x00']` which is not what I need, I need `[b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']` – questionto42 Jun 26 '20 at 09:39
  • `re.split(rb'(?!\A)(?=\\!)', rawByteString)`, see https://ideone.com/L9n1V9 – Wiktor Stribiżew Jun 26 '20 at 09:42
  • See if `lst_Bytes = re.split(b'(?<!^)(?=\\\\!)', rawByteString)` works for you – JvdV Jun 26 '20 at 09:43
  • Both suggestions work. – questionto42 Jun 26 '20 at 09:47
  • @JvdV and Lorenz The patterns are the same, as `(?!\A)` = `(?<!^)` since no `re.M` is passed and `"\\\\"` = `r"\\"` – Wiktor Stribiżew Jun 26 '20 at 09:47
  • @WiktorStribiżew though you have marked it as a similar question, you can add an answer and I will accept it. – questionto42 Jun 26 '20 at 09:50
  • Well, the point right now is that you used `str.split`, not `re.split`. JvdV and I fixed the regex pattern for you, but you might have been able to come to that solution yourself if you used the right pattern from the start. If you edit the question (showing you use the right method and still don't get the expected result) it might get reopened. If it does, I will add the answer. If anyone knows a better duplicate, feel free to change the current duplicate thread link or let me know. – Wiktor Stribiżew Jun 26 '20 at 09:56
  • @WiktorStribiżew I have changed the question so that I do not go for a string.split() solution at all, that would meet your requirement as well. Perhaps it will get reopened, as it is quite a specific case of the duplicate's question. – questionto42 Jun 26 '20 at 10:52
  • I modified the title so that it shows the difference from the other question. – Wiktor Stribiżew Jun 27 '20 at 10:11
  • OK, I thought this question was also different due to the rawByteString, but you are right: the solution is independent of the string type and the regex could be applied to any string. Thanks. – questionto42 Jun 27 '20 at 10:36

1 Answer

You may use

re.split(rb'(?!\A)(?=\\!)', rawByteString)
re.split(rb'(?!^)(?=\\!)', rawByteString)

See a sample regex demo (the string input changed since null bytes cannot be part of a string).

Regex details

  • (?!^) / (?!\A) / (?<!^) - a position other than start of string
  • (?=\\!) - a position immediately followed by a backslash + ! (a positive lookahead, so the delimiter itself is not consumed and stays in the result)
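
As a rough check of the pieces listed above, the sketch below compares the three start-of-string guards and shows what happens when the guard is left out. The shortened `data` value is made up for illustration, and the code assumes Python 3.7 or later, where `re.split` supports patterns that match an empty string.

```python
import re

# Shortened sample input (an assumption for illustration, not the original data)
data = b'\\!\x00\x00\\!\x00\x00'

# Without a start-of-string guard the lookahead also matches at index 0,
# so the result starts with an empty bytes object.
unguarded = re.split(rb'(?=\\!)', data)

# The three guards from the list above; each one rejects position 0.
guards = [rb'(?!^)', rb'(?!\A)', rb'(?<!^)']
guarded = [re.split(guard + rb'(?=\\!)', data) for guard in guards]

print(unguarded)  # leading b'' followed by the two chunks
for guard, result in zip(guards, guarded):
    print(guard, result)  # the same two chunks for every guard
```

Without `re.M`, `^` only matches at the very start of the input, which is why all three guards behave the same here.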

NOTES

  • Since you use a byte string, the b prefix is required when defining the pattern string literal
  • r makes the string literal a raw string literal so that we do not have to double escape backslashes and can use \\ to match a single \ in the string.
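
The two notes can be illustrated with a small sketch. The `data` value is a shortened, made-up input; the rest uses only the standard `re` module.

```python
import re

# Shortened sample input (an assumption for illustration)
data = b'\\!\x00\\!\x00'

# The raw and non-raw spellings describe exactly the same pattern bytes.
raw_pattern = rb'(?!\A)(?=\\!)'
escaped_pattern = b'(?!\\A)(?=\\\\!)'
same_pattern = raw_pattern == escaped_pattern

# A str pattern cannot be applied to a bytes object: re raises TypeError.
error = None
try:
    re.split(r'(?!\A)(?=\\!)', data)
except TypeError as exc:
    error = exc

print(same_pattern)
print(type(error).__name__)
```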

See Python demo:

import re
rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'
# Split before every \! except the one at the very start of the input
print(re.split(rb'(?!\A)(?=\\!)', rawByteString))

Output:

[b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']
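
If the delimiter is not known in advance, the same idea could be wrapped in a small function. This is only a sketch: `split_keeping_delimiter` is a hypothetical helper name, not part of the `re` module, and it assumes Python 3.7 or later (zero-width splits) and a non-empty delimiter.

```python
import re

def split_keeping_delimiter(data: bytes, delimiter: bytes) -> list:
    """Split data before each occurrence of delimiter (hypothetical helper)."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    # re.escape accepts bytes and returns bytes, so the pattern stays a bytes pattern.
    pattern = rb'(?!\A)(?=' + re.escape(delimiter) + rb')'
    return re.split(pattern, data)

rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'
parts = split_keeping_delimiter(rawByteString, b'\\!')
print(parts)
```

Because the split points are zero-width, joining the parts with `b''.join(parts)` gives back the original byte string.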
Wiktor Stribiżew