I have the regular expression

(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\\r\\n\\r\\n)(\S+)?

which I'm trying to match against HTTP GET and HTTP POST requests. I'm using the helpful regex101.com website to build and test my regular expression, and according to it, the regular expression should match both of the formats I'm looking for.

Here's my regular expression on regex101.com.

However, when I put it into Python itself and call re.split() on my input strings, it doesn't split the POST request; it only splits the GET request. I thought it had something to do with the way regex101 handles \r\n (CRLF) versus how Python does, so I double-checked that in Python I actually type \r\n inside the regex, and not \\r\\n as I did in regex101. Yet it still doesn't work.
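A minimal sketch of the \r\n versus \\r\\n distinction described above. It assumes the requests are ordinary Python strings containing real carriage-return/line-feed characters, whereas typing \r\n into regex101's test box presumably enters the literal characters backslash, r, backslash, n. The variable names are illustrative only.

```python
import re

# Real CR LF characters, as they appear in a request string built in Python.
real_crlf = "HTTP/1.0\r\n\r\n"
# The same text typed literally (backslash, 'r', backslash, 'n'), which is
# presumably what regex101's test string contained.
literal_text = r"HTTP/1.0\r\n\r\n"

# In a raw string, r"\r\n" reaches the regex engine as the escapes \r and \n,
# so it matches actual carriage-return and line-feed characters.
crlf_pattern = re.compile(r"\r\n\r\n")
# A non-raw "\r\n" also works: Python passes real CR LF characters to re,
# and they match themselves.
plain_pattern = re.compile("\r\n\r\n")
# r"\\r\\n" matches a literal backslash followed by 'r', and so on.
backslash_pattern = re.compile(r"\\r\\n\\r\\n")

print(bool(crlf_pattern.search(real_crlf)))          # True
print(bool(plain_pattern.search(real_crlf)))         # True
print(bool(crlf_pattern.search(literal_text)))       # False
print(bool(backslash_pattern.search(real_crlf)))     # False
print(bool(backslash_pattern.search(literal_text)))  # True
```

So a pattern that works in regex101 with \\r\\n needs \r\n (raw or not) once the input holds real CRLFs.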

How can I get the regular expression to work inside Python?

Alureon

1 Answer

You're just missing an additional \r\n after HTTP/1.0. This will work:

'POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000'
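A sketch of why this matters, running the question's pattern (with \\r\\n rewritten as \r\n for Python) against three strings: a hypothetical GET request, the POST string above, and a single-CRLF POST reconstructed from the comments. The original request strings are not shown in the thread, so the GET and single-CRLF strings are assumptions.

```python
import re

# The question's pattern, written for real CRLFs instead of regex101's \\r\\n.
PATTERN = re.compile(r"(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*\r\n\r\n)(\S+)?")

# Hypothetical GET request.
get_request = "GET /api/gettime HTTP/1.0\r\n\r\n"
# POST with a single CRLF after HTTP/1.0 (reconstructed from the comments).
post_single = "POST /api/gettime HTTP/1.0\r\nContent-Length: 13\r\n\r\n100000+200000"
# POST with the extra CRLF suggested in this answer.
post_double = "POST /api/gettime HTTP/1.0\r\n\r\nContent-Length: 13\r\n\r\n100000+200000"

for name, request in [("GET", get_request),
                      ("POST, one CRLF", post_single),
                      ("POST, two CRLFs", post_double)]:
    m = PATTERN.search(request)
    # Print the captured groups, or None when the pattern does not match.
    print(name, "->", m.groups() if m else None)
```

By default `.` matches any character except `\n`, so the `.*` right after `HTTP/1.0` cannot step over the `Content-Length` line to reach the blank line. The single-CRLF POST therefore fails to match, while the GET request and the two-CRLF POST both match.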
codebee
  • OK, I see that now. Good catch. But I actually do need the POST format to have only one ```\r\n``` after ```HTTP/1.0```. So is there a way I can use one regex to handle both ```\r\n``` and ```\r\n\r\n```? I'm fiddling around with the regular expression right now, but am not having much luck. – Alureon Oct 05 '19 at 00:45
  • @WaterGuy `(\r\n)+` will match one or more `\r\n`, or `(\r\n){1,2}` will match only one or two occurrences – Nick Oct 05 '19 at 01:41
  • @Nick I gave that a shot, but it didn't work. I tried ```(?:\\r\\n)+``` and it still only matched one \r\n. – Alureon Oct 05 '19 at 02:58
  • @WaterGuy It's working for me on your regex101 page: https://regex101.com/r/Ghlo0L/4 – Nick Oct 05 '19 at 03:02
  • Do you need to preserve the `\r`s and `\n`s? If not, you could pre-process the string to replace them with, say, a space. I suspect you're re-inventing the wheel here. Have you considered using an existing HTTP parsing library? https://stackoverflow.com/questions/4685217/parse-raw-http-headers – MCI Oct 05 '19 at 17:33
  • @MCI Unfortunately, usage of any HTTP parsing library is disallowed. Yes, I do need to preserve the ```\r```s and ```\n```s. – Alureon Oct 06 '19 at 05:03
  • @Nick Hmm, it does seem to be working now. That's strange, it wasn't working before. Maybe I made a typo somewhere. – Alureon Oct 06 '19 at 05:31
  • @WaterGuy chalk it up to the ghost in the machine. Glad to hear it's working. – Nick Oct 06 '19 at 06:21
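A sketch of Nick's quantifier suggestion in Python. It assumes the quantified group replaces the literal \r\n\r\n inside the question's non-capturing group, since the thread does not show exactly where it was placed. It uses re.split, as in the question, with a hypothetical GET request and the single-CRLF POST reconstructed from the comments.

```python
import re

# Assumption: the quantifier replaces \r\n\r\n in the question's
# non-capturing group; the exact placement is not shown in the thread.
ONE_OR_TWO = re.compile(r"(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*(?:\r\n){1,2})(\S+)?")
ONE_OR_MORE = re.compile(r"(GET|POST) (/api/\w+) (HTTP/1\.\d)(?:.*(?:\r\n)+)(\S+)?")

get_request = "GET /api/gettime HTTP/1.0\r\n\r\n"  # hypothetical
post_single = "POST /api/gettime HTTP/1.0\r\nContent-Length: 13\r\n\r\n100000+200000"

for pattern in (ONE_OR_TWO, ONE_OR_MORE):
    for request in (get_request, post_single):
        # re.split returns [request] unchanged when nothing matches; on a
        # match, the captured groups are spliced into the result list.
        parts = pattern.split(request)
        print(parts[:4])
```

With this placement, `(\S+)?` captures the first token after the first CRLF (`Content-Length:` for the POST) rather than the body, so it is worth checking that the fourth group holds what you actually need.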