I have data in the following format. I need to strip everything that appears before <s:Envelope.

HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Server: Microsoft-IIS/10.0
X-Powered-By: ASP.NET
Date: Fri, 05 May 2017 09:52:02 GMT
Content-Length: 338962

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
    <s:Body><RetrieveStoredRoutesResponse xmlns="http://schema.website.com">

How can I do this using a regular expression?

Pradeep
  • You're using Perl? Just loop through the file and discard all the lines until you find one that starts with that pattern. –  May 05 '17 at 10:12
  • @dan1111 The data is in a variable that holds the response from a web service, so I need a regex or something else to remove the HTTP part. – Pradeep May 05 '17 at 10:18
  • This is an HTTP response stream, and in general the body may not necessarily start with <s:Envelope. – Dmitry Egorov May 05 '17 at 10:44

1 Answer

This looks like an HTTP response stream from which you need only the response body. In general, the body does not necessarily start with <s:Envelope, so to get it you need to strip off the HTTP headers. Each header block is a series of non-empty lines, the first of which starts with HTTP, followed by an empty line. A Perl substitution that removes the headers is:

s/\A(?:^HTTP.*?(?:^.+$)*^$)+//sm;

In this regex:

  • \A matches the start of the entire input (note that ^ is used here to match start of a new line since /m is used)
  • (?: - start of the outer non-capturing group. This group matches a single HTTP header block
    • ^ - start of a line
    • HTTP - HTTP literally
    • .*? - any text, matched non-greedily; since /s lets . match newlines, this runs through the end of the status line until the following ^ can match at the start of the next line
    • (?: - start of the inner non-capturing group. This group matches a single non-empty line
      • ^ - start of a line
      • .+ - one or more chars (a non-empty line; note that under /s the dot also matches newlines, so this can span several lines)
      • $ - end of the line
    • ) - end of the inner non-capturing group.
    • * - repeat the group (a non-empty line) zero or more times
  • ) - end of the outer non-capturing group.
  • + - repeat the group (an HTTP headers block) one or more times
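
One caveat: because /s lets . match newlines, the greedy .+ in the pattern above can run past the header block if the body itself contains a blank line, and the pattern does not account for the CRLF line endings that raw HTTP uses. A stricter variant might look like the sketch below. It is only an illustration: strip_http_headers is a hypothetical helper name, the sample data is shortened from the question, and \R assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): remove every leading
# HTTP header block and return whatever follows as the body.
sub strip_http_headers {
    my ($response) = @_;
    $response =~ s{
        \A                      # start of the whole string
        (?:                     # one header block:
            HTTP/ [^\r\n]* \R   #   status line such as "HTTP/1.1 200 OK"
            (?: [^\r\n]+ \R )*  #   zero or more non-empty header lines
            \R                  #   the empty line that ends the block
        )+                      # repeat for "100 Continue" plus the final response
    }{}x;
    return $response;
}

# Sample input shortened from the question; the body contains a blank line.
my $response = join "\n",
    'HTTP/1.1 100 Continue',
    '',
    'HTTP/1.1 200 OK',
    'Content-Type: text/xml; charset=utf-8',
    'Content-Length: 338962',
    '',
    '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">',
    '',
    '    <s:Body>',
    '';

print strip_http_headers($response);
```

Here no /s or /m is needed: each line is matched explicitly as a run of non-newline characters followed by \R, so the match cannot extend beyond the first empty line that ends a header block.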

Perl demo: https://ideone.com/LEPpkQ

Dmitry Egorov