Slice bytes data between two strings in Python

Question

I have data of type bytes from a request body, like the following:

b'0\x80\x06\t*\x86H\x86\xf7\r\x01\x07\x02\xa0\x800\x80\x02\x01\x011\x0b0\t\x06\x05+\x0e\x03\x02\x1a\x05\x000\x80\x06\t*\x86H\x86\xf7\r\x01\x07\x01\xa0\x80$\x80\x04\x82\x04H<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">\n<plist version="1.0">\n<dict>\n<key>PayloadContent</key>\n<dict>\n <key>URL</key>\n<string>***</string>\n <key>DeviceAttributes</key>\n<array>\n<string>UDID</string>\n<string>DEVICE_NAME</string>\n <string>VERSION</string>\n<string>PRODUCT</string>\n<string>MAC_ADDRESS_EN0</string>\n <string>IMEI</string>\n <string>ICCID</string>\n </array>\n</dict>\n<key>PayloadOrganization</key>\n<string>Flybuilds</string>\n<key>PayloadDisplayName</key>\n<string>Device Information (UDID)</string>\n<key>PayloadVersion</key>\n<integer>1</integer>\n<key>PayloadUUID</key>\n<string>*****</string>\n<key>PayloadIdentifier</key>\n<string>******</string>\n<key>PayloadDescription</key>\n<string>Knowing the UDID of my iOS device</string>\n<key>PayloadType</key>\n<string>Profile Service</string>\n</dict>\n</plist>\n\x00\x00\x00\x00\x00\x00\xa0\x82\n@0

Is it possible to extract the data between `<?xml version` and `</plist>` and write it to a file in Python? (We need to extract the XML part from the bytes data.)

Of course it's possible. Get the indexes, and use `variable[firstindex:secondindex]` — Barmar, May 10 '21 at 20:04
The indexes will not be the same for each request. Is it possible with something like "starts with" and "ends with"? — ams_py, May 10 '21 at 20:07
That's why I said to *get* the indexes. You can use the `find()` function to search a string for a substring and return its index. — Barmar, May 10 '21 at 20:07
https://stackoverflow.com/questions/4666973/how-to-extract-the-substring-between-two-markers — Barmar, May 10 '21 at 20:09

score 0 · Answer 1 · answered May 10 '21 at 21:03

Yes, you can extract it if you know the start and end signatures of the content you want.

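# stream is the variable holding the raw data stream (bytes);
# it is not repeated here for brevity.
start_signature = b'<?xml'
stop_signature = b'</plist>'
# index() raises ValueError if a signature is missing, whereas find() returns -1
xml_start = stream.index(start_signature)
# look for the end marker only after the start, and include the marker itself
xml_stop = stream.index(stop_signature, xml_start) + len(stop_signature)
xml_data = stream[xml_start:xml_stop]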

While I think this answers the implied question of how to find the data given the start and end markers, the downside is that the script may break if the XML changes, for example if the payload no longer begins with `<?xml` or no longer ends with `</plist>`. This may not be an issue if you know the data will be consistent each time.
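Since the question also asks about writing the result to a file, here is a minimal sketch that wraps the slicing in a function, fails loudly when a marker is missing, and saves the bytes. The names `extract_between` and `save_xml` and the output path are made up for illustration, not part of any library.

```python
from pathlib import Path


def extract_between(stream: bytes, start: bytes, stop: bytes) -> bytes:
    """Return the slice from the first `start` marker through the next `stop` marker."""
    begin = stream.find(start)
    if begin == -1:
        raise ValueError(f"start marker {start!r} not found")
    # Search for the stop marker only after the start marker.
    end = stream.find(stop, begin)
    if end == -1:
        raise ValueError(f"stop marker {stop!r} not found after start")
    return stream[begin:end + len(stop)]


def save_xml(stream: bytes, path: str) -> bytes:
    """Extract the plist XML from `stream` and write it to `path` as raw bytes."""
    xml_data = extract_between(stream, b'<?xml', b'</plist>')
    # write_bytes keeps the data exactly as received, with no re-encoding.
    Path(path).write_bytes(xml_data)
    return xml_data
```

Calling `save_xml(stream, "profile.xml")` on the request body would then write only the XML part. The file is written in binary mode so the original encoding is left untouched.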

If you can learn the meaning of the other bytes in the data, you would likely be able to determine the start position and length of the XML without having to know the precise XML contents.
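As one illustration of that idea: the four bytes just before `<?xml` in the sample, `\x04\x82\x04H`, look like a BER/DER OCTET STRING header (tag `0x04`, then `0x82` meaning the length follows in two bytes, here `0x0448`). Assuming that reading is correct and that the XML sits in a single primitive OCTET STRING (both are assumptions, since the container may split content across several chunks), a rough sketch of reading the declared length could look like this. `read_ber_octet_string` is a hypothetical helper, not a library function.

```python
def read_ber_octet_string(stream: bytes, tag_pos: int) -> bytes:
    """Return the contents of a primitive BER/DER OCTET STRING.

    tag_pos is the index of the 0x04 tag byte (hypothetical helper).
    """
    if tag_pos < 0 or tag_pos + 1 >= len(stream) or stream[tag_pos] != 0x04:
        raise ValueError("no OCTET STRING tag (0x04) at tag_pos")
    first = stream[tag_pos + 1]
    if first < 0x80:
        # Short form: this byte is the length itself.
        length = first
        body_start = tag_pos + 2
    else:
        # Long form: the low 7 bits give the number of length bytes that follow.
        num_len_bytes = first & 0x7F
        if num_len_bytes == 0:
            raise ValueError("indefinite length is not handled in this sketch")
        length_bytes = stream[tag_pos + 2:tag_pos + 2 + num_len_bytes]
        if len(length_bytes) != num_len_bytes:
            raise ValueError("data ends inside the length field")
        length = int.from_bytes(length_bytes, "big")
        body_start = tag_pos + 2 + num_len_bytes
    body_end = body_start + length
    if body_end > len(stream):
        raise ValueError("declared length runs past the end of the data")
    return stream[body_start:body_end]
```

For the sample in the question, the tag byte appears to sit four bytes before the XML start, so `read_ber_octet_string(stream, stream.find(b'<?xml') - 4)` would be one way to try it. That offset is an observation about this one sample only; a proper ASN.1 parser is the more robust route if the format really is PKCS#7.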
