This is my first time asking a question here and I'm a newbie, so I apologize if the question seems basic. I am streaming data from a machine:
import requests
r = requests.get('http://IP:port/sample?interval=0&heartbeat=1000', stream=True)
and I am receiving the data as XML. This is the structure of the XML data:
b'--9bc1ad19bf9e3b4049ab7e4f78dda451'
b'Content-type: text/xml'
b'Content-length: 15560'
b'<?xml version="1.0" encoding="UTF-8"?>'
b'<MTConnectStreams xmlns:m="urn:mtconnect.org:MTConnectStreams:1.3" xmlns="urn:mtconnect.org:MTConnectStreams:1.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:mtconnect.org:MTConnectStreams:1.3 http://www.mtconnect.org/schemas/MTConnectStreams_1.3.xsd">'
b' <Header creationTime="2016-12-01T17:58:48Z" sender="MAZATROL-PC" instanceId="1480604825" version="1.3.0.17" bufferSize="131072" nextSequence="1301" firstSequence="1" lastSequence="42044"/>'
b' <Streams>'
b' <DeviceStream name="Mazak" uuid="Mazak">'
b' <ComponentStream component="Controller" name="controller" componentId="cont">'
b' <Samples>'
b' <AccumulatedTime dataItemId="yltime" timestamp="2016-12-01T15:45:15.662995Z" name="total_time" sequence="1214" subType="x:TOTAL">3104040</AccumulatedTime>'
b' <AccumulatedTime dataItemId="yltime" timestamp="2016-12-01T15:46:16.452858Z" name="total_time" sequence="1243" subType="x:TOTAL">3104101</AccumulatedTime>'
b' <AccumulatedTime dataItemId="yltime" timestamp="2016-12-01T15:47:17.331808Z" name="total_time" sequence="1272" subType="x:TOTAL">3104162</AccumulatedTime>'
b' <PathFeedrateOverride dataItemId="pfo" timestamp="2016-12-01T15:33:27.042482Z" name="Fovr" sequence="899" subType="ACTUAL">0</PathFeedrateOverride>'
b' <PathFeedrateOverride dataItemId="pfr" timestamp="2016-12-01T15:30:26.700817Z" name="Frapidovr" sequence="803" subType="RAPID">0</PathFeedrateOverride>'
b' <PathFeedrateOverride dataItemId="pfr" timestamp="2016-12-01T15:30:42.685031Z" name="Frapidovr" sequence="810" subType="RAPID">0</PathFeedrateOverride>'
I am only interested in getting some information from the lines that contain dataItemId. To print just those lines, I did this:
for line in r.iter_lines():
    if b'dataItemId' in line:
        print(line)
Speed is crucial, since we want the data available in real time in an AWS database, and I am lost on how best to parse it. From what I found, XMLPullParser (xml.etree.ElementTree) is the best way to parse streaming data without blocking. However, I don't know what the 'start' and 'end' should be in my case, or how to proceed without losing any data while guaranteeing that everything gets parsed.
I was thinking of having one thread that receives the data and another that parses it with XMLPullParser; once a line has been converted to JSON and sent, it is deleted from the tree. But since I only want to parse the lines that contain dataItemId, and so don't have a tree structure with child nodes, I don't see clearly how this should work.
Your help is highly appreciated. Thank you.
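To make the idea concrete, here is a minimal sketch of how I understand the XMLPullParser approach could work. It is not a tested solution against the real machine. It assumes that every part of the stream begins with a boundary line starting with `--`, that each XML document begins with a `<?xml` declaration (as in the output above), and that no line inside a document starts with `--`. The function name `iter_data_items` and the extra `tag` and `value` keys are my own choices for illustration.

```python
import xml.etree.ElementTree as ET


def iter_data_items(lines):
    """Yield one dict per XML element that has a dataItemId attribute.

    `lines` is any iterable of byte strings, for example r.iter_lines().
    """
    parser = None
    for line in lines:
        if line.startswith(b"--"):
            # Assumed multipart boundary: the next part is a new XML document,
            # so the current parser is dropped and a fresh one is created later.
            parser = None
            continue
        if parser is None:
            if not line.startswith(b"<?xml"):
                continue  # skip part headers (Content-type, ...) and blank lines
            # Only "end" events are requested: at that point the element's
            # attributes and text are complete.
            parser = ET.XMLPullParser(events=("end",))
        # iter_lines() strips the line terminator, so it is put back here.
        parser.feed(line + b"\n")
        for _event, elem in parser.read_events():
            if "dataItemId" in elem.attrib:
                record = dict(elem.attrib)
                # Tags look like "{namespace}AccumulatedTime"; keep the local name.
                record["tag"] = elem.tag.split("}")[-1]
                record["value"] = elem.text
                elem.clear()  # free the element's content once it is copied
                yield record


# Usage with the request from the question (not executed here):
#     for record in iter_data_items(r.iter_lines()):
#         print(record)
```

With this shape, the 'start' event does not seem to be needed at all: the filter on dataItemId happens on 'end', and each record is a plain dict that `json.dumps` can serialize. If sending turns out to be slower than receiving, the records could be pushed onto a `queue.Queue` that a second thread drains, but that part is not shown here.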