I have an API server built with Python Flask, and I need a group of client machines to send data to it by making HTTP POST requests.
The data is HTML content, usually about 200 KB per page. (NOTE: I am not converting other data into HTML/XML; the data itself is HTML that I have collected from the web.) I am trying to reduce the network load as much as I can by using serialization/deserialization and compression.
Instead of sending raw HTML across the network, is there a way to serialize the HTML object (a BeautifulSoup soup?) and deserialize it on the server side? Or should I compress the content first and then POST it to the API server, which would decompress the data once it receives it?
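To make the compress-then-post idea concrete, here is a minimal sketch of what I have in mind, using only the standard library. The helper names `pack_html` and `unpack_html` and the sample page are made up for illustration; they are not part of Flask or any other library.

```python
import zlib


def pack_html(html: str, level: int = 6) -> bytes:
    """Client side: encode the page as UTF-8 and compress it with zlib."""
    return zlib.compress(html.encode("utf-8"), level)


def unpack_html(payload: bytes) -> str:
    """Server side: decompress the request body and decode it back to text."""
    return zlib.decompress(payload).decode("utf-8")


# Hypothetical stand-in for a scraped page; real pages are less repetitive.
page = (
    "<html><body>"
    + "<div class='row'><p>Hello, world</p></div>" * 2000
    + "</body></html>"
)

body = pack_html(page)        # bytes the client would send as the POST body
restored = unpack_html(body)  # text the server would recover

print(len(page.encode("utf-8")), "->", len(body), "bytes")
```

On the client I would presumably send `body` as the raw request body, for example `requests.post(url, data=body, headers={"Content-Encoding": "deflate"})`, and in the Flask view call `unpack_html(request.get_data())`. This assumes nothing between client and server decodes the body first.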
What I have done:
(1) I tried turning the raw HTML text into a soup object and then serializing it with pickle. However, it failed with a recursion error. I also tried pickling the raw HTML string, but the size reduction was poor: the result was almost the same size as the raw HTML string.
(2) I tried compressing the content with zlib beforehand, and the result is about 10% of the original size. However, is this a legitimate way to approach the problem?
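The size comparison behind (1) and (2) can be reproduced roughly as follows. This is only a sketch: the sample HTML is synthetic and far more repetitive than a real page, so the zlib ratio it prints will be better than the roughly 10% I see on real data.

```python
import pickle
import zlib

# Hypothetical sample standing in for a scraped page.
html = (
    "<html><body>"
    + "<div class='item'><a href='/x'>link</a></div>" * 3000
    + "</body></html>"
)

raw = html.encode("utf-8")
pickled = pickle.dumps(html)     # serialization only, no compression
compressed = zlib.compress(raw)  # DEFLATE compression in the zlib format

print("raw bytes:       ", len(raw))
print("pickled bytes:   ", len(pickled))
print("compressed bytes:", len(compressed))
```

Pickle only serializes the string and adds a small header, so its output is slightly larger than the raw bytes, while zlib actually compresses them.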
Any thoughts?