How would I go about reversing Google's AMP URL transformation?
I am looking to take an AMP (Accelerated Mobile Pages) URL and recover the regular (original) URL. Does anyone know how to do this in Python (or any other language, for that matter)? Any help would be greatly appreciated.
An example:
https://amp.cnn.com/cnn/2018/03/08/politics/jeff-flake-anti-tariff-bill/
Expected output:
https://cnn.com/2018/03/08/politics/jeff-flake-anti-tariff-bill/
A second example:
https://www.google.ca/amp/s/mobile.nytimes.com/2018/03/08/us/politics/trump-tariff-announcement.amp.html
Expected output:
https://www.nytimes.com/2018/03/08/us/politics/trump-tariff-announcement.html
A third (and final) example:
https://www.google.ca/amp/s/www.theverge.com/platform/amp/2018/3/8/17097904/android-ios-smartphone-brand-loyalty
Expected output:
https://www.theverge.com/2018/3/8/17097904/android-ios-smartphone-brand-loyalty
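Here is a minimal Python sketch of URL-rewriting rules that reproduce the three expected outputs above. It rests on two assumptions. First, it treats a `google.<tld>/amp/s/` prefix as Google's AMP viewer wrapping an HTTPS URL, and `/amp/` without the `s` as wrapping a plain HTTP URL. Second, every publisher-specific rule (the `amp.` subdomain, CNN's extra `/cnn` segment, `mobile.` to `www.`, `.amp.html`, and The Verge's `/platform/amp/`) is hypothetical. These rules were inferred only from these three examples, not from any documented scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def unwrap_google_amp_viewer(url: str) -> str:
    """Turn https://www.google.<tld>/amp/s/<host>/<path> into https://<host>/<path>.

    Assumption: "/amp/s/" marks an HTTPS origin and a bare "/amp/" an HTTP one.
    Any query string on the Google URL itself is dropped.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www.google.") or host.startswith("google."):
        if parts.path.startswith("/amp/s/"):
            return "https://" + parts.path[len("/amp/s/"):]
        if parts.path.startswith("/amp/"):
            return "http://" + parts.path[len("/amp/"):]
    return url  # not a Google viewer URL; leave it alone


def strip_amp_markers(url: str) -> str:
    """Apply hypothetical per-site rules inferred only from the three examples."""
    parts = urlsplit(unwrap_google_amp_viewer(url))
    host, path = parts.netloc, parts.path
    if host.startswith("amp."):  # amp.cnn.com -> cnn.com
        host = host[len("amp."):]
        # CNN's AMP paths carry an extra leading "/cnn" segment (example 1)
        if host == "cnn.com" and path.startswith("/cnn/"):
            path = path[len("/cnn"):]
    if host.startswith("mobile."):  # mobile.nytimes.com -> www.nytimes.com
        host = "www." + host[len("mobile."):]
    if path.endswith(".amp.html"):  # foo.amp.html -> foo.html
        path = path[: -len(".amp.html")] + ".html"
    if path.startswith("/platform/amp/"):  # The Verge's AMP path prefix
        path = path[len("/platform/amp"):]
    # Scheme is forced to https, matching the expected outputs
    return urlunsplit(("https", host, path, parts.query, parts.fragment))
```

For example, `strip_amp_markers("https://www.google.ca/amp/s/www.theverge.com/platform/amp/2018/3/8/17097904/android-ios-smartphone-brand-loyalty")` gives the third expected output. Every new publisher would need its own rule, which is why pure URL rewriting does not scale well.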
Unfortunately, the way sites implement AMP URLs appears to vary considerably. One approach would be to simply strip out every "amp" along with its surrounding dots (.) or slashes (/). However, I can imagine cases where that would be unwise, mainly when the original page URL legitimately contains "amp" (for example, near the end of its path) and that URL is what appears in regular browsing.
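One way around the false-positive problem is to avoid rewriting the URL at all. The AMP HTML specification requires every AMP document to include a `<link rel="canonical" href="...">` tag in its `<head>`. That tag points at the regular HTML version, or at the page itself if no separate version exists. Below is a minimal sketch using only Python's standard library. It assumes you can fetch the AMP page and that the publisher has set the canonical link correctly. `CanonicalLinkFinder`, `find_canonical_url` and `canonical_url_for` are hypothetical names chosen for illustration.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <link ... />
        # tags also arrive here via the default handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values
        rels = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rels and href:
            self.canonical = href


def find_canonical_url(html: str, base_url: str = "") -> Optional[str]:
    """Return the canonical URL declared in an HTML document, or None."""
    finder = CanonicalLinkFinder()
    finder.feed(html)
    finder.close()
    if finder.canonical is None:
        return None
    # Resolve a relative href against the page URL (unchanged if base_url is "")
    return urljoin(base_url, finder.canonical)


def canonical_url_for(amp_url: str) -> Optional[str]:
    """Fetch an AMP page over the network and return its canonical URL."""
    with urllib.request.urlopen(amp_url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return find_canonical_url(html, base_url=amp_url)
```

For the `google.ca/amp/s/...` links, it is probably safer to fetch the publisher's own AMP URL (the part after `/amp/s/`) than Google's viewer page, which may not serve the AMP document itself. The trade-off is one HTTP request per URL, in exchange for not having to maintain per-site rewriting rules.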