Hi, when I right-click and choose "Save page as", the web page is saved as .xhtml. But when I download it using wget or curl, it is saved as .html. Is there any way to download it as it is, i.e. as .xhtml? I really need it.
- What's the difference between the downloaded `.html` and the `.xhtml` files, besides their extension? – zb226 Oct 01 '15 at 09:27
- Have you tried curl? – AKRAM EL HAMDAOUI Oct 01 '15 at 09:28
- @zb226 Hi, I personally don't know what the difference is, but I have a strange problem here. I am doing a project using the yad --html widget. The input to --uri is an HTML page. It only works when I manually download the page as xhtml, rename it to html, and then pass that html file to --uri. I don't know why, and I don't want to know why. I just want to automate that manual process with commands. This is my RSS URL: http://www.rt.com/rss/news/ – jeevanreddymandali Oct 01 '15 at 09:38
- @AkramElHamdaoui Yeah, no use – jeevanreddymandali Oct 01 '15 at 09:38
3 Answers
You are downloading an RSS feed. This is not an (X)HTML document, but its own type of XML document. Your browser is displaying an (X)HTML representation of the RSS feed XML. If you click "Save as" in the browser, it saves that representation to disk. If you're running wget/curl against the RSS feed's URL, you're downloading its XML file. Every browser may choose a different representation for RSS feeds. There is no way to emulate that with just wget/curl.
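To check which kind of document wget/curl actually saved, you can look at its root element: an RSS feed starts with `rss`, an (X)HTML page with `html`. A minimal sketch, assuming a POSIX shell with grep; `root_element` is a hypothetical helper name, and it is a rough heuristic rather than a real XML parser:

```shell
# Hypothetical helper: print the name of the first element in an XML file.
# The pattern requires a letter right after "<", so the XML declaration
# (<?xml ...?>), DOCTYPE and comments (<!...) are skipped.
# Heuristic only: a tag-like string inside a leading comment would fool it.
root_element() {
    grep -o '<[A-Za-z][A-Za-z0-9:_-]*' "$1" | head -n 1 | tr -d '<'
}
```

For example, after saving the feed with `curl -s -o feed.xml "http://www.rt.com/rss/news/"`, running `root_element feed.xml` should report the feed's root element rather than `html`.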
Update 1: You need software that translates the RSS feed XML into XHTML (that is, XML of type A into XML of type B). This is done with XSLT (Extensible Stylesheet Language Transformations). There is no obvious or "correct" solution, as the target representation can be chosen freely by writing an arbitrary XSL stylesheet. Be aware that this is not particularly easy. Depending on the technology stack you're using, there might also be prefabricated solutions available. Try googling for "rss to xhtml" or similar.
Update 2: To get you started, do the following:
- Install xsltproc (should be available in your package manager, but: download, sources)
- Save the stylesheet below to rss2xhtml.xsl
- Run: wget -O - -o /dev/null "http://www.rt.com/rss/news/" | xsltproc rss2xhtml.xsl /dev/stdin > out.xhtml
- ...and presto, there's your HTML
The provided stylesheet is very basic; customize it as you wish, if you feel like learning this stuff :)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- http://stackoverflow.com/a/32884376/1529709 -->
<!-- doctype-public/doctype-system make the processor emit the DOCTYPE in the output -->
<xsl:output method="html" indent="yes" doctype-public="-//W3C//DTD XHTML 1.1//EN" doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>
<xsl:template match="text()"></xsl:template>
<xsl:template match="item">
<h2><a href="{link}"><xsl:value-of select="title"/></a></h2>
<p><xsl:value-of select="description" disable-output-escaping="yes"/></p>
</xsl:template>
<xsl:template match="/rss/channel">
<html>
<head>
<title><xsl:value-of select="title"/></title>
<style>img,p {display:block;float:none;}</style>
</head>
<body>
<h1><a href="{link}"><xsl:value-of select="title"/></a></h1>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
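If you want to script the transform step rather than type the pipeline each time, it could be wrapped roughly like this. This is only a sketch under assumptions: `feed_to_html` is a made-up helper name, it works on a feed that has already been saved to disk, and it relies on xsltproc's `-o` option to name the output file.

```shell
# Hypothetical helper: transform a local RSS file into HTML with xsltproc.
# Usage: feed_to_html FEED_XML STYLESHEET_XSL OUTPUT_HTML
# Note: plain POSIX sh has no "local", so these variables are global.
feed_to_html() {
    feed=$1
    xsl=$2
    out=$3

    # Fail early with a clear message if xsltproc is not installed.
    if ! command -v xsltproc >/dev/null 2>&1; then
        echo "xsltproc not found" >&2
        return 127
    fi

    # Write to a temporary file first so that a failed transform
    # does not leave a half-written output file behind.
    tmp="$out.tmp.$$"
    if xsltproc -o "$tmp" "$xsl" "$feed"; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        echo "transform failed for $feed" >&2
        return 1
    fi
}
```

With the feed saved as feed.xml, `feed_to_html feed.xml rss2xhtml.xsl out.html` would produce a file that can then be passed to yad's --uri option as described in the question's comments.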

You can do this by adding the -O parameter:
wget -O centos-org.xhtml https://www.centos.org
Or you can try to do this using cURL
curl https://www.centos.org > centos-org.xhtml

- Yeah, I tried those, no use. :( I have a strange problem here. I am doing a project using the yad --html widget. The input to --uri is an HTML page. It only works when I manually download the page as xhtml, rename it to html, and then pass that html file to --uri. I don't know why, and I don't want to know why. I just want to automate that manual process with commands. This is my RSS URL: http://www.rt.com/rss/news/ – jeevanreddymandali Oct 01 '15 at 09:37
- This is the output when I open the output file produced by your methods: XML Parsing Error: mismatched tag. Expected: . Location: file:///home/jeevan/centos-org.xhtml Line Number 21, Column 3: --^ – jeevanreddymandali Oct 01 '15 at 09:46
Afaik the only difference is the extension.
wget http://website.com/index.html && mv index.html index.xhtml

- Try manually downloading this (http://www.rt.com/rss/news/), rename it to .html, and feed it to yad --html --browser --uri="file.html". Then download it using some commands, rename it, and feed it to yad. The output is not proper for the second one. :( I really need it – jeevanreddymandali Oct 01 '15 at 09:41
- To be honest, I don't know. I tried to answer a question about downloading an HTML page as XHTML. It turns out to be an RSS feed, and I have zero knowledge of RSS feeds. – x13 Oct 01 '15 at 09:50