Please suggest me where my mistakes are.
Your single mistake is trying to parse XML with regular expressions. You can't parse XML/HTML with RegEx! Please use an XML/HTML-parser like xidel instead.
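To see why, here is a minimal sketch using a made-up two-item snippet (the `xml` string and the pattern are illustrative assumptions, not taken from the feed). A greedy regular expression has no notion of where one element ends and the next begins:

```shell
# Hypothetical sample: two <item> elements on a single line.
xml='<item><title>A</title></item><item><title>B</title></item>'

# Try to extract each <item> with a regular expression.
# ".*" is greedy, so it runs from the first <item> to the last </item>.
matches=$(printf '%s\n' "$xml" | grep -o '<item>.*</item>')

# Count the matches: the regex reports one match for two elements.
count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$count"
```

This prints 1 even though the input holds two items. Non-greedy tricks help in this toy case, but they break again on nesting, attributes, CDATA sections and line breaks, which is what a real parser handles for you.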
The first <item>-element-node (not "variable" as you call them):
$ xidel -s "https://news.ycombinator.com/rss" -e '//item[1]' \
--output-node-format=xml --output-node-indent
<item>
<title>Show HN: I made an Ethernet transceiver from logic gates</title>
<link>https://imihajlov.tk/blog/posts/eth-to-spi/</link>
<pubDate>Sun, 18 Dec 2022 07:00:52 +0000</pubDate>
<comments>https://news.ycombinator.com/item?id=34035628</comments>
<description>&lt;a href="https://news.ycombinator.com/item?id=34035628"&gt;Comments&lt;/a&gt;</description>
</item>
$ xidel -s "https://news.ycombinator.com/rss" -e '//item[1]/description'
<a href="https://news.ycombinator.com/item?id=34035628">Comments</a>
Note that while the output of the first command is XML, the output of the second command is ordinary text!
With the integrated EXPath File Module you could then save this text(!) to an HTML-file:
$ xidel -s "https://news.ycombinator.com/rss" -e '
  //item/file:write-text(
    replace(title,"[<>:""/\\\|\?\*]",())||".html", (: remove invalid characters; "" is an escaped double quote :)
    description
  )
'
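As a rough illustration of what that replace() call does to a title, here is a plain shell sketch of the same character filter. It uses `tr` instead of xidel, so it is only an approximation of the XPath expression, and the variable names `title` and `safe` are made up for the example:

```shell
# Sample title taken from the feed output above.
title='Show HN: I made an Ethernet transceiver from logic gates'

# Delete the characters < > : " / \ | ? * which are not allowed
# in Windows file names (in tr, "\\" stands for one backslash).
safe=$(printf '%s' "$title" | tr -d '<>:"/\\|?*')

echo "$safe.html"
```

This prints `Show HN I made an Ethernet transceiver from logic gates.html`, the file name used further down. Only the colon is dropped here, because this title contains none of the other characters.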
But you can also save it as proper HTML by parsing the <description>-element-node and using file:write() instead:
$ xidel -s "https://news.ycombinator.com/rss" -e '
  //item/file:write(
    replace(title,"[<>:""/\\\|\?\*]",())||".html",
    parse-html(description),
    {"indent":true()}
  )
'
$ xidel -s "Show HN I made an Ethernet transceiver from logic gates.html" -e '$raw'
<html>
<head/>
<body>
<a href="https://news.ycombinator.com/item?id=34035628">Comments</a>
</body>
</html>