I took a very messy XML file to practice a little sed. Here is a sample:
<title><![CDATA[O BR-Linux está em pausa por tempo indeterminado]]></title>
<title><![CDATA[Funçoes ZZ atinge maioridade: versão 18.3]]></title>
<title><![CDATA[CloudFlare 1.1.1.1 e parceria com Firefox DoH]]></title>
<title><![CDATA[Slint, Distro Baseada no Slackware e Acessível]]></title>
<title><![CDATA[Utilização de CPU em sistemas Linux multi-thread]]></title>
<title><![CDATA[Realidade Aumentada com 10 anos de idade e 10 linhas de código.]]></title>
I managed to strip out the markup and keep only the text, but I am not happy with the solution. I would like to improve it, but I don't know how. Here is the code:
#!/bin/bash
# Trauvin
URL=http://br-linux.org/feed/
lynx -source "$URL" |
grep '<title><!' | # get tag title
sed 's/<[^!>]*>//g' | # remove tag title
sed 's/<[^<]>*//g' | # remove <!
sed 's/CDATA/''/g' | # remove CDATA
sed 's/[[^[]//g' | # remove the opening square brackets
sed 's/[]*]]//g' | # remove the closing square brackets
sed 's/>*//g' | # remove > end
head -n 5
I used several separate sed commands to keep things readable and so that I could put a comment on every line.
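
For comparison, here is a minimal sketch of how the whole chain might collapse into a single sed call that captures the text between `<![CDATA[` and `]]>`. It assumes each title sits on one line by itself, as in the sample above. The function name `extract_titles` is made up for illustration and is not part of the original script.

```shell
#!/bin/bash
# Sketch only: extract_titles is a hypothetical helper name.
# Assumes every <title><![CDATA[...]]></title> element is on a single line,
# with at most one title per line.
extract_titles() {
    # -n turns off automatic printing; the trailing p prints only the lines
    # where the substitution matched, so the separate grep is not needed.
    # \[ and \] match literal brackets, \(.*\) captures the title text,
    # and | is used as the delimiter so the / in </title> needs no escaping.
    sed -n 's|.*<title><!\[CDATA\[\(.*\)\]\]></title>.*|\1|p'
}

# Small demo on one line taken from the sample feed
printf '%s\n' '<title><![CDATA[CloudFlare 1.1.1.1 e parceria com Firefox DoH]]></title>' |
    extract_titles
```

With the real feed, the pipeline would presumably become `lynx -source "$URL" | extract_titles | head -n 5`. Comments can still be kept per step by placing them above the sed line instead of at the end of each pipe stage.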