I want to extract the programme title and sub-title from the (clipped) XML file below. I was extracting each one separately with xmllint and sed and combining the results into one file, but I have since discovered that some entries have a title and no sub-title, so the two lists no longer line up. In that case I would like the sub-title to be left blank. Could someone suggest a way to account for this?
XML File
<programme start="20171013170000 +0100" stop="20171013180000 +0100" channel="b492458d826d592ec7c528545a16c757">
<title lang="eng">Accessories Gift Hall</title>
<sub-title lang="eng">Find the perfect gift with fashion accessories by some of our most sought-after brands. From chic purses and wallets to cosy PJs and slippers, there's something for everyone.</sub-title>
</programme>
<programme start="20171013180000 +0100" stop="20171014130000 +0100" channel="b492458d826d592ec7c528545a16c757">
<title lang="eng">..programmes start again at 1pm</title>
</programme>
<programme start="20171014130000 +0100" stop="20171014140000 +0100" channel="b492458d826d592ec7c528545a16c757">
<title lang="eng">Ruth Langsford's Fashion Edit</title>
<sub-title lang="eng">TV personality and QVC fashion ambassador, Ruth Langsford, shares her favourite looks and must-have pieces that will transform your wardrobe and have you looking fabulously stylish.</sub-title>
</programme>
Bash commands v1
xmllint --xpath "//programme/title" xmltv | tr -d '\n' | sed 's/<\/title>/\n/g' | sed 's/<title lang="eng">//g' > 1.txt
xmllint --xpath "//programme/sub-title" xmltv | tr -d '\n' | sed 's/<\/sub-title>/\n/g' | sed 's/<sub-title lang="eng">//g' > 2.txt
paste 1.txt 2.txt > 3.txt
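One direction that might keep the two columns aligned is to query each programme individually instead of collecting all titles and all sub-titles in separate passes. The sketch below is only an illustration under some assumptions: the function name titles_tsv is made up for this example, the full file is assumed to be well-formed with a single root element around the programme entries, and the installed xmllint is assumed to support --xpath. It relies on the XPath string() function returning an empty string when the selected node does not exist.

```shell
# Print one "title<TAB>sub-title" line per <programme> in the given XML file.
# titles_tsv is a hypothetical helper name, not part of xmllint.
titles_tsv() {
    file=$1
    # count() gives the number of <programme> elements in the document.
    count=$(xmllint --xpath 'count(//programme)' "$file") || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # string() yields "" when the node is missing, so a programme
        # without a sub-title produces an empty second column.
        title=$(xmllint --xpath "string((//programme)[$i]/title)" "$file")
        subtitle=$(xmllint --xpath "string((//programme)[$i]/sub-title)" "$file")
        printf '%s\t%s\n' "$title" "$subtitle"
        i=$((i + 1))
    done
}
```

It would be called as titles_tsv xmltv > 3.txt. Because the file is parsed again for every programme, this is likely to be slow on large listings, so it is probably only suitable for modest file sizes.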
Thanks!