This is a fairly simple task, but it has been troubling me for some time. I have the following:

homepage=$(curl "https://example.com/")

xmlstarlet --quiet fo --html <<<"$homepage" |
xmlstarlet sel -T -t \
    -m "//*[@id='financial']/tbody/tr/td" \
        --if 'not(starts-with(a//@href,"http"))' \
          -o 'https://example.com' \
        --break \
        -v 'a//@href' \
        -o '/?start=1' \
        -o '&' \
        -o 'end=2' -n | \ 
            sed '${/^$/d;}' \ 
                >> "results.txt"

What I want to do is remove the last newline produced by xmlstarlet in -o 'end=2' -n | \. When it reaches the end of the link list it still emits a -n (newline), as if it were going to add more links, but I want to avoid that last -n after the final href.

The sed '${/^$/d;}' \ step, which should do this, returns the following error:

sed: ${/^$/d;}: No such file or directory
sed:  : No such file or directory

It somehow does not pipe the previous STDOUT to the sed STDIN correctly. In one of my prior questions I worked with something similar and this sed command worked for me earlier:

sed 's/\\&amp;/\&/g'

On the other hand, I also tried using:

# The -e flag
sed -e '${/^$/d;}'

Which did not work for me either.

Can this be done directly from XMLStarlet without having to add an extra sed pipe?

What is wrong with my sed? What is the correct sed method?

Ava Barbilla
  • Remove all whitespace after \ – Cyrus Oct 21 '17 at 09:46
  • Hi @Cyrus I don't completely understand. According to this [question](https://stackoverflow.com/questions/369758/how-to-trim-whitespace-from-a-bash-variable) I found that `sed -e 's/[[:space:]]*$//'` should remove the trailing whitespace. This, however, does not remove the **newline** after the last URL. Could you perhaps provide an example? – Ava Barbilla Oct 21 '17 at 14:39
  • Your code contains these two lines: `-o 'end=2' -n | \ ` and `sed '${/^$/d;}' \ `. Both contain whitespace after the \. Remove that whitespace. – Cyrus Oct 21 '17 at 14:43
  • That did the trick, @Cyrus. Now `sed` no longer returns `No such file or directory`; however, it still doesn't remove the trailing **newline**. I have a similar issue to this [thread](https://unix.stackexchange.com/questions/228505/how-to-delete-the-newline-before-eof-in-a-text-csv-file-via-bash): whenever I execute `printf "%s" "$(< results.txt)" > results.txt` after my initial `XMLStarlet` script finishes, it does remove the last newline. This is an extra step which I would like to omit. I tried piping to `printf '%s'`, but the file ends up empty. I also tried `sed '$d'`, but that does not work. – Ava Barbilla Oct 21 '17 at 15:49
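
A small sketch of what the comments describe may help. It assumes the XMLStarlet output ends with exactly one newline after the last URL; the `sample` function is a hypothetical stand-in for that output, not part of the original script. Under that assumption the last line seen by `sed` is the URL itself, not an empty line, so `${/^$/d;}` has nothing to delete. The sketch also shows the line continuation with the backslash as the very last character on the line.

```shell
# Hypothetical stand-in for the xmlstarlet output: two URLs,
# each terminated by a single newline.
sample() { printf 'https://example.com/a\nhttps://example.com/b\n'; }

# Append a sentinel 'x' so that command substitution does not
# strip the trailing newline we want to inspect.
raw=$(sample; printf 'x')

# Note: no whitespace after the backslash, otherwise the shell
# treats "\ " as a separate argument instead of a line continuation.
filtered=$(sample | \
    sed '${/^$/d;}'; printf 'x')

if [ "$raw" = "$filtered" ]; then
    echo "sed left the final newline in place"
fi
```

If that assumption holds, the `sed` expression is working as written; it only removes a final line that is completely empty, which this output does not have.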

1 Answer

Placing this at the end of the script worked for me:

printf "%s" "$(<results.txt)" > results.txt

I was looking for a way to do this directly in XMLStarlet, so this is a provisional answer.
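
One possible way to fold this into a single step, still outside XMLStarlet, is to wrap the pipeline itself in a command substitution, since `$(...)` strips trailing newlines. This is only a sketch: `generate_links` is a hypothetical stand-in for the `curl` and `xmlstarlet` pipeline, and `outfile` is a temporary file used for illustration instead of `results.txt`.

```shell
# Hypothetical stand-in for the curl | xmlstarlet pipeline.
generate_links() {
    printf 'https://example.com/a/?start=1&end=2\n'
    printf 'https://example.com/b/?start=1&end=2\n'
}

# Temporary file used only for this illustration.
outfile=$(mktemp)

# $(...) drops every trailing newline; printf '%s' then writes
# the remaining text without adding one back.
printf '%s' "$(generate_links)" >> "$outfile"

# Print the last byte of the file: a '2' rather than a newline.
tail -c 1 "$outfile"
```

Because the file no longer ends with a newline, appending to it again with `>>` would join the next run's first URL onto the last line, so this fits best when the file is written once.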

Ava Barbilla