-1

I've got a python script that uses mechanize to grab data from a web page. This is working fine but I've done a hack job of then using bash to filter for the text I'm looking for. I now need to do this in the main python script as I need to use the output value.

response = br.submit()
print response.read()

This prints out the response which I then manipulate with bash

| grep usedData | cut -d '"' -f2 | sed 's/\<GB used\>//g'`

How can I do this all in python?

The output from the bash script would be a number (eg 123.45)

Input:

<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>


Output: 221.59
Greg
  • 1,715
  • 5
  • 27
  • 36

2 Answers2

1

You could use a regex to find all digit-and-period sequences that precede "GB".

>>> import re
>>> s = "<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>"
>>> match = re.search(r"([\d\.]*)GB", s)
>>> match.group(1)
'221.59'
Kevin
  • 74,910
  • 12
  • 133
  • 166
0

Simply try this:

input_html = "<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>"
begin = input_html.find("</th><td>")
end = input_html.find("GB</td>")
output = input_html[begin+len("</th><td>"):end]
print output

This should find exactly what you are looking for.

Semih Yagcioglu
  • 4,011
  • 1
  • 26
  • 43