Filtering output in python?

Question

I've got a Python script that uses mechanize to grab data from a web page. This works fine, but I've hacked together a bash pipeline to filter the output for the text I'm looking for. I now need to do the filtering in the main Python script, because I need to use the resulting value there.

response = br.submit()
print response.read()

This prints out the response which I then manipulate with bash

| grep usedData | cut -d '"' -f2 | sed 's/\<GB used\>//g'

How can I do this all in python?

The output from the bash pipeline is a number (e.g. 123.45).

Input:

<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>


Output: 221.59

You can pipe in Python via the stdin parameter of subprocess: http://stackoverflow.com/questions/9655841/python-subprocess-how-to-use-pipes-thrice – Ali SAID OMAR, Sep 01 '15 at 12:21

score 1 · Accepted Answer · answered Sep 01 '15 at 12:29

You could use a regex to find the first run of digits and periods that immediately precedes "GB".

>>> import re
>>> s = "<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>"
>>> match = re.search(r"([\d.]+)GB", s)
>>> match.group(1)
'221.59'
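
To use the value as a number in the script rather than as a string, the same search can be wrapped in a small helper. This is only a sketch, not part of the original answer: the name `extract_usage_gb` is hypothetical, and it assumes the page text is already a string containing a figure immediately followed by "GB". In the question's script the argument would be the result of `response.read()`.

```python
import re

# Hypothetical helper for illustration; the pattern assumes the number is
# written as digits with an optional decimal part, directly followed by "GB".
USAGE_RE = re.compile(r"(\d+(?:\.\d+)?)GB")

def extract_usage_gb(html):
    """Return the first number directly followed by 'GB' as a float, or None."""
    match = USAGE_RE.search(html)
    if match is None:
        # No "<number>GB" in the text, so there is nothing to convert.
        return None
    return float(match.group(1))

sample = "<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>"
usage = extract_usage_gb(sample)
print(usage)  # 221.59
```

Returning None when nothing matches avoids the AttributeError that `match.group(1)` would raise if `re.search` found no match.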

Kevin

Thanks very much - I like this solution. Seems a lot easier than my bash script :) – Greg Sep 01 '15 at 12:35

Semih Yagcioglu · Answer 2 · 2015-09-01T12:34:09.357

0

Simply try this:

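input_html = "<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>"
# Locate the markers that surround the number.
begin = input_html.find("</th><td>")
end = input_html.find("GB</td>")
# Slice from just past the opening marker up to the closing marker.
output = input_html[begin+len("</th><td>"):end]
print(output)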

This extracts the text between "</th><td>" and "GB</td>", provided both markers are present in the input.
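
One caveat with the slicing approach: `str.find` returns -1 when a marker is missing, and the slice then silently yields the wrong text. A slightly more defensive variant might look like the sketch below. The helper name `between` is hypothetical, and the markers are simply the ones from the sample row.

```python
def between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    begin = text.find(start_marker)
    if begin == -1:
        return None
    begin += len(start_marker)
    # Search for the end marker only after the start marker.
    end = text.find(end_marker, begin)
    if end == -1:
        return None
    return text[begin:end]

input_html = "<tr><th>Current Data Usage:  </th><td>221.59GB</td></tr>"
print(between(input_html, "</th><td>", "GB</td>"))  # 221.59
```

The result is still a string; pass it to `float()` if the script needs a number.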

edited Sep 01 '15 at 12:34

answered Sep 01 '15 at 12:28

Please don't use `str` as a variable. If you do so, you won't be able to use the `str()` function. – Paco Sep 01 '15 at 12:32
Sorry to tell you that, but `input` is also a function in python ;-) – Paco Sep 01 '15 at 12:33
I just fixed that too ;) – Semih Yagcioglu Sep 01 '15 at 12:35
