Get value of log file in python from variable

Question

I am getting the source code of a page in a variable:

<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>

I want to extract t1.304.log from the line above. I am using print log_name.split(".log",1)[0], but it returns everything before .log instead of just the file name.

Can you elaborate what you mean by extracting the desired string out of the line? Do you want to extract any string that looks like "something.log"? — Leo, Sep 27 '15 at 20:42
yes any string which ends with .log. and it will come only once — Aquarius24, Sep 27 '15 at 20:43
By "only once", do you mean only the first matching substring? Or do you want to make sure the string only contains one match? — Leo, Sep 27 '15 at 20:45

score 3 · Answer 1 · answered Sep 27 '15 at 20:51

Why not parse the HTML with an HTML parser?

>>> from bs4 import BeautifulSoup
>>> data = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"
>>> BeautifulSoup(data, "html.parser").a["href"].split("=")[-1]
't1.304.log'
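
If BeautifulSoup is not installed, the standard library's html.parser module can do a similar job. This is only a sketch for Python 3: the names LogLinkParser and find_log_names are made up for illustration, and it assumes the log name is whatever follows the last "=" in an href that ends in .log.

```python
from html.parser import HTMLParser


class LogLinkParser(HTMLParser):
    """Collect log file names from <a href="..."> attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.log_names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Assumption: the file name follows the last "=" in the href.
            if name == "href" and value and value.endswith(".log"):
                self.log_names.append(value.rsplit("=", 1)[-1])


def find_log_names(html):
    """Return every log name found in the anchors of `html`."""
    parser = LogLinkParser()
    parser.feed(html)
    return parser.log_names


data = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"
print(find_log_names(data))
```

Unlike the one-liner, this returns a list, so a page with no matching link gives an empty list rather than an exception.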

dstudeba · Answer 2 · 2015-09-27T20:50:26.173

If you just want a quick solution, you can use the str.split() method:

log_name.split("'")[1].split("=")[1]

However, to do it in a reusable way, look into a tool like BeautifulSoup.

Edited to add

Based on your comments you could do this:

print(log_name.split(".log",1)[0].rsplit("=",1)[1] + ".log")
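
The chained split calls either raise IndexError or return a wrong value when the text does not have the expected shape. A slightly more defensive sketch for Python 3 follows; the function name extract_log_name is hypothetical, and it assumes the file name starts right after the last "=" that precedes the first ".log".

```python
def extract_log_name(text, suffix=".log"):
    """Return the text between the last "=" and the first ".log", or None."""
    head, found, _ = text.partition(suffix)
    if not found:
        return None  # no ".log" anywhere in the text
    _, eq, name = head.rpartition("=")
    if not eq:
        return None  # no "=" before the suffix
    return name + suffix


log_name = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"
print(extract_log_name(log_name))
```

Returning None instead of raising lets the caller decide what to do when the page contains no log link.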

edited Sep 27 '15 at 20:50

answered Sep 27 '15 at 20:45

dstudeba


that is not a string, i am taking value from source code – Aquarius24 Sep 27 '15 at 20:59
import urllib; url = 'http://www.google.com'; logfile = urllib.urlopen(url); logfile = logfile.read(); logfile = logfile.split(".log",1)[0].rsplit("=",1)[1] + ".log" – Aquarius24 Sep 27 '15 at 21:00
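
If you are on Python 3, note that urllib.request.urlopen(...).read() returns bytes rather than str, so the page has to be decoded before splitting on str arguments. A rough sketch under those assumptions: both function names are placeholders, UTF-8 is an assumed encoding, and the extraction step is the same split/rsplit idea as in the answer above.

```python
import urllib.request


def log_name_from_bytes(raw, encoding="utf-8"):
    """Decode downloaded bytes and pull out the name that ends in ".log"."""
    page = raw.decode(encoding, errors="replace")
    # Same split/rsplit idea as in the answer; raises IndexError if no "=" is found.
    return page.split(".log", 1)[0].rsplit("=", 1)[1] + ".log"


def fetch_log_name(url):
    """Download the page at `url` and extract the log name from it."""
    with urllib.request.urlopen(url) as response:
        return log_name_from_bytes(response.read())
```

Keeping the download separate from the extraction makes the string handling easy to try on a saved copy of the page.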

score 0 · Answer 3 · answered Sep 27 '15 at 20:49

    import re
    st = " <!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"

    mo = re.search(r'(t\S*log)', st)

    print(mo.group())

output

t1.304.log
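
The pattern above only matches names that begin with "t". A somewhat more general sketch for Python 3 is shown here; the pattern and the name find_logs_with_regex are illustrative assumptions, namely that each file name follows an "=" and contains no quotes, "=" signs, or whitespace.

```python
import re

# Assumption: the name follows "=" and contains no quotes, "=", or whitespace.
LOG_NAME = re.compile(r"=([^'\"=\s]+\.log)")


def find_logs_with_regex(text):
    """Return every "<name>.log" that directly follows an "=" in `text`."""
    return LOG_NAME.findall(text)


st = "<!DOCTYPE html><html><head><title>Intro</title></head><body><a href='/name=t1.304.log'>Test</a>.  </body></html>"
print(find_logs_with_regex(st))
```

Escaping the dot in \.log keeps the pattern from matching strings such as "xlog" that merely end in those letters.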

answered Sep 27 '15 at 20:49

LetzerWille


score 0 · Answer 4 · edited May 23 '17 at 12:06

You could use a regular expression (with the re module), assuming your string variable is page_source:

>>> import re
>>> re.findall(r'.*=(.*\.log)', page_source)
['t1.304.log']

This gives you a list of matching "*.log" substrings (at most one per line of input, because the leading .* is greedy).

Be warned, though: using regular expressions to parse HTML is generally inadvisable (see this discussion).

In fact, don't do this; use alecxe's answer.
