When I run the script, the output is empty. Why? The script connects to a site and parses the HTML <a> tags:

#!/usr/bin/python3

import re
import socket
import urllib, urllib.error
import http.client
import sys

conn = http.client.HTTPConnection('www.guardaserie.online')
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Content-type": "application/x-www-form-urlencoded; charset=UTF-8",
}
params = urllib.parse.urlencode({"s": "hannibal"})
conn.request('GET', '/', params, headers)
response = conn.getresponse()

site = re.search('<a href="(.*)" class="box-link-serie">', str(response.read()), re.M | re.I)
if site:
    print(site.group())
l'L'l
faserx
    Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Lex Scarisbrick Aug 04 '16 at 18:14

1 Answer

It's likely that the pattern you are searching for does not occur in the response body, or that the regex breaks down at some point while trying to match the HTML.

re.search( 'href="(.*)" class="box-link-serie"', str(response.read()), re.M | re.I )

Using a more generic pattern, or a proper HTML parser instead of a regex, will likely get you the result you want.
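
As a rough sketch of the "another parser" route, the standard library's `html.parser` can collect the `href` of every `<a>` tag that carries the `box-link-serie` class. The names `LinkCollector` and `extract_links`, the sample markup, and the `example.com` URL are made up for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attrs.get("class") or "").split()
        if self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def extract_links(html, css_class="box-link-serie"):
    """Return the hrefs of all matching <a> tags in document order."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup standing in for the decoded response body.
SAMPLE_HTML = """
<div>
  <a href="http://www.guardaserie.online/ray-donovan-a/" class="box-link-serie">Ray Donovan</a>
  <a href="/about/" class="menu">About</a>
  <a href="http://example.com/hypothetical-show/" class="box-link-serie big">Other</a>
</div>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

With a live response you would pass `response.read().decode(...)` to `extract_links` rather than `str(response.read())`, since the parser expects text, not the repr of a bytes object.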

l'L'l
  • If you tried the pattern above it should return a result. I would recommend you try using these imports: `import re, httplib, socket, urllib, sys`, and change `params = urllib.urlencode`, as well as `conn = httplib.HTTPConnection` ... – l'L'l Aug 04 '16 at 18:30
  • the pattern returns the entire HTML page – faserx Aug 04 '16 at 18:33
  • the result is always that – faserx Aug 04 '16 at 18:39
  • I get `href="http://www.guardaserie.online/ray-donovan-a/" class="box-link-serie"` when using `print(site.group())` ... python code here : https://gist.github.com/anonymous/43026f7262b2fddfb7643169f0d558b2 – l'L'l Aug 04 '16 at 18:40
  • [See comment #4](http://stackoverflow.com/questions/38774213/http-request-and-regex-in-python/38774564?noredirect=1#comment64921304_38774564). – l'L'l Aug 04 '16 at 20:44
  • the result is always that – faserx Aug 05 '16 at 08:25
  • I solved the problem by using Beautiful Soup to do the parsing – faserx Aug 05 '16 at 21:05
  • I believe you mean the second problem was solved by using Beautiful Soup instead. The first problem, the blank output that the original question asks about, was solved by my answer and suggestions. – l'L'l Aug 06 '16 at 07:37
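
The "entire HTML page" result reported in the comments is consistent with two details of the original call. First, `str()` on the bytes returned by `response.read()` gives the `b'...'` repr with newlines escaped, so the page collapses onto a single line. Second, the greedy `(.*)` then runs to the last `" class="box-link-serie"` on that line. The sketch below reproduces this on made-up sample bytes and contrasts it with decoding first and using a quote-bounded group. It also builds the search term as a query string, because the third argument of `conn.request` is the request body, not URL parameters. Whether the site actually expects `?s=...` is an assumption, and the `example.com` URL is hypothetical.

```python
import re
from urllib.parse import urlencode

# Made-up bytes standing in for response.read().
SAMPLE_BODY = (
    b'<a href="/home/" class="menu">Home</a>\n'
    b'<a href="http://www.guardaserie.online/ray-donovan-a/" class="box-link-serie">A</a>\n'
    b'<a href="http://example.com/hypothetical-show/" class="box-link-serie">B</a>\n'
)

# Original-style pattern: (.*) is greedy and takes as much as it can.
GREEDY = re.compile(r'<a href="(.*)" class="box-link-serie">', re.I)
# Bounded pattern: the group cannot run past the closing quote of href.
BOUNDED = re.compile(r'<a href="([^"]*)" class="box-link-serie">', re.I)

# str() on bytes yields the repr, so every newline becomes a literal "\n"
# and the whole page sits on one line for the greedy group to swallow.
as_repr = str(SAMPLE_BODY)
greedy_match = GREEDY.search(as_repr).group(1)

# Decode to text first, then collect every bounded match.
text = SAMPLE_BODY.decode("utf-8", errors="replace")
links = BOUNDED.findall(text)

# Query string for a GET request (assumed form; not confirmed for this site).
search_path = "/?" + urlencode({"s": "hannibal"})

if __name__ == "__main__":
    print(len(greedy_match), "characters captured by the greedy group")
    print(links)
    print(search_path)  # would be used as conn.request('GET', search_path, headers=...)
```

Even the bounded regex depends on `href` coming directly before `class` in the tag, which is one reason a real parser such as Beautiful Soup ends up being the more robust choice here.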