Here is what I'm trying to accomplish:

  1. Using python mechanize I open a site
  2. If content does not match my regex I open another site
  3. I perform searching using another regex

And the extracted code:

m = re.search('<td>(?P<alt>\d+)', response.read())
...
m = re.search('<td>(?P<alt>\w+)', response.read())
print m.group('alt')

I'm getting:

AttributeError: 'NoneType' object has no attribute 'group'

If I uncomment the second search everything is fine. I don't understand this behaviour.

Searching for this error led me to this Stack Overflow question and to this one, but neither of them solved my problem.

I don't care about efficiency here so I don't use compile.

laszchamachla
  • What is the unfiltered result of each response.read()? I'm betting the second read isn't returning what you expect. – cmaynard Feb 07 '11 at 17:38
  • Could you add some more details about what you are trying to do by calling re.search twice? The current example code makes no sense. – shang Feb 07 '11 at 17:45
  • @kramthegram - thanks! You're right. It wasn't a regex issue. @shang - because response.read() changes between these 2 lines - see the second point of my question. – laszchamachla Feb 07 '11 at 17:48

1 Answer

Assuming response is a file-like object, calling read() a second time may return an empty string, because the first call already consumed the stream. Read it once and reuse the result:

data = response.read()
m = re.search('<td>(?P<alt>\d\d*)', data)
m = re.search('<td>(?P<alt>\d\d*)', data)
print m.group('alt')

Why would you call search multiple times?
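
To make the behaviour concrete, here is a minimal sketch of what happens when a file-like object is read twice. It assumes io.StringIO as a stand-in for the mechanize response (the real response object is not shown in the question), and the HTML snippet is made up for illustration.

```python
import io
import re

# Assumption: io.StringIO stands in for the mechanize response object.
response = io.StringIO("<tr><td>42</td></tr>")

first = response.read()   # consumes the whole stream
second = response.read()  # nothing is left, so this is an empty string

first_match = re.search(r'<td>(?P<alt>\d+)', first)
second_match = re.search(r'<td>(?P<alt>\w+)', second)  # searching ""

print(repr(second))        # the second read is empty
print(second_match)        # so re.search returns None
```

Calling .group('alt') on second_match would raise the AttributeError from the question, since re.search returns None when nothing matches.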

Reiner Gerecke
  • You're right - thanks! So it wasn't a regex issue. My mistake. I would like to call search multiple times, because data might change between these two lines (second point of my question). – laszchamachla Feb 07 '11 at 17:48
  • @laszchamachla In that case, I don't see how this is any help. If I understand you correctly, you're getting page A, search on its data, in case of no matches, you do a new request and search on that data. There shouldn't be a problem if between two searches, you issue a new request and get a new response. – Reiner Gerecke Feb 07 '11 at 17:55
  • @Reiner - exactly, it is pretty strange to me too. But, as you advised, assigning response.read() to a variable before every search solves the problem. – laszchamachla Feb 07 '11 at 18:03
  • Also I'd suggest compiling the regex once: `rx = re.compile('<td>(?P<alt>\d\d*)')` and then re-using it wherever needed: `m = rx.search(data)`. – 9000 Feb 07 '11 at 18:04
  • @9000 - I wrote: "I don't care about efficiency here so I don't use compile." - it is not the point in this case, but thanks for your suggestion. – laszchamachla Feb 07 '11 at 18:06
  • @laszchamachla: besides efficiency, there's maintainability: you only need to change the regexp once if you find a bug in it. but you can just use a string constant, of course :) – 9000 Feb 07 '11 at 18:27
  • @9000 - Thanks, I know that :) It was only an example - in fact I use different regexes for the aforementioned searches. – laszchamachla Feb 07 '11 at 18:33
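
The workflow discussed in this thread (search page A, fall back to page B if nothing matches, reading each response exactly once) could be sketched roughly as follows. The names find_alt, first_match, fake_open, and the example URLs are hypothetical and only for illustration; fake_open stands in for whatever call returns the file-like response in the real script.

```python
import io
import re

def first_match(pattern, text):
    """Return the 'alt' group of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group('alt') if m else None

def find_alt(open_page, primary_url, fallback_url):
    # Read each response exactly once and keep the text in a variable.
    data = open_page(primary_url).read()
    result = first_match(r'<td>(?P<alt>\d+)', data)
    if result is None:
        # No match on the first page: issue a new request and search its data.
        data = open_page(fallback_url).read()
        result = first_match(r'<td>(?P<alt>\w+)', data)
    return result

# Hypothetical pages used instead of real network requests.
PAGES = {
    "http://a.example/": "<td>none</td>",
    "http://b.example/": "<td>abc</td>",
}

def fake_open(url):
    # Assumption: returns a file-like object, as the real open call would.
    return io.StringIO(PAGES[url])

print(find_alt(fake_open, "http://a.example/", "http://b.example/"))
```

Checking the match for None before calling .group() also avoids the AttributeError when neither page matches.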