-1

I am working on a Python script that automatically extracts ratings from IMDb, but I am unable to extract the numbers from my result.

from pattern.web import URL
from pattern.web import plaintext
from pattern.web import decode_utf8
import re

def scrape_imdb(film):
    url = URL (film)
    s=url.download()
    decode_utf8(url.download(s))
    regels=re.compile(('"ratingValue">[0-9].[0-9]'))
    rating= regels.findall(s)
    rating2= rating[0:1]
    rating3= rating2.findall("[0-9"])

    regels2=re.compile ("<title>.*</title>")
    titel=regels2.findall(s)
    print titel, rating2

But this gives me an error. Anyone know what I'm doing wrong?

Shifu

2 Answers

3

As you wrote in a comment to another answer:

I still get: AttributeError: 'list' object has no attribute 'findall'

So this seems to be your problem. findall returns a list of matches, so rating is a list. When you then do rating2 = rating[0:1], you assign a slice to rating2, so rating2 is itself a list (albeit one with a single element). A list does not have a findall method, so the call fails.
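To make the difference between slicing and indexing concrete, here is a minimal sketch. The sample string is a hypothetical stand-in for the downloaded page source, and I escaped the dot in the pattern, since an unescaped . matches any character:

```python
import re

# Hypothetical sample standing in for the downloaded page source.
sample = '<span itemprop="ratingValue">8.3</span>'

pattern = re.compile(r'"ratingValue">[0-9]\.[0-9]')
matches = pattern.findall(sample)   # a list of matched strings

first_as_list = matches[0:1]        # slicing keeps the list type
first_as_str = matches[0]           # indexing returns the element itself

print(type(first_as_list).__name__)       # list
print(type(first_as_str).__name__)        # str
print(hasattr(first_as_list, "findall"))  # False
```

Neither a list nor a str has a findall method; the method lives on compiled patterns and on the re module, which is why the string has to be passed to re.findall as an argument.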

What you probably want to do is run another regular expression on the first result in rating:

rating = regels.findall(s)
rating2 = rating[0] # only get the first element; a string
rating3 = re.findall("[0-9]", rating2)
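As an alternative to running a second regular expression, a capture group can pull the number out in one step. This is only a sketch under assumptions: extract_rating is a name I made up for illustration, the sample string is hypothetical, and the markup is assumed to look the way the pattern in the question implies, which may not match the real page.

```python
import re

def extract_rating(html):
    """Return the rating as a float, or None if no rating is found.

    Hypothetical helper: assumes the rating directly follows
    the text "ratingValue"> in the page source.
    """
    # The parentheses capture only the number, not the surrounding markup.
    match = re.search(r'"ratingValue">\s*([0-9]+(?:\.[0-9]+)?)', html)
    if match is None:
        return None
    return float(match.group(1))

sample = '<span itemprop="ratingValue">8.3</span>'
print(extract_rating(sample))                   # 8.3
print(extract_rating("<p>no rating here</p>"))  # None
```

Checking for None before calling group also avoids the IndexError that rating[0] raises when nothing on the page matches.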
poke
0

I believe you have a typo here:

rating3= rating2.findall("[0-9"])

It should be:

rating3= rating2.findall("[0-9]")
Matt Busche
Rahul Banerjee
  • Even when I correct the error, I still get: AttributeError: 'list' object has no attribute 'findall' – Shifu Feb 18 '13 at 22:00