This is a script I wrote to get the Alexa rank of each site listed in a file.

#!/usr/bin/env python
import sys
import requests
from lxml import html

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print 'usage: python %s <file-urls>' % (sys.argv[0])
        sys.exit(2)

filename = sys.argv[1]
urls = open(filename)
for site in urls:
    try:
        url="http://www.alexa.com/siteinfo/"+site
        content=requests.get(url).content
        tree=html.fromstring(content)
        RANK=tree.xpath('//strong[@class="metrics-data align-vmiddle"]/text()')
        print "Site:",site+"Global Rank:",RANK[0]+"\t"+"Country Rank:",RANK[1]
#        print 'Site:%s Global Rank:%2s Country Rank:%2s' % (site, RANK[0], RANK[1])
    except (KeyboardInterrupt, SystemExit):
        print "Keyboard interruption!"
        sys.exit(0)

RESULTS:

Site: google.com
Global Rank: 1  Country Rank: 1
Site: yahoo.com
Global Rank: 4  Country Rank: 4
Site: bing.com
Global Rank: 23 Country Rank: 14

The results are not satisfactory: the site name and its ranks end up on separate lines. Could you show a better way to columnize the results?

MLSC

1 Answer

site contains a trailing newline, because it is presumably read from a file with one entry per line. Call strip() on it before you use it, and the newline, along with any other leading or trailing whitespace, is gone.

Also consider using string formatting rather than string concatenation.

for site in urls:
    site = site.strip()
    url="http://www.alexa.com/siteinfo/%s" % (site,)
    <..>
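
To line the output up in columns, one option is to give each field a fixed width in the format string. The following is only a sketch: format_row is a hypothetical helper, the widths 20 and 12 are arbitrary assumptions, and the sample values are the ranks shown in the question rather than fresh data from Alexa.

```python
def format_row(site, global_rank, country_rank):
    # strip() removes the trailing newline (and any surrounding whitespace)
    # left over from reading the site name out of a file.
    site = site.strip()
    # %-20s left-aligns the text in 20 characters;
    # %12s right-aligns it in 12 characters.
    # The widths are assumptions; adjust them to your data.
    return "%-20s %12s %12s" % (site, global_rank.strip(), country_rank.strip())


# Sample values copied from the question's output, newline included
# to mimic lines read from a file.
rows = [
    ("google.com\n", "1", "1"),
    ("yahoo.com\n", "4", "4"),
    ("bing.com\n", "23", "14"),
]

lines = [format_row("Site", "Global Rank", "Country Rank")]
lines.extend(format_row(*row) for row in rows)
print("\n".join(lines))
```

Here %-20s pads on the right, so the site names are left-aligned, while %12s pads on the left, so the ranks line up on their last digit. A site name longer than the chosen width is not truncated and will push its row out of alignment, so pick a width that fits the longest name you expect.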
danny