2

When I run my crawler, it returns the results as lists. However, I expected the titles and links to be printed as plain strings in two columns. Thanks for any suggestions.

import requests
from lxml import html

url="http://www.wiseowl.co.uk/videos/"
def Startpoint(links):
    response = requests.get(links)
    tree = html.fromstring(response.text)
    Title= tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/text()")
    Link=tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/@href")
    print(Title,Link)

Startpoint(url)

I get results like this: (screenshot)

But I expected output like this: (screenshot)

SIM

3 Answers

2

Your Title and Link don't each hold a single element. Each one is a list: Title holds all the titles and Link holds all the links, because those XPath expressions match multiple elements.

So to get (title, link) pairs, you need to zip() the two lists together:

pairs = zip(titles, links)
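
As a minimal sketch of what zip() does here, assuming made-up sample data in place of the scraped lists (the titles and hrefs below are placeholders, not the real page content):

```python
# Hypothetical sample data standing in for the two lists returned by xpath()
titles = ["First video", "Second video", "Third video"]
links = ["/videos/first.htm", "/videos/second.htm", "/videos/third.htm"]

# In Python 3, zip() returns a lazy iterator; list() materializes it
pairs = list(zip(titles, links))

for title, link in pairs:
    print(title, link)
```

Note that zip() stops at the end of the shorter input, so if the two XPath queries ever return different numbers of items, the extra items are silently dropped.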

Once you have that, you can iterate over the pairs with a for loop and print each title left-justified in a fixed-width field, which gives you your columns:

print('{:<70}{}'.format(title, link))

(See this answer for details on how to print left-aligned items.)
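
If you would rather not hard-code the column width, one option is to derive it from the longest title. This is only a sketch: format_columns and its padding parameter are names I made up for illustration, and the sample pairs are placeholders.

```python
def format_columns(pairs, padding=2):
    """Return one line per (title, link) pair, titles padded to a common width."""
    pairs = list(pairs)  # zip() yields an iterator, and we need two passes
    if not pairs:
        return []
    # Column width = longest title plus a little breathing room
    width = max(len(title) for title, _ in pairs) + padding
    # The nested {width} field supplies the width of the left-justified column
    return ['{:<{width}}{}'.format(title, link, width=width)
            for title, link in pairs]


# Hypothetical sample data
lines = format_columns([("Excel", "/a"), ("SQL Server", "/b")])
for line in lines:
    print(line)
```

The trade-off is that all pairs must be collected before the first line can be printed, which is fine for a single page of results.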


Everything together:

import requests
from lxml import html

url = "http://www.wiseowl.co.uk/videos/"


def startpoint(page_url):
    response = requests.get(page_url)
    tree = html.fromstring(response.text)
    titles = tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/text()")
    links = tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/@href")
    pairs = zip(titles, links)

    for title, link in pairs:
        # Replace '70' with whatever you expect the maximum title length to be
        print('{:<70}{}'.format(title, link))


startpoint(url)
Lukas Graf
1

Try iterating over both lists in parallel, like this:

import requests
from lxml import html

url="http://www.wiseowl.co.uk/videos/"
def Startpoint(links):
    response = requests.get(links)
    tree = html.fromstring(response.text)
    Title= tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/text()")
    Link=tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/@href")
    for i,j in zip(Title, Link):
        print('{:<70}{}'.format(i,j))

Startpoint(url)
Shashank
  • Wow! This is what I was expecting. Thanks, Shashank, for such a great solution. The usage of those triple "t" is unclear to me, though. Forgive my ignorance. Gonna accept your answer in a while. – SIM May 11 '17 at 20:10
  • Oh yes. Your output had some spaces in it, so I added that. Sorry about that :P I was trying to do some string manipulation, but it failed. The output is still correct, though. – Shashank May 11 '17 at 20:11
1

You can select the a elements themselves, then loop over them and print each one's title and URL.

import requests
from lxml import html

url="http://www.wiseowl.co.uk/videos/"
def Startpoint(links):
    response = requests.get(links)
    tree = html.fromstring(response.text)
    links = tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a")
    for link in links:
        print('{title:<70}{url}'.format(title=link.text, url=link.attrib['href']))

Startpoint(url)
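
To try the per-element approach without hitting the network, here is a rough sketch that swaps in the standard library's xml.etree.ElementTree for lxml and parses an inline snippet. The markup is invented to imitate the structure the XPath targets; lxml elements expose the same .text and .get() used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup imitating the page structure
SAMPLE = """
<div>
  <p class="woVideoListDefaultSeriesTitle"><a href="/videos/one.htm">Series one</a></p>
  <p class="woVideoListDefaultSeriesTitle"><a>Series without a link</a></p>
  <p class="other"><a href="/ignored.htm">Not a series title</a></p>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = []
for a in root.findall(".//p[@class='woVideoListDefaultSeriesTitle']/a"):
    # .get() returns the default instead of raising when href is absent
    rows.append((a.text, a.get("href", "")))

for title, url in rows:
    print('{:<30}{}'.format(title, url))
```

Because the title and href are read from the same element, they stay paired even when one of them is missing, which separate title and link lists cannot guarantee.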
Håken Lid