Unable to Convert scraped data from list to regular string

Question

When I run my crawler, it returns the results as lists. However, I expected them to be displayed as regular strings in two columns. Thanks for any suggestions.

import requests
from lxml import html

url="http://www.wiseowl.co.uk/videos/"
def Startpoint(links):
    response = requests.get(links)
    tree = html.fromstring(response.text)
    Title= tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/text()")
    Link=tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/@href")
    print(Title,Link)

Startpoint(url)

I'm getting results like this:

But I expected output like this:

score 2 · Answer 1 · edited May 23 '17 at 11:47

Your Title and Link don't each contain a single element; each one is a list of all the titles or all the links, respectively (those XPath expressions match multiple elements).

So in order to get a list of title, link pairs, you need to zip() them together:

pairs = zip(titles, links)
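
As a quick illustration of what zip() does here, a minimal sketch with made-up sample data (the titles and hrefs below are placeholders, not actual scraped results):

```python
# Hypothetical sample data standing in for the two lists returned by xpath()
titles = ["Series A", "Series B", "Series C"]
links = ["/videos/a.htm", "/videos/b.htm", "/videos/c.htm"]

# zip() pairs the items position by position. In Python 3 it returns a lazy
# iterator, so wrap it in list() if you want to inspect or reuse the pairs.
pairs = list(zip(titles, links))

for title, link in pairs:
    print(title, link)
```

Note that zip() stops at the shorter input, so if the two lists ever differ in length, the extra items are silently dropped.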

Once you have that, you can iterate over those pairs using a for loop and print the items left-justified so you get your columns:

print('{:<70}{}'.format(title, link))

(See this answer for details on how to print left aligned items).
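
If you would rather not hard-code the column width, one option is to derive it from the longest title. A minimal sketch, where format_rows and its padding parameter are hypothetical names and the rows are made-up sample data:

```python
def format_rows(pairs, padding=2):
    """Return one left-aligned line per (title, link) pair."""
    pairs = list(pairs)  # materialize, since a zip() iterator can only be consumed once
    # Width of the first column: longest title plus some padding
    width = max((len(title) for title, _ in pairs), default=0) + padding
    # '{:<{w}}' left-justifies the title in a field that is w characters wide
    return ['{:<{w}}{}'.format(title, link, w=width) for title, link in pairs]


# Made-up sample data, not real scraped results
rows = format_rows([("Short", "/a.htm"), ("A longer title", "/b.htm")])
for row in rows:
    print(row)
```

The trade-off is that all pairs have to be collected before the first line can be printed, which is fine for a single page of results.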

Everything together:

import requests
from lxml import html

url = "http://www.wiseowl.co.uk/videos/"


def startpoint(page_url):
    response = requests.get(page_url)
    tree = html.fromstring(response.text)
    titles = tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/text()")
    links = tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/@href")
    pairs = zip(titles, links)

    for title, link in pairs:
        # Replace '70' with whatever you expect the maximum title length to be
        print('{:<70}{}'.format(title, link))

startpoint(url)

Thanks, sir Lukas Graf, your answer solves the problem too. – SIM May 11 '17 at 20:18

Shashank · Accepted Answer · 2017-05-11T20:15:51.533

1

Try iterating over both lists in parallel, like this:

import requests
from lxml import html

url="http://www.wiseowl.co.uk/videos/"
def Startpoint(links):
    response = requests.get(links)
    tree = html.fromstring(response.text)
    Title= tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/text()")
    Link=tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a/@href")
    for i,j in zip(Title, Link):
        print('{:<70}{}'.format(i,j))

Startpoint(url)

edited May 11 '17 at 20:15

answered May 11 '17 at 20:05


Wow!!! This is what I was expecting. Thanks, sir Shashank, for such a great solution. The usage of those triple "t" is vague to me, though! Forgive my ignorance. Gonna accept your answer in a while. – SIM May 11 '17 at 20:10
Oh yes, actually your output had some spaces, so I added that. Sorry for that :P I was trying to do some string manipulation but failed. Still, the output is correct. – Shashank May 11 '17 at 20:11

score 1 · Answer 3 · answered May 11 '17 at 20:10


You can loop over each link element and print its title and URL.

import requests
from lxml import html

url="http://www.wiseowl.co.uk/videos/"
def Startpoint(links):
    response = requests.get(links)
    tree = html.fromstring(response.text)
    links = tree.xpath("//p[@class='woVideoListDefaultSeriesTitle']/a")
    for link in links:
        print('{title:<70}{url}'.format(title=link.text, url=link.attrib['href']))

Startpoint(url)
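
One reason to prefer looping over the a elements is that each title stays attached to its own href, so the two columns cannot drift apart (for example, if one anchor has no text). Here is a self-contained sketch of the same idea that runs offline. As an assumption for illustration, it uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the markup is an invented sample, not the real page:

```python
import xml.etree.ElementTree as ET

# Invented sample markup mimicking the structure the XPath in the answer targets
SAMPLE = """
<div>
  <p class="woVideoListDefaultSeriesTitle"><a href="/videos/a.htm">Series A</a></p>
  <p class="woVideoListDefaultSeriesTitle"><a href="/videos/b.htm">Series B</a></p>
  <p class="other"><a href="/ignored.htm">Not a series</a></p>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree supports a limited XPath subset, including [@attr='value']
anchors = root.findall(".//p[@class='woVideoListDefaultSeriesTitle']/a")

# Read the text and the href from the same element so they stay paired
rows = [(a.text, a.attrib["href"]) for a in anchors]

for title, url in rows:
    print('{:<20}{}'.format(title, url))
```

With lxml, the element objects expose the same .text and .attrib interface, which is what the answer's code relies on.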


Håken Lid


Thanks, sir Håken Lid, for your answer. It also solves the problem. – SIM May 11 '17 at 20:17
