How to get the text of the URL while scraping a webpage

Question

I want to scrape only the content attribute from the tag `<meta itemprop="url" content="http://www.vestiairecollective.com/women-bags/handbags/chanel/black-timeless-leather-handbag-chanel-2668779.shtml">`, i.e. only the http part. But the way I am doing it gets me the whole tag as the result, starting from "meta".

Here is my script:

import urllib.request
from bs4 import BeautifulSoup
url=urllib.request.urlopen("http://www.vestiairecollective.com/women-bags/handbags/#_=catalog")
soup=BeautifulSoup(url.read(),"html.parser")
getdata=soup.find_all("div",{"class":"expand-snippet-container"})

for i in getdata:
    data1=i.find_all("meta",{"itemprop":"url"})
    datac=[da[0] for da in data1]
    print(datac)


for i in getdata:
    data1=i.find_all("p",{"class":"brand"})
    datac1=[da.contents[0] for da in data1]
    brdata=("\n".join(datac1))

    if brdata=="CHANEL":
        da1=i.find_all("meta",{"itemprop":"url"})
        print(da1)

In the last print statement, I need only the URL to show (for example, http://www.vestiairecollective.com/women-bags/handbags/chanel/black-timeless-leather-handbag-chanel-2668779.shtml). What am I doing wrong? Please help.

Thank you, I referred to the content in the link provided, but I still can't get my answer. I changed the last two lines of my code to `da=i.find(attrs={"meta":"content"}) op=da['value']` and it still gives an error. Where am I still going wrong? — Ro_nair, May 27 '16 at 11:07
The result from `find_all()` is a list. If you know it'll contain just one `<meta>` tag, you could just `print(da1[0]['content'])`, I think. — Ilja Everilä, May 27 '16 at 11:09
No, I'm still not getting it. Is there any other way? To be specific, what should go in place of "name" and "stainfo" inside attrs? — Ro_nair, May 27 '16 at 11:18
What *name* and *stainfo*? You don't have such variables in your code example. I fetched the page in question and ran most of your code, esp. the "last print". `print(da1[0]['content'])` worked as expected. — Ilja Everilä, May 27 '16 at 11:24
Sorry, the "name" and "info" variables were in the link which you provided. In my code they are "meta" and "content". So you got the required output? Can you share the code you tried? I mainly want the last 3 lines — Ro_nair, May 27 '16 at 11:32
@IljaEverilä Hello there, I understand it might be a duplicate question, but I still did not get the required output. Please share your last 3 lines as an answer — Ro_nair, May 27 '16 at 12:55
See the previous comment. Replace `print(da1)` with `print(da1[0]['content'])`, unless you've made other changes in the meantime. — Ilja Everilä, May 27 '16 at 12:58
No other changes were made. I pasted the same print statement too. I mainly want the second-to-last line, where we write `da1 = i.findAll(attrs={"meta" : "content"})` — Ro_nair, May 27 '16 at 13:01
Well that's a change right there, your example was `da1=i.find_all("meta",{"itemprop":"url"})` originally. Your latter call tries to find any element with an attribute "meta" with value "content", which obviously is not what you want. — Ilja Everilä, May 27 '16 at 13:03
Referring to the link you sent, I changed the line before the print statement to this, since that link shows how to use "attrs" to get an attribute value. To avoid any further confusion, please share the entire code you tried as an answer so that I can mark it — Ro_nair, May 27 '16 at 13:06
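
The suggestion in the comments can be sketched as below. This is only a minimal illustration, not the code the commenter actually ran: it parses a small hand-written HTML string (`SAMPLE_HTML`, a hypothetical stand-in for the real listing page, whose markup may differ) instead of fetching the site, and the helper name `brand_urls` is made up for this example. The idea is that `find_all()` returns a list of tags, and the URL is the value of each tag's `content` attribute, so the filter stays `{"itemprop": "url"}` and the attribute is read from the matched tag afterwards.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the listing page markup.
SAMPLE_HTML = """
<div class="expand-snippet-container">
  <p class="brand">CHANEL</p>
  <meta itemprop="url" content="http://www.vestiairecollective.com/women-bags/handbags/chanel/black-timeless-leather-handbag-chanel-2668779.shtml">
</div>
<div class="expand-snippet-container">
  <p class="brand">OTHER BRAND</p>
  <meta itemprop="url" content="http://example.com/other-item.shtml">
</div>
"""


def brand_urls(html, brand):
    """Return the content attribute of every <meta itemprop="url"> tag
    found inside a snippet container whose brand text equals `brand`."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for container in soup.find_all("div", {"class": "expand-snippet-container"}):
        brand_tag = container.find("p", {"class": "brand"})
        if brand_tag is None or brand_tag.get_text(strip=True) != brand:
            continue
        # Select the tag by its itemprop attribute, then read "content".
        for meta in container.find_all("meta", {"itemprop": "url"}):
            url = meta.get("content")  # None if the attribute is missing
            if url:
                urls.append(url)
    return urls


if __name__ == "__main__":
    for url in brand_urls(SAMPLE_HTML, "CHANEL"):
        print(url)
```

By contrast, `attrs={"meta": "content"}` asks for elements that have an attribute named `meta` with the value `content`, which matches nothing in this markup.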

