Python wget saves a file. How do I get the data into a variable?

Question

I am using wget in Python like this:

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

down = wget.download(url)
print down

This downloads the HTML data into a file, but I want it in a variable. I am new to Python. Any help would be appreciated. Thanks in advance.

Possible duplicate of [What is the quickest way to HTTP GET in Python?](http://stackoverflow.com/questions/645312/what-is-the-quickest-way-to-http-get-in-python) – Buddy, Jun 16 '15 at 16:56
I want to scrape Facebook pages, and I read about it at http://stackoverflow.com/questions/18990597/using-beautifulsoup-to-parse-facebook – Harish, Jun 16 '15 at 17:06

heinst · Accepted Answer · 2015-06-16T17:27:22.530

4

You don't need to use wget to download the HTML to a file and then read it back in; you can fetch the HTML directly. This example uses requests (much nicer than Python's urllib modules, in my opinion):

import requests
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = requests.get(url).text
print html

Here is an example using Python's built-in urllib2 (Python 2 only):

import urllib2
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = urllib2.urlopen(url).read()
print html
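urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request. As a rough sketch of the same idea for Python 3, the helper below reads the response body into a string. The name `fetch_html`, the 10-second timeout, and the UTF-8 fallback are assumptions made for illustration, not part of the original answer.

```python
# Python 3 sketch: urllib2's urlopen is available as urllib.request.urlopen.
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the body at `url` as text (hypothetical helper name)."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, not str, in Python 3
        # Use the charset declared in the response headers; fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage, with the URL from the question:
#   html = fetch_html("https://www.facebook.com/hellomeets/events")
```

Decoding explicitly matters here because `read()` returns bytes in Python 3, whereas the Python 2 snippet above already yields a `str`.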

Edit

I now see what you mean about the difference between the HTML fetched directly from the website and the HTML obtained through the wget module. Here is how you would do it using the wget module:

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

down = wget.download(url)

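# wget.download returns the name of the saved file; read it back into one string
with open(down, 'r') as f:
    htmlText = f.read()  # read() keeps the original line breaks without doubling them
print htmlText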
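The comments on this answer mention that the wget module does not cope well with a leftover copy of the downloaded file. A minimal sketch of one way to handle that, assuming the file is only needed as an intermediate step: read it into a string and delete it right away. `read_and_remove` is a hypothetical helper name, and the code is written for Python 3 (it relies on the `encoding` argument of `open`), unlike the Python 2 snippets above.

```python
import os


def read_and_remove(path, encoding="utf-8"):
    """Read a downloaded file into a string, then delete it (hypothetical helper)."""
    try:
        with open(path, "r", encoding=encoding, errors="replace") as f:
            return f.read()  # whole file as one string
    finally:
        # Remove the file so the next run starts without a leftover copy.
        if os.path.exists(path):
            os.remove(path)


# Usage, assuming `down` is the filename returned by wget.download(url):
#   html_text = read_and_remove(down)
```

The `finally` clause means the file is removed even if reading fails, so a failed run does not leave a stale `events` file behind.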

edited Jun 16 '15 at 17:27

answered Jun 16 '15 at 17:01


I used inspect element and did not get exactly the same text, so I read at http://stackoverflow.com/questions/18990597/using-beautifulsoup-to-parse-facebook that we have to use wget – Harish Jun 16 '15 at 17:09
@Harish I see what you mean now... sorry about that. My updated answer should be what you want. Also, the `wget` module doesn't like having the same file twice, so make sure to always remove the `events` file before running the script, or have the script delete it before downloading. – heinst Jun 16 '15 at 17:27
Thanks @heinst, but when I do the scraping it still shows different data, because the Facebook page requires you to be logged in to access all of the data. Please help me sort this out. Thanks again – Harish Jun 16 '15 at 17:31
@Harish That should be a whole new question, and it will require a lot more code than what you have now. – heinst Jun 16 '15 at 17:34
