
I am using wget in Python like this:

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

down = wget.download(url)
print(down)  # download() saves the page to a file and returns the file's name

It downloads the HTML into a file, but I want it in a variable. I am new to Python, so any help would be appreciated. Thanks in advance.

Thomas Orozco
Harish
  • possible duplicate of [What is the quickest way to HTTP GET in Python?](http://stackoverflow.com/questions/645312/what-is-the-quickest-way-to-http-get-in-python) – Buddy Jun 16 '15 at 16:56
  • Why are you using `wget` then? Why not use `requests`? – Daniel Roseman Jun 16 '15 at 16:57
  • I want to scrape Facebook pages, and I read about this approach in http://stackoverflow.com/questions/18990597/using-beautifulsoup-to-parse-facebook – Harish Jun 16 '15 at 17:06


You don't need to use wget to download the HTML to a file and then read it back in; you can get the HTML directly into a variable. This example uses requests (way better than Python's urllib modules, in my opinion):

import requests
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = requests.get(url).text  # .text is the response body decoded to a str
print(html)
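
Once the page is held in a string, it can be parsed straight away, with no file involved; BeautifulSoup, which the question already imports, accepts a string directly (`BeautifulSoup(html, "html.parser")`). As a dependency-free illustration only, here is a minimal Python 3 sketch that swaps in the standard library's `html.parser` to pull the `<title>` text out of an HTML string. `TitleParser` and `page_title` are hypothetical helper names, not part of any library.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of an HTML string."""

    def __init__(self):
        super().__init__()
        self._in_title = False  # currently between <title> and </title>
        self._done = False      # first <title> already fully read
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the stripped text of the first <title>, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()
```

With the `html` variable from the requests example, `page_title(html)` works on the string directly. Keep in mind that what the server sends to a script can differ from what a logged-in browser shows.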

Here is an example using Python 2's built-in urllib2:

import urllib2
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = urllib2.urlopen(url).read()  # Python 2 only; Python 3 moved urlopen to urllib.request
print(html)
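
`urllib2` exists only in Python 2; in Python 3, `urlopen` lives in `urllib.request`, and `read()` returns bytes rather than text. A minimal Python 3 sketch of the same idea follows. `fetch_html` is a hypothetical helper name, and falling back to UTF-8 when the server declares no charset is an assumption, not a guarantee.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download `url` and return the response body as a str (Python 3)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header if the server sent one;
        # otherwise assume UTF-8 (an assumption about the page).
        charset = response.headers.get_content_charset() or "utf-8"
        # read() returns bytes in Python 3, so decode them into text.
        return response.read().decode(charset, errors="replace")
```

For example, `html = fetch_html("https://www.facebook.com/hellomeets/events")` puts the page into a variable without writing any file.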

Edit

I now see what you mean about the difference between the HTML fetched directly from the website and the HTML fetched with the wget module. Here is how you would do it using the wget module:

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

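down = wget.download(url)  # saves the page to disk and returns the file name

# Read the whole file back into one string; read() keeps the original
# line breaks, whereas joining readlines() with "\n" would double them
with open(down, 'r') as f:
    htmlText = f.read()
print(htmlText)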
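
The `wget` version above leaves the downloaded file on disk, and the comments recommend deleting it before the next run. Here is a minimal Python 3 sketch of a helper that reads the file into a variable and then removes it. `read_downloaded_file` is a hypothetical name, and UTF-8 is an assumed encoding that should be replaced with the page's real charset if it differs.

```python
import os


def read_downloaded_file(path, encoding="utf-8"):
    """Return the contents of the file at `path` as one string, then delete it."""
    with open(path, encoding=encoding) as f:
        text = f.read()  # read() keeps the file's line breaks unchanged
    os.remove(path)      # clean up so the next download starts fresh
    return text
```

Combined with the module above, this becomes `htmlText = read_downloaded_file(wget.download(url))`.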
heinst
  • I compared the result with Inspect Element and did not get exactly the same text. In http://stackoverflow.com/questions/18990597/using-beautifulsoup-to-parse-facebook I read that we have to use wget. – Harish Jun 16 '15 at 17:09
  • @Harish I see what you mean now; sorry about that. My updated answer should be what you want. Also, the `wget` module doesn't like downloading the same file twice, so make sure to always remove the `events` file before running the script, or have the script delete it before downloading. – heinst Jun 16 '15 at 17:27
  • Thanks @heinst, but when I scrape, it still shows different data, because the Facebook page requires you to be logged in to access all the data. Please help me sort it out. Thanks again. – Harish Jun 16 '15 at 17:31
  • @Harish that should be a whole new question and will require a lot more code than what you have now. – heinst Jun 16 '15 at 17:34