I am trying to scrape the div with id="ideas_body" from this site, but it seems to be missing from the parsed result. I have tried the different parsers suggested in this post (Missing parts on Beautiful Soup results), but none of them worked.

Here is my code:

import requests
from bs4 import BeautifulSoup
import lxml

# Fetch the page
url = 'https://www.com/ideas#'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)

These are the parsers I have tried without success:

  1. soup = BeautifulSoup(page.content, 'lxml-xml')
  2. soup = BeautifulSoup(page.content, 'html.parser')
  3. soup = BeautifulSoup(page.content, 'html.parser-xml')
  4. soup = BeautifulSoup(page.content, 'html5lib')

So how can I get at the element with this id in order to scrape it?

user53526356

1 Answer

As was mentioned in the comments, there is no need to scrape the HTML. You can call the site's API directly to get the data you need.

If you need more than 30 results, change 'per_page' in form_data.

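import requests

# Form fields sent with the POST request; 'per_page' controls how many results come back
form_data = {'type': 'idea',
             'show': 'all',
             'sort': 'new',
             'per_page': 30,
             'gotodate': '04/06/2019',
             'ls': 'all',
             'loc': 'all',
             'marketcap_l': 0,
             'shorten_name': 1
             }

# Send the form as a POST request instead of parsing the rendered page
response = requests.post('https://www.valueinvestorsclub.com/messages/loadmsgs', data=form_data)

# The response body is JSON; the items are under the 'result' key
ideas = response.json()['result']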
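
If you want to reuse this with a different endpoint or page size, the same call can be wrapped in a small helper. This is only a sketch: the function name fetch_results, its parameters, and the timeout value are my own choices for illustration, not part of the site's API, and I am assuming the response keeps the 'result' key shown above.

```python
import requests


def fetch_results(url, form_data, per_page=None, post=requests.post, timeout=10):
    """POST form_data to url and return the list stored under 'result'.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    payload = dict(form_data)  # copy so the caller's dict is not mutated
    if per_page is not None:
        payload['per_page'] = per_page  # override the page size for this call
    response = post(url, data=payload, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    body = response.json()
    if 'result' not in body:
        # Assumption: a successful response carries a 'result' key
        raise KeyError("no 'result' key in response; got keys: %s" % sorted(body))
    return body['result']
```

With the form_data above, a call would look like fetch_results('https://www.valueinvestorsclub.com/messages/loadmsgs', form_data, per_page=100). I have not checked whether the server caps 'per_page', so treat larger values as something to try rather than a guarantee.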

Hope it helps!

andreilozhkin
  • This is close, though it's the wrong endpoint. It should be ~/ideas/loadideas rather than ~/messages/loadmsgs (and thus a different `form_data`). – user53526356 Jul 07 '19 at 13:54