I am relatively new to Python. I just learned how to find URLs in a web page using Python. Now I want to extract the data from the chart on this web page:
http://index.baidu.com/?tpl=trend&word=%D0%CB%D2%B5%D6%A4%C8%AF

I have three questions on which I would like opinions.

  1. The page requires a login to view. (username:18521057966; pw:saifmf)
  2. I cannot find the chart data in the page source (the HTML, I assume).
  3. If we can find which part of the page is the chart, how can we extract the data from it?
Sam Cao

1 Answer

  1. Use Selenium with the Python bindings. I recommend this because the page uses JavaScript to complete the login, so a real browser session is the simplest way to get past it.
  2. If the information appears on the page, then it is available to you too. In other words, if the browser can see the information (which it can if it is rendering it), then you can see it too. It is likely in the source code. In Google Chrome, right-click the element you wish to examine and select "Inspect element" to bring up the inspector. Even if something isn't in the page source, the inspector (Ctrl+Shift+I) can show it, because it displays the live DOM after JavaScript has run.
  3. That depends. I would first recommend getting that far. Once you've found the info in the inspector, you can select the element, get its text using Selenium, and then output it in whatever form you wish (build a CSV, for instance). This question discusses getting text from an element further.
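
The three steps above can be sketched roughly as follows. This is a minimal sketch, not a tested scraper for this particular site: every CSS selector passed in is a hypothetical placeholder that you would replace with whatever the inspector shows, and the function names (`log_in`, `collect_texts`, `texts_to_csv`) are made up for illustration. It also assumes Selenium 4 and a working Chrome driver. If the chart is drawn on a canvas or by a plugin, the values may not exist as element text at all, in which case this approach will not find them.

```python
import csv
import io


def log_in(driver, login_url, user, password,
           user_sel, pass_sel, submit_sel):
    """Fill in and submit a login form.

    The three *_sel arguments are CSS selectors. They are assumptions:
    look up the real ones with the browser inspector.
    """
    from selenium.webdriver.common.by import By

    driver.get(login_url)
    driver.find_element(By.CSS_SELECTOR, user_sel).send_keys(user)
    driver.find_element(By.CSS_SELECTOR, pass_sel).send_keys(password)
    driver.find_element(By.CSS_SELECTOR, submit_sel).click()


def collect_texts(driver, page_url, data_sel, timeout=10):
    """Open page_url, wait until elements matching data_sel exist,
    and return the visible text of each one."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(page_url)
    # Wait for JavaScript to insert the elements into the DOM.
    elements = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, data_sel))
    )
    return [element.text for element in elements]


def texts_to_csv(texts):
    """Return the collected strings as CSV text with an index column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["index", "text"])
    for index, text in enumerate(texts):
        writer.writerow([index, text])
    return buffer.getvalue()


# Example usage (all selectors and the login URL below are placeholders):
#
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   try:
#       log_in(driver, "https://example.com/login", "user", "secret",
#              "#username", "#password", "#submit")
#       texts = collect_texts(driver, "https://example.com/chart",
#                             ".chart-value")
#       print(texts_to_csv(texts))
#   finally:
#       driver.quit()
```

The Selenium imports sit inside the functions so that the CSV helper can be used and checked on its own, without a browser installed.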
Community
Bee Smears