I'm new to Python (and coding in general) and am trying to write a script that scrapes all of the <p> tags from a given URL and then writes them to a CSV file. It runs without errors, but the CSV file it creates doesn't have any data in it. Below is my code:

import requests
r = requests.get('https://seekingalpha.com/amp/article/4420423-chipotle-mexican-grill-inc-s-cmg-ceo-brian-niccol-on-q1-2021-results-earnings-call-transcript')

from bs4 import BeautifulSoup
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('<p>')

records = []
for result in results:
    Comment = result.find('<p>').text
    records.append((Comment))

import pandas as pd
df = pd.DataFrame(records, columns=['Comment'])
df.to_csv('CMG_test.csv', index=False, encoding='utf-8')
print('finished')

Any help greatly appreciated!

djb113
  • First, you could use `print()` to see what you get in `records`. It may be empty, in which case there is no data to save to the file. Modern pages use JavaScript to add elements, and `requests` and `BeautifulSoup` can't run JavaScript, so you should check what you get in `r.text`, or turn off JavaScript in your web browser and reload the page to see what you get without it. – furas May 31 '21 at 17:11
  • The content on the page you linked is not present in the initial source; it's loaded in via JavaScript. `requests` is a pure HTTP client, and `BeautifulSoup` is a pure HTML parser. Neither supports the interpretation or execution of JavaScript. As such, this is a duplicate of [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – esqew May 31 '21 at 17:14
  • It has to be `soup.find_all('p')`, without the `< >`. – furas May 31 '21 at 17:19

1 Answer

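First, find_all matches on the tag name, not on the tag as it is written in the HTML. <p> isn't a tag name; p is. So, in order to find all p tags, you need to call the find_all method on the soup like so: results = soup.find_all('p'). CSS selectors are a separate mechanism: they are accepted by soup.select(), not by find_all.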

Take a look at this page for more info on the CSS selectors.
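As a minimal sketch of the difference, the snippet below parses a small made-up HTML string (an assumption for illustration, not the page from the question) and compares the three calls. It only relies on find_all and select from BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Small inline document (hypothetical) so the example runs without a network request.
html = "<div><p>First paragraph.</p><p>Second paragraph.</p></div>"
soup = BeautifulSoup(html, "html.parser")

# No tag is literally named "<p>", so this matches nothing.
with_brackets = soup.find_all("<p>")

# The bare tag name matches both paragraphs.
by_name = soup.find_all("p")

# select() is the method that accepts CSS selectors.
by_selector = soup.select("div > p")

print(len(with_brackets), len(by_name), len(by_selector))  # 0 2 2
print([p.text for p in by_name])
```

Because find_all('<p>') returns an empty list instead of raising an error, the original script ran to the end and wrote a CSV with only the header row.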

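Secondly, in your iteration over results, you don't need to find the tag all over again: each result is already a p tag, and result.find('<p>') would return None, so calling .text on it would fail. You can simply extract the text with result.text. So, if you rewrite your code like the following: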

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the page
r = requests.get('https://seekingalpha.com/amp/article/4420423-chipotle-mexican-grill-inc-s-cmg-ceo-brian-niccol-on-q1-2021-results-earnings-call-transcript')

# Parse the HTML and match by tag name: 'p', not '<p>'
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('p')

# Each result is already a p tag, so read its text directly
records = []
for result in results:
    comment = result.text
    records.append(comment)

# Write one row per paragraph under a single 'Comment' column
df = pd.DataFrame(records, columns=['Comment'])
df.to_csv('CMG_test.csv', index=False, encoding='utf-8')
print('finished')

You'll find your CSV populated with the text of each p tag in the response.
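If the file still comes out empty, it may help to check how many paragraphs were actually parsed before writing anything, as the comments on the question suggest. The sketch below is one way to do that; extract_paragraphs and both sample HTML strings are hypothetical names and data made up for illustration, standing in for r.text.

```python
from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the stripped text of every non-empty <p> tag in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    texts = [p.get_text(strip=True) for p in soup.find_all("p")]
    return [t for t in texts if t]


# Hypothetical responses: one with paragraphs already in the HTML, and one
# that is only a shell which JavaScript would fill in later.
static_page = "<html><body><p>Hello</p><p> </p><p>World</p></body></html>"
script_only_page = "<html><body><div id='app'></div><script>render()</script></body></html>"

for name, page in [("static", static_page), ("script-only", script_only_page)]:
    records = extract_paragraphs(page)
    print(name, len(records), records)
```

An empty list for a real response would suggest that the paragraphs are not in the HTML that requests received, which is the JavaScript case the comments describe.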

Artin Zamani