I would like the result to be a single list of individual strings rather than the current output. Essentially, I want only the last list shown below, with all the strings together in one list. Any help would be appreciated.

import numpy as np
import requests
from bs4 import BeautifulSoup as bs

headers = dict()
headers[
    "User-Agent"
] = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Mobile Safari/537.36"

headlines = []
pages = np.arange(1, 3)

for page in pages:
    url = 'https://www.marketwatch.com/investing/stock/aapl/moreheadlines?channel=MarketWatch&pageNumber=' + str(page)
    results = requests.get(url, headers=headers)
    soup = bs(results.text, "html.parser")
    contents = soup.find_all("div", class_='article__content')
    for i in contents:
        headline = i.find("h3", class_='article__headline').text.strip()
        headlines.append(headline)
        print(headlines)

Then the output is this:

['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying', 'The Dow Fell 12 Points Because Intel and Apple Stock Softened the Blow']
['The Dow Fell 179 Points Because Fauci Warned About Covid Mutations', 'Apple Inc. stock outperforms competitors on strong trading day', 'Apple Stock Nears Another Record. Analysts’ iPhone Outlooks Keep Going Up.', 'Charting a bull-trend pullback:  S&P 500 retests the breakout point', 'Facebook and Amazon set records in annual spending on Washington lobbying', 'The Dow Fell 12 Points Because Intel and Apple Stock Softened the Blow', 'Apple Inc. stock outperforms market on strong trading day']


  • Duplicate: https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists – dukkee Jan 26 '21 at 18:01
  • Does this answer your question? [How to make a flat list out of list of lists?](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-list-of-lists) – 273K Jan 26 '21 at 18:06
  • Did you mean to print `headline` instead of `headlines`? – PApostol Jan 26 '21 at 18:06

1 Answer

What happens?

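All the headlines are already in that one list. The issue is the indentation of your print call: it sits inside the loop, so the growing list is printed on every iteration. Move it outside the loops so the list is printed only once.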

for page in pages:
    url = 'https://www.marketwatch.com/investing/stock/aapl/moreheadlines?channel=MarketWatch&pageNumber=' + str(page)
    results = requests.get(url, headers=headers)
    soup = bs(results.text, "html.parser")
    contents = soup.find_all("div", class_='article__content')
    for i in contents:
        headline = i.find("h3", class_='article__headline').text.strip()
        headlines.append(headline)

print(headlines)
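
To see the effect without hitting the network, here is a minimal sketch that uses hypothetical stand-in data in place of the scraped pages. The `collect` helper and the `fake_pages` values are made up for illustration and are not part of the original code; the point is only that the list is built inside the loops and printed once afterwards.

```python
def collect(pages):
    """Gather every headline from every page into one flat list."""
    headlines = []
    for page in pages:
        for headline in page:
            headlines.append(headline.strip())
    return headlines


# Hypothetical stand-in data: two "pages" of already-extracted headline text.
fake_pages = [
    ["  Headline A ", "Headline B"],
    ["Headline C"],
]

headlines = collect(fake_pages)
print(headlines)  # printed once, after both loops have finished
```

If the print call were moved back inside the inner loop, the same list would be printed three times, growing by one item each time, which matches the output in the question.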

Btw you can improve your selection like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(results.text, "html.parser")
for headline in soup.select('div.article__content h3.article__headline'):
    headlines.append(headline.get_text(strip=True))
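
As a rough illustration of what that CSS selector matches, the sketch below runs it against a small hand-written HTML string. The markup is an assumption that only imitates the class names used above; the real page structure may differ. It requires the third-party `beautifulsoup4` package.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure targeted by the selector.
html = """
<div class="article__content"><h3 class="article__headline"> First headline </h3></div>
<div class="article__content"><h3 class="article__headline">Second headline</h3></div>
<div class="other"><h3 class="article__headline">Ignored headline</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Only h3.article__headline elements nested inside div.article__content match.
headlines = [
    h3.get_text(strip=True)
    for h3 in soup.select("div.article__content h3.article__headline")
]
print(headlines)
```

The descendant selector does the work of the nested `find_all` and `find` calls in a single step, and `get_text(strip=True)` trims the surrounding whitespace.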
HedgeHog