I've just recently learned the basics of web development. However, when I print my pandas DataFrame, only the first and last of its 4 columns are shown. Here is the program:


import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download the page and parse its HTML.
r = requests.get('https://www.nytimes.com/interactive/2017/06/23/opinion/trumps-lies.html')
soup = BeautifulSoup(r.text, 'html.parser')
results=soup.find_all('span',attrs={'class':'short-desc'})

print(len(results))

print(results[0:3])

first_result=results[0]


print(first_result.find('strong'),'\n')
print(first_result.find('strong').text,'\n')


print(first_result.find('strong').text+', 2017','\n\n')


print('\t\tExtracting The Lie\n')


print(first_result.contents[1][0:-1],'\n\n')


print('\t\tExtracting The Explanation\n')

print(first_result.contents[2],'\n')

print(first_result.find('a'),'\n')

print(first_result.find('a').text[1:-1],'\n\n')


print('\t\tExtracting The URL\n')

print(first_result.find('a')['href'],'\n')

print('\t\tBuilding a Dataset\n')

records=[]
for result in results:
    date=result.find('strong').text[0:-1]+', 2017'
    lie=result.contents[1][1:-2]
    explanation=result.find('a').text[1:-1]
    url=result.find('a')['href']
    records.append((date,lie,explanation,url))

print(len(records))
print(records[0:3],'\n\n')
df=pd.DataFrame(records,columns=['date','lie','explanation','url'])


print(df.head())

Everything works as expected except for the pandas output. The first five rows come out like this:

date  ...                                                url
0  Jan. 21, 2017  ...  https://www.buzzfeed.com/andrewkaczynski/in-20...
1  Jan. 21, 2017  ...  http://nation.time.com/2013/11/06/10-things-yo...
2  Jan. 23, 2017  ...  https://www.nytimes.com/2017/01/23/us/politics...
3  Jan. 25, 2017  ...  https://www.nytimes.com/2017/01/21/us/politics...
4  Jan. 25, 2017  ...  https://www.nytimes.com/2017/01/24/us/politics...

I am using PyCharm with pandas 1.0.4. Why does '...' appear instead of the text of the middle columns?

2 Answers

The ... is just pandas abbreviating the output so that it fits the width of the display. The actual values are still in the DataFrame and don't contain the ellipsis.

To verify that, you can print out the first row with df.iloc[0].
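
If the goal is to see every column in the printout rather than only confirm that the data is intact, the pandas display options can be adjusted. Below is a minimal sketch, assuming a small stand-in DataFrame: the `sample` frame and its placeholder strings are made up for illustration and are not the scraped data. The width of 1000 is an arbitrary choice.

```python
import pandas as pd

# Hypothetical stand-in for the scraped records: four columns with long text.
sample = pd.DataFrame(
    {
        "date": ["Jan. 21, 2017"],
        "lie": ["x" * 60],
        "explanation": ["y" * 60],
        "url": ["https://example.com/" + "z" * 60],
    }
)

# Temporarily remove the column limit and widen the line limit for printing.
# The options revert to their previous values when the block exits.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full_view = repr(sample)

print(full_view)
```

Using `pd.option_context` keeps the change local to one printout; `pd.set_option` with the same option names would apply it for the rest of the session. Long cell values may still be shortened, since that is governed by a separate option.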

Ahmad

Your data is still there. The output is only shortened to accommodate the long columns.

See here:

>>> print(df.head())
            date                                                lie                                        explanation                                                url
0  Jan. 21, 2017  I wasn't a fan of Iraq. I didn't want to go in...   He was for an invasion before he was against it.  https://www.buzzfeed.com/andrewkaczynski/in-20...
1  Jan. 21, 2017  A reporter for Time magazine — and I have been...  Trump was on the cover 11 times and Nixon appe...  http://nation.time.com/2013/11/06/10-things-yo...
2  Jan. 23, 2017  Between 3 million and 5 million illegal votes ...             There's no evidence of illegal voting.  https://www.nytimes.com/2017/01/23/us/politics...
3  Jan. 25, 2017  Now, the audience was the biggest ever. But th...  Official aerial photos show Obama's 2009 inaug...  https://www.nytimes.com/2017/01/21/us/politics...
4  Jan. 25, 2017  Take a look at the Pew reports (which show vot...            The report never mentioned voter fraud.  https://www.nytimes.com/2017/01/24/us/politics...
>>> df.iloc[0, 1]
"I wasn't a fan of Iraq. I didn't want to go into Iraq."
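
The `head()` output above still shortens long cell values with a trailing `...`, which appears to be governed by a separate option, `display.max_colwidth`. Here is a small sketch, assuming a hypothetical one-row frame that reuses the string returned by `df.iloc[0, 1]` above; passing `None` for this option assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical one-row frame holding the full text of the first "lie" value.
long_text = "I wasn't a fan of Iraq. I didn't want to go into Iraq."
sample = pd.DataFrame({"lie": [long_text]})

# Rendered with the default settings, for comparison.
default_view = sample.to_string()

# Rendered with the per-cell width limit lifted (None means no limit).
with pd.option_context("display.max_colwidth", None):
    full_view = sample.to_string()

print(default_view)
print(full_view)
```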
Balaji Ambresh