I downloaded a list of news items into a pandas DataFrame. Instead of splitting the fields into columns, pandas put everything into a single cell. On inspection, the downloaded string follows this pattern:
"['[{"t": "1", "id": "NOW.976818", "dt": "2019/11/15 10:13", "h": "《美股業績》Nvidia季績勝預期 季度收入預測遜預期", "u": "",...
How can I convert this into a pandas DataFrame with one column per field?
My code:
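import pandas as pd
import requests
from bs4 import BeautifulSoup

urlpull = "http://www.aastocks.com/tc/resources/datafeed/getmorenews.ashx?cat=result-announcement&newstime=942660890&newsid=NOW.976800&period=0&key="
df = pd.DataFrame({'News': ['a'], 'Page': ['1']})  # placeholder first row
result = requests.get(urlpull)
result.raise_for_status()
result.encoding = "utf-8"
src = result.text  # .text uses the encoding set above; .content is raw bytes
soup = BeautifulSoup(src, 'lxml')
news = []
# Collect the text of every <p> element in the parsed response
for a_tag in soup.find_all('p'):
    news.append(a_tag.text)
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df = pd.concat([df, pd.DataFrame(news, columns=['News'])])
print(news)
# Extract the first 5-digit run in each row as the stock number
df['num'] = df['News'].str.extract(r'(\d{5})')
df["stock_num"] = pd.to_numeric(df["num"], errors="coerce").fillna(0).astype("int64")
print(df)
df.to_excel("News.xlsx")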
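
One possible direction, sketched under the assumption that the response body is a JSON array of objects with keys like those in the sample above: parse the text with `json.loads` and pass the resulting list of dicts to `pd.DataFrame`, which produces one column per key. The sample string and the helper name `news_json_to_frame` are made up for illustration, and the real records may carry more keys than the truncated sample shows.

```python
import json

import pandas as pd

# Hypothetical sample shaped like the visible part of the response;
# the ids and headlines here are invented, not real data.
sample_text = (
    '[{"t": "1", "id": "NOW.000001", "dt": "2019/11/15 10:13", '
    '"h": "Example headline 00123", "u": ""}, '
    '{"t": "1", "id": "NOW.000002", "dt": "2019/11/15 09:50", '
    '"h": "Another headline", "u": ""}]'
)


def news_json_to_frame(text):
    """Parse a JSON array of objects into a DataFrame, one column per key."""
    records = json.loads(text)  # list of dicts
    return pd.DataFrame(records)


news_df = news_json_to_frame(sample_text)
# Convert the timestamp strings to real datetimes (format assumed from the sample)
news_df["dt"] = pd.to_datetime(news_df["dt"], format="%Y/%m/%d %H:%M")
print(news_df)
```

With the live request, `result.text` would take the place of `sample_text`, provided the endpoint really returns plain JSON rather than HTML; in that case BeautifulSoup would not be needed for this step.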