I downloaded a list of news content into a pandas DataFrame. Instead of putting the data into a table, pandas put everything into a single cell. On inspection, the downloaded string follows this pattern:

"['[{"t": "1", "id": "NOW.976818", "dt": "2019/11/15 10:13", "h": "《美股業績》Nvidia季績勝預期 季度收入預測遜預期", "u": "",...

How can I convert this into a pandas table?

My code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

urlpull = "http://www.aastocks.com/tc/resources/datafeed/getmorenews.ashx?cat=result-announcement&newstime=942660890&newsid=NOW.976800&period=0&key="
df = pd.DataFrame({'News': ['a'], 'Page': ['1']})
result = requests.get(urlpull)
result.raise_for_status()
result.encoding = "utf-8"
src = result.content
soup = BeautifulSoup(src, 'lxml')

# Collect the text of every <p> tag in the parsed response
news = []
for a_tag in soup.find_all('p'):
    news.append(a_tag.text)
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, pd.DataFrame(news, columns=['News'])])
print(news)
# Pull the first 5-digit run (the stock code) out of each news string
df['num'] = df['News'].str.extract(r'(\d{5})')
df["stock_num"] = pd.to_numeric(df["num"], errors="coerce").fillna(0).astype("int64")

print(df)
df.to_excel("News.xlsx")
Arthur Law

1 Answer

You can read it directly:

pd.read_table(filename/url)
SRG
  • Thanks. However, the data is still stored in a single cell, as "[{"t": "1", "id": "NOW.976818", "dt": "2019/11/15 10:13", "h": "《美股業績》Nvidia季績勝預期 季度收入預測遜預期", "u": "", "i": "20180322103037588_s.jpg","s":"HK6", "dtd": "942660792", "bucnt": "0", "becnt": "0", "rcnt": "0", "cv": "1"},{"t": "1", "id": "NOW.976764", "dt": "2019/11/15 09:35", "h": "《公司業績》科地農業(08153.HK)半年虧損收窄至1,811萬元 ", "u": "", "i": "20190911140145440_s.jpg","s":"HK6", "dtd": "942658537", "bucnt": "3", "becnt": "3", "rcnt": "0", "cv": "1"}]". How can I put them in table form? – Arthur Law Nov 17 '19 at 14:13
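
The sample in the comment looks like a JSON array of objects, so one possible approach is to parse it as JSON instead of as HTML or a delimited table. The sketch below is only an illustration under that assumption: `raw` is a hypothetical stand-in holding two shortened records copied from the sample (in practice it would be the response body, e.g. `result.text`), and `news_df` is a name made up for this example. It has not been checked against the live endpoint.

```python
import json

import pandas as pd

# Hypothetical stand-in for the response body: two records copied from the
# sample in the comment, with some keys left out for brevity.
raw = """[
  {"t": "1", "id": "NOW.976818", "dt": "2019/11/15 10:13",
   "h": "《美股業績》Nvidia季績勝預期 季度收入預測遜預期", "u": "", "s": "HK6"},
  {"t": "1", "id": "NOW.976764", "dt": "2019/11/15 09:35",
   "h": "《公司業績》科地農業(08153.HK)半年虧損收窄至1,811萬元 ", "u": "", "s": "HK6"}
]"""

# json.loads turns the text into a list of dicts; pandas then builds
# one row per dict and one column per key.
records = json.loads(raw)
news_df = pd.DataFrame(records)

# Convert the timestamp strings into real datetimes.
news_df["dt"] = pd.to_datetime(news_df["dt"], format="%Y/%m/%d %H:%M")

# Same idea as the question's code: take the first 5-digit run in the
# headline as the stock code, falling back to 0 when there is none.
codes = news_df["h"].str.extract(r"(\d{5})")[0]
news_df["stock_num"] = pd.to_numeric(codes, errors="coerce").fillna(0).astype("int64")

print(news_df[["id", "dt", "stock_num"]])
```

If the endpoint really does return JSON, the BeautifulSoup step in the question would not be needed, since there is no HTML to parse. The resulting frame can be written out with `news_df.to_excel("News.xlsx")` as before.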