
I have two datasets.

The first one, in the market variable, contains a generic market trend with the following structure:

Date     High    Close     Volume     Open      Low

The second one, in the moods variable, contains a few tweets for each day, each with an associated sentiment, in this structure:

body       date            datetime         id sentiment      time

I want to count, for each day, how many "Bearish" and "Bullish" sentiments there are. It works, and this is my code with comments:

import pandas as pd

# Read the datasets
market = pd.read_csv("Datasets/SP500/aggregates.txt")
moods = pd.read_json("Datasets/DatasetStockTwits-Aggregato.json")
# Remove all null sentiments
moods = moods[moods.sentiment != "null"]
# Take a subset of the data for computational speed
market_tail = market.tail(100)
# Keep only the tweets posted on the days present in market_tail
moods_tail = moods.loc[moods['date'].isin(market_tail.Date)]
# Count, for each day, how many "Bearish" and "Bullish" tweets there are
sentiments_count = pd.crosstab(moods_tail['date'], moods_tail['sentiment'])

print(sentiments_count)
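To see the same pipeline end to end without the original files, here is a minimal sketch on a few hand-made rows. The column names match the question, but every data value below is invented for illustration, and missing sentiments are assumed to be stored as the literal string "null", as the filter in the code above implies.

```python
import pandas as pd

# Hypothetical stand-ins for the two datasets (all values are made up).
market = pd.DataFrame({"Date": ["2017-11-03", "2017-11-06"], "Close": [2587.8, 2591.1]})
moods = pd.DataFrame({
    "date": ["2017-11-03", "2017-11-03", "2017-11-03",
             "2017-11-06", "2017-11-06", "2017-11-08"],
    "sentiment": ["Bullish", "Bearish", "null", "Bearish", "Bearish", "Bullish"],
})

# Drop rows whose sentiment is the string "null".
moods = moods[moods.sentiment != "null"]
# Keep only tweets whose date also appears in the market data.
moods_tail = moods.loc[moods["date"].isin(market.Date)]
# One row per date, one column per sentiment value, each cell is a count.
sentiments_count = pd.crosstab(moods_tail["date"], moods_tail["sentiment"])
print(sentiments_count)
```

The "2017-11-08" tweet is dropped by isin because that day is not in the market data, and days with no tweets of one kind get a count of 0.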

This is the result:

sentiment   Bearish  Bullish
date                        
2017-11-03        9       12
2017-11-05        3        6
2017-11-06       20        9
2017-11-07       16       35

So it works fine, but I don't understand why I cannot access the date index with sentiments_count.date or sentiments_count['date'].

For example, if I try something like this:

print(sentiments_count['date'])

I get: KeyError: 'date'

Am I missing something? Thanks
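The error can be reproduced on a tiny table built from hypothetical data. pd.crosstab uses the name of its first Series (date) as the name of the row index and the name of its second Series (sentiment) as the name of the columns axis, so date never becomes a column label. Both [] lookup and attribute access only search column labels, which is why sentiments_count['date'] raises KeyError and sentiments_count.date raises AttributeError.

```python
import pandas as pd

# Hypothetical sentiment data with the same column names as the question.
moods_tail = pd.DataFrame({
    "date": ["2017-11-03", "2017-11-03", "2017-11-05"],
    "sentiment": ["Bearish", "Bullish", "Bullish"],
})
sentiments_count = pd.crosstab(moods_tail["date"], moods_tail["sentiment"])

print(sentiments_count.columns.tolist())  # ['Bearish', 'Bullish'], the only column labels
print(sentiments_count.index.name)        # date, the name of the row index
print(sentiments_count.columns.name)      # sentiment, the name of the columns axis

try:
    sentiments_count["date"]
except KeyError as err:
    print("KeyError:", err)               # KeyError: 'date'
```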

Malik Asad
Aso Strife

1 Answer


You cannot select it with [] or attribute access because date is the index, not a column, so you need:

print(sentiments_count.index)
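If you only need the dates, you can leave them in the index and read them from there; .loc then selects rows by their date label. A small sketch with made-up counts that mimic the crosstab output:

```python
import pandas as pd

# Hypothetical counts indexed by date, shaped like the crosstab result.
sentiments_count = pd.DataFrame(
    {"Bearish": [9, 3], "Bullish": [12, 6]},
    index=pd.Index(["2017-11-03", "2017-11-05"], name="date"),
)

dates = sentiments_count.index                        # the date labels live here
print(dates.tolist())                                 # ['2017-11-03', '2017-11-05']
print(sentiments_count.loc["2017-11-05", "Bullish"])  # 6, selected by date label
```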

To turn the index into a column you need reset_index. For cleaner output you can also add rename_axis to remove the columns name sentiment:

sentiments_count = sentiments_count.reset_index().rename_axis(None, axis=1)

print(sentiments_count['date'])
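A runnable sketch of this approach on invented data. axis is passed by keyword here, which works on both older and current pandas releases.

```python
import pandas as pd

# Hypothetical sentiment data (same column names as the question).
moods_tail = pd.DataFrame({
    "date": ["2017-11-03", "2017-11-03", "2017-11-05"],
    "sentiment": ["Bearish", "Bullish", "Bullish"],
})
sentiments_count = pd.crosstab(moods_tail["date"], moods_tail["sentiment"])

# Move the 'date' index into an ordinary column, then clear the
# 'sentiment' name that crosstab attached to the columns axis.
flat = sentiments_count.reset_index().rename_axis(None, axis=1)
print(flat.columns.tolist())  # ['date', 'Bearish', 'Bullish']
print(flat["date"].tolist())  # ['2017-11-03', '2017-11-05']
```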
jezrael