
I want to sum my data so that I get one number per day (the sum of all the minutes in that day).

My data looks like this:

    Date              negative_sentiment  positive_sentiment  neutral_sentiment  compount_sentiment
    2015.03.22.13.00            1.407692            3.655128          54.937179            3.698333
    2015.03.22.13.01            1.839572            3.457345          54.702827            2.742424
    2015.03.22.13.02            1.852847            3.187877          54.959512            2.649846
    2015.03.22.13.03            1.758206            3.444771          54.762926            3.495089
    2015.03.22.13.04            1.611731            3.274262          55.114041            2.847284
    2015.03.22.13.05            1.833436            3.241374          54.907794            2.881480

And the dtypes are:

    Date                  datetime64[ns]
    negative_sentiment           float64
    positive_sentiment           float64
    neutral_sentiment            float64
    compount_sentiment           float64
    dtype: object

I tried many options, but nothing works:

    import pandas as pd

    pd.set_option('display.width', 1000)

    path_name = "C:/Users/Alex/Desktop/03_2015.csv"
    data_sentimental = pd.read_csv(path_name, sep=';', header=None, names=['Date', 'negative_sentiment', 'positive_sentiment', 'neutral_sentiment', 'compount_sentiment'])
    # converting column 1 to datetime and assigning it back to column 1
    data_sentimental['Date'] = pd.to_datetime(data_sentimental['Date'], format='%Y.%m.%d.%H.%M')
    print(data_sentimental.dtypes)  # shows the dtypes so we can check that each column has the right type

    data_sentimental = pd.DatetimeIndex(data_sentimental['Date']).normalize()
    data_sentimental = data_sentimental.groupby(data_sentimental['Date'].dt.normalize())

But that gives me this error:

    Traceback (most recent call last):
      File "C:/Users/Alex/PycharmProjects/master_thesis/result.py", line 19, in <module>
        data_sentimental = data_sentimental.groupby(data_sentimental['Date'].dt.normalize())
      File "C:\Users\Alex\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\indexes\datetimelike.py", line 267, in __getitem__
        raise ValueError
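The `ValueError` most likely comes from the line before the `groupby`: `pd.DatetimeIndex(data_sentimental['Date']).normalize()` returns a `DatetimeIndex`, and assigning it back to `data_sentimental` replaces the whole DataFrame. The next line then indexes that `DatetimeIndex` with the string `'Date'` instead of selecting a column. A minimal sketch that groups the DataFrame itself instead, assuming the column names shown above; the sample rows are hypothetical and only two sentiment columns are kept for brevity:

```python
import pandas as pd

# Hypothetical sample rows in the question's layout (minute timestamps over two days)
raw = pd.DataFrame({
    "Date": ["2015.03.22.13.00", "2015.03.22.13.01", "2015.03.23.09.00"],
    "negative_sentiment": [1.0, 2.0, 4.0],
    "positive_sentiment": [3.0, 3.0, 1.0],
})

# Parse the 'YYYY.MM.DD.HH.MM' strings, as in the question
raw["Date"] = pd.to_datetime(raw["Date"], format="%Y.%m.%d.%H.%M")

# Keep working on the DataFrame: drop the timestamp column from the values,
# group the remaining rows by calendar day (dt.normalize() sets the time to
# midnight) and add up all the minutes of each day
daily = raw.drop(columns="Date").groupby(raw["Date"].dt.normalize()).sum()
print(daily)
```

Dropping `Date` before summing keeps the timestamp column out of the aggregation; recent pandas versions may raise a `TypeError` when asked to sum a datetime column.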

Thank you for your help.

Cavalier

1 Answer


I found a solution:

    df = data_sentimental
    df = df.reset_index().set_index('Date').resample('1D').mean()
    df = df.drop(columns='index')  # drop the integer column created by reset_index()

Thanks for your help

Cavalier
  If you use `reset_index(drop=True)` in the second line, you won't need the third line. Do you even really need the `reset_index`? – Maarten Fabré Jul 14 '17 at 09:58
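Following up on the comment above, a sketch suggesting that the `reset_index` and `drop` steps are not needed at all: move `Date` into the index and resample it directly. The answer's `.mean()` gives the average per minute for each day, whereas the question asks for the total of all the minutes, so both are shown side by side. The sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical rows: two minutes on 22 March, one on 24 March, nothing on the 23rd
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-03-22 13:00", "2015-03-22 13:01", "2015-03-24 08:30"]),
    "negative_sentiment": [1.0, 2.0, 4.0],
    "positive_sentiment": [3.0, 3.0, 1.0],
})

# Move the timestamps into the index; no reset_index() or drop() needed
by_minute = df.set_index("Date")

# Daily totals: one row per calendar day, and sum() gives 0.0 for a day without rows
daily_sum = by_minute.resample("D").sum()

# The answer's variant: average per minute for each day (NaN for a day without rows)
daily_mean = by_minute.resample("D").mean()

print(daily_sum)
print(daily_mean)
```

One difference from a `groupby` on the normalized dates: `resample` emits a row for every calendar day in the range, including days with no data.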