
I have data in a dataframe that I'm trying to plot. After about day 4, the plotted curves become wavy, as shown in the picture. Any idea how I could smooth them out? Here is my code:

(I know it's not the prettiest...)

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mpl_dates

control_temp = pd.read_excel(r'Downloads/controltempt.xlsx')
control_temp = control_temp.drop(index = 0)
control_temp = control_temp.drop(labels = 'Unnamed: 2', axis = 1)
control_temp = control_temp.drop(labels = 'Unnamed: 4', axis = 1)
control_temp = control_temp.drop(labels = 'Unnamed: 5', axis = 1)
control_temp = control_temp.drop(labels = 'Unnamed: 6', axis = 1)
# control_temp = control_temp.drop(control_temp.index[range(1975,3591)])
control_temp['Time'] = pd.to_datetime(control_temp['Time'])
control_temp = control_temp.set_index('Time').resample('12H').first()
control_temp = control_temp.dropna()
control_temp = control_temp.resample('S')
control_temp = control_temp.interpolate(method='cubic')

plt.plot(control_temp.index, control_temp['Unnamed: 3'], c = 'green')


wc4_temp = pd.read_excel(r'Downloads/wc4.xlsx')
wc4_temp = wc4_temp.drop(index = 0)
wc4_temp = wc4_temp.drop(labels = 'Unnamed: 2', axis = 1)
wc4_temp = wc4_temp.drop(labels = 'Unnamed: 4', axis = 1)
wc4_temp = wc4_temp.drop(labels = 'Unnamed: 5', axis = 1)
wc4_temp = wc4_temp.drop(labels = 'Unnamed: 6', axis = 1)
# wc4_temp = wc4_temp.drop(wc4_temp.index[range(1609,2807)])
wc4_temp['Time'] = pd.to_datetime(wc4_temp['Time'])
wc4_temp = wc4_temp.set_index('Time').resample('12H').first()
wc4_temp = wc4_temp.dropna()
wc4_temp = wc4_temp.resample('S')
wc4_temp = wc4_temp.interpolate(method='cubic')

plt.plot(wc4_temp.index, wc4_temp['Unnamed: 3'], c = 'blue')


wc48_temp = pd.read_excel(r'Downloads/wc48.xlsx')
wc48_temp = wc48_temp.drop(index = 0)
wc48_temp = wc48_temp.drop(labels = 'Unnamed: 2', axis = 1)
wc48_temp = wc48_temp.drop(labels = 'Unnamed: 4', axis = 1)
wc48_temp = wc48_temp.drop(labels = 'Unnamed: 5', axis = 1)
wc48_temp = wc48_temp.drop(labels = 'Unnamed: 6', axis = 1)
# wc48_temp = wc48_temp.drop(wc48_temp.index[range(1158,2570)])
wc48_temp['Time'] = pd.to_datetime(wc48_temp['Time'])
wc48_temp = wc48_temp.set_index('Time').resample('12H').first()
wc48_temp = wc48_temp.dropna()
wc48_temp = wc48_temp.resample('S')
wc48_temp = wc48_temp.interpolate(method='cubic')

plt.plot(wc48_temp.index, wc48_temp['Unnamed: 3'], c = 'red')
fig = plt.figure(1, figsize = (10,5))
date_format = mpl_dates.DateFormatter('%d')
plt.gca().xaxis.set_major_formatter(date_format)
plt.minorticks_on()
plt.tick_params(which = 'minor', direction = 'in', top = True, right = True)
plt.tick_params(which = 'major', direction = 'in', top = True, right = True)
plt.xlabel("Days")
plt.ylabel("Temperature (\u00b0C)")
plt.title('Temperature over 4 Days')
plt.ylim(0,32)
plt.legend(bbox_to_anchor = [.99,.28], labels = ['Control','.4 W/C', '.48 W/C'])
plt.show()

Here is the plot

Henry Ecker
    Perhaps you could take a [rolling average](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html)? – Nick ODell Jul 23 '21 at 03:37

1 Answer

There are many ways to smooth your data. If new data keeps arriving, you may want an online algorithm, one that updates the smoothed value as each sample comes in. In that case an exponential moving average (EMA) is a common choice.
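
As a rough illustration of the online idea, here is a minimal sketch of an EMA that is updated one sample at a time. The function name `ema`, the smoothing factor `alpha = 0.3`, and the sample readings are assumptions made up for this example, not values from the question.

```python
def ema(values, alpha=0.3):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = []
    state = None
    for x in values:
        # Seed the state with the first sample, then blend each new sample in.
        state = x if state is None else alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed


readings = [20.0, 22.0, 21.0, 25.0, 24.0]  # made-up temperature samples
print(ema(readings))
```

A smaller `alpha` smooths more but makes the curve lag further behind the data. If the data is already in a pandas Series, `Series.ewm(alpha=0.3, adjust=False).mean()` computes the same recursion.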

If you have a fixed dataset, with no new data coming in, you could instead apply an averaging filter using convolution, for example:

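import numpy as np

x = np.random.randint(0, 10, (100,))  # example data: 100 random integers in [0, 10)
kernel = np.ones(5) / 5  # 5-point averaging kernel (weights sum to 1)
# mode='same' keeps the output the same length as x (the default 'full' returns 104 points)
smoothed = np.convolve(x, kernel, mode='same')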
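
One detail worth checking with `np.convolve` is the `mode` argument, which controls the length of the output and how the edges are handled. The sketch below compares the three modes on made-up data; the array `x` here is an assumption for illustration only.

```python
import numpy as np

x = np.arange(10, dtype=float)  # made-up data: 0, 1, ..., 9
kernel = np.ones(5) / 5         # 5-point moving-average kernel

full = np.convolve(x, kernel)                 # default mode="full": len(x) + len(kernel) - 1 points
same = np.convolve(x, kernel, mode="same")    # same length as x, edges padded with zeros
valid = np.convolve(x, kernel, mode="valid")  # only windows that fully overlap the data

print(len(full), len(same), len(valid))  # 14 10 6
```

With `"same"`, the first and last two values are pulled toward zero because the window runs past the end of the data. `"valid"` avoids that at the cost of a shorter output, so the time axis has to be trimmed to match when plotting.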

Beyond these, there is a whole range of more application-specific methods for smoothing data.

sehan2