I am preprocessing hourly time-series data so that I can apply K-means clustering to it. When I then try to normalize the data, I get this error:
```
Traceback (most recent call last):
  File ".venv\lib\site-packages\pandas\core\series.py", line 191, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".venv\timesequence.py", line 210, in <module>
    matrix = pd.DataFrame(scaler.fit_transform(x_calls), columns=df_hours.columns, index=df_hours.index)
  File ".venv\lib\site-packages\sklearn\base.py", line 867, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 420, in fit
    return self.partial_fit(X, y)
  File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 457, in partial_fit
    X = self._validate_data(
  File ".venv\lib\site-packages\sklearn\base.py", line 577, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File ".venv\lib\site-packages\sklearn\utils\validation.py", line 856, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File ".venv\lib\site-packages\pandas\core\generic.py", line 2064, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: setting an array element with a sequence.
```

```python
#--------------------Preprocessing ds
counter_ = 0
zero = 0
df_hours = pd.DataFrame({
    'Hour': [],
    'SumView': [],
    'CountStudent': []
}, dtype=object)
while counter_ < 24:
    if (counter_ in sub_data_hour['Hour']):
        row = sub_data_hour.loc[(pd.to_numeric(sub_data_hour['Hour'], errors='coerce')) == counter_]
        df_hours.loc[len(df_hours.index)] = [counter_, row['SumView'], row['CountStudent']]
    else:
        df_hours.loc[len(df_hours.index)] = [counter_, zero, zero]
    counter_ += 1
#----------Normalize dataset------------
x_calls = df_hours.columns[2:]
scaler = MinMaxScaler()
matrix = pd.DataFrame(scaler.fit_transform(df_hours[x_calls]), columns=x_calls, index=df_hours.index)
```
I tried `.to_numpy()`, `.values`, and selecting the columns with `[['column1','column2']]`, following the post "pandas dataframe columns scaling with sklearn", but none of these worked. Could anyone please help me fix this? Thanks.
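A possible explanation, offered as a sketch rather than a confirmed diagnosis: `row['SumView']` and `row['CountStudent']` are pandas Series (the result of a boolean `.loc` selection), so each cell of `df_hours` ends up holding a Series instead of a number, which would produce "setting an array element with a sequence" when the scaler converts the frame to a float array. Separately, `counter_ in sub_data_hour['Hour']` tests the Series index labels, not its values. The example below avoids both by reindexing over the 24 hours instead of looping. The contents of `sub_data_hour` are invented for illustration, and it assumes at most one row per hour.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for sub_data_hour: only some hours have data.
sub_data_hour = pd.DataFrame({
    "Hour": [0, 5, 23],
    "SumView": [10, 40, 20],
    "CountStudent": [2, 8, 4],
})

# Make Hour numeric, index by it, and fill the missing hours with 0.
# Assumes each hour appears at most once; duplicates would need aggregating first.
df_hours = (
    sub_data_hour
    .assign(Hour=pd.to_numeric(sub_data_hour["Hour"], errors="coerce"))
    .dropna(subset=["Hour"])
    .astype({"Hour": int})
    .set_index("Hour")
    .reindex(range(24), fill_value=0)
    .rename_axis("Hour")
    .reset_index()
)

# Scale only the numeric feature columns, cast to float explicitly.
x_calls = ["SumView", "CountStudent"]
scaler = MinMaxScaler()
matrix = pd.DataFrame(
    scaler.fit_transform(df_hours[x_calls].astype(float)),
    columns=x_calls,
    index=df_hours.index,
)
```

Note that with the columns `Hour`, `SumView`, `CountStudent`, the slice `df_hours.columns[2:]` selects only `CountStudent`; if both feature columns are meant to be scaled, naming them explicitly as above (or using `columns[1:]`) may be closer to the intent.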