I am preprocessing time-series data, aggregated by hour, so that I can apply K-means clustering to it. When I try to normalize the data, I get this error:

`

Traceback (most recent call last):
  File ".venv\lib\site-packages\pandas\core\series.py", line 191, in wrapper
    raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".venv\timesequence.py", line 210, in <module>
    matrix = pd.DataFrame(scaler.fit_transform(x_calls), columns=df_hours.columns, index=df_hours.index)
  File ".venv\lib\site-packages\sklearn\base.py", line 867, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 420, in fit
    return self.partial_fit(X, y)
  File ".venv\lib\site-packages\sklearn\preprocessing\_data.py", line 457, in partial_fit
    X = self._validate_data(
  File ".venv\lib\site-packages\sklearn\base.py", line 577, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File ".venv\lib\site-packages\sklearn\utils\validation.py", line 856, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File ".venv\lib\site-packages\pandas\core\generic.py", line 2064, in __array__
    return np.asarray(self._values, dtype=dtype)
ValueError: setting an array element with a sequence.
#--------------------Preprocessing ds 
counter_ = 0
zero = 0
df_hours = pd.DataFrame({
    'Hour': [],
    'SumView':[],
    'CountStudent':[]
}, dtype=object)

while counter_ < 24:
    if (counter_ in sub_data_hour['Hour']):
        row = sub_data_hour.loc[(pd.to_numeric(sub_data_hour['Hour'], errors='coerce')) == counter_]
        df_hours.loc[len(df_hours.index)] = [counter_, row['SumView'], row['CountStudent']]
    else:
        df_hours.loc[len(df_hours.index)] = [counter_, zero, zero]
    counter_ += 1

#----------Normalize dataset------------
x_calls = df_hours.columns[2:]
scaler = MinMaxScaler()
matrix = pd.DataFrame(scaler.fit_transform(df_hours[x_calls]), columns=x_calls, index=df_hours.index) 

`

I tried `.to_numpy()`, `.values`, and selecting the columns with `[['column1','column2']]`, following the post "pandas dataframe columns scaling with sklearn", but none of these worked.

Could anyone please help me fix this? Thanks.

1 Answer

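The problem is the data type of the values I stored in df_hours during preprocessing. `row['SumView']` and `row['CountStudent']` are pandas Series, not scalars, so each cell ends up holding a whole Series, and MinMaxScaler cannot convert such a column to float.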
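A small sketch of why this happens, using made-up sample data in place of `sub_data_hour` (the values are hypothetical): filtering a DataFrame with a boolean mask returns a DataFrame, so selecting one column from it gives a Series even when only one row matches.

```python
import pandas as pd

# Hypothetical stand-in for sub_data_hour; the numbers are invented.
sub_data_hour = pd.DataFrame({
    "Hour": [0, 1],
    "SumView": [10, 20],
    "CountStudent": [1, 2],
})

# A boolean mask keeps the result two-dimensional, even for a single match.
row = sub_data_hour.loc[sub_data_hour["Hour"] == 1]

as_series = row["SumView"]            # a one-element Series, not a number
as_scalar = row["SumView"].values[0]  # the underlying number

print(type(as_series))
print(type(as_scalar))
```

Storing `as_series` in a cell gives an object column that cannot be cast to float, which matches the "setting an array element with a sequence" message in the traceback.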

Solution: change `row['SumView']` to `row['SumView'].values[0]`, and do the same with `row['CountStudent']`, so that each cell stores a single number.
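A minimal sketch of the corrected preprocessing and scaling, assuming `sub_data_hour` has at most one row per hour. The sample data and the name `x_cols` are hypothetical. Two further changes are my own and not part of the original fix: the rows are collected in a list instead of being appended with `.loc`, and the membership test uses `row.empty`, because `counter_ in sub_data_hour['Hour']` checks the index labels rather than the values.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for sub_data_hour; the numbers are invented.
sub_data_hour = pd.DataFrame({
    "Hour": [8, 9, 13],
    "SumView": [120, 300, 60],
    "CountStudent": [10, 25, 5],
})

hours = pd.to_numeric(sub_data_hour["Hour"], errors="coerce")

rows = []
for hour in range(24):
    row = sub_data_hour.loc[hours == hour]
    if not row.empty:
        # .values[0] extracts a scalar instead of storing a whole Series.
        rows.append([hour, row["SumView"].values[0], row["CountStudent"].values[0]])
    else:
        # Hours with no data are filled with zeros.
        rows.append([hour, 0, 0])

df_hours = pd.DataFrame(rows, columns=["Hour", "SumView", "CountStudent"])

# Scale both feature columns, leaving Hour out.
x_cols = ["SumView", "CountStudent"]
scaler = MinMaxScaler()
matrix = pd.DataFrame(
    scaler.fit_transform(df_hours[x_cols]),
    columns=x_cols,
    index=df_hours.index,
)
print(matrix.head(10))
```

Note that `df_hours.columns[2:]` in the question selects only CountStudent; naming the columns explicitly, as above, scales both SumView and CountStudent.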