
I have a DataFrame with a text column and a set of labels. When I combined the labels into the `targets` column, I converted them with `.astype(str)`. When I try to pass this data to a multi-label machine learning model, I get the error `ValueError: too many dimensions 'str'`. How do I convert the column to lists of numbers, or is there a library method that does this?

import pandas as pd

# X_train holds the texts; y_train holds the six label columns target_1 to target_6.
target_cols = [f'target_{n}' for n in range(1, 7)]
train_data = pd.DataFrame({'text': list(X_train),
                           **{col: list(y_train[col]) for col in target_cols}})

# Join each row's labels into one comma-separated string (the numbers become str here),
train_data['targets'] = train_data[target_cols].apply(lambda x: ', '.join(x.dropna().astype(str)), axis=1)
train_data = train_data.drop(target_cols, axis=1)
# then split that string back into a list, so each cell is a list of strings.
train_data['targets'] = train_data['targets'].str.split(',')
train_data.info()
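A small sketch of what the `join`/`split` steps above produce, using two hypothetical rows and only two of the six target columns (the numbers are made up for illustration): each cell of `targets` ends up as a list of strings, not floats.

```python
import pandas as pd

# Hypothetical stand-in for two rows of two of the numeric target columns.
targets = pd.DataFrame({
    "target_1": [5.8, 6.0],
    "target_2": [6.2, 6.2],
})

# Same steps as in the question: join into one string per row, then split on ','.
joined = targets.apply(lambda x: ", ".join(x.dropna().astype(str)), axis=1)
as_lists = joined.str.split(",")

print(as_lists.iloc[0])  # ['5.8', ' 6.2'] (strings, note the leading space)
print(as_lists.dtype)    # object: each cell is a Python list
```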

The DataFrame looks like this:

    text                                               targets
0   добрый день, никита. благодарю вас! добрый ден...   [5.8, 6.2, 6.3, 5.5, 6.0, 5.0]
1   - добрый. напишите андрею кравцову, что мы об...    [6.0, 6.2, 7.0, 5.8, 5.2, 5.0]
2   никита, добрый день. спасибо за доверие и ценн...   [6.2, 6.4, 8.0, 5.5, 6.8, 5.5]

Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   text     171 non-null    object
 1   targets  171 non-null    object
dtypes: object(2)

The error when I convert to float:

train_data["targets"].astype(float)
--> 997         return arr.astype(dtype, copy=True)
    998 
    999     return arr.view(dtype)
    ValueError: setting an array element with a sequence.
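A minimal reproducible sketch of the same failure, with made-up values standing in for the real data (an assumption, not the asker's dataset). Even when the cells are equal-length lists of floats, `astype(float)` cannot turn a list into a single float, so the cast fails; the question's traceback shows it as a `ValueError`.

```python
import pandas as pd

# Hypothetical two-row example: every cell holds a Python list, so the dtype is object.
s = pd.Series([[5.8, 6.2, 6.3], [6.0, 6.2, 7.0]])
print(s.dtype)  # object

try:
    s.astype(float)  # astype needs one scalar per cell; a list cannot become one float
    failed = False
except (ValueError, TypeError) as exc:  # the question's traceback shows ValueError
    failed = True
    print(type(exc).__name__, exc)
```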
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Jun 22 '21 at 22:12
  • @Prune Thanks for the hint. I have split the code into parts, with an overview of the dataset and where the error occurs. – Гыггыг Фидолобабович Jun 22 '21 at 22:20
  • It's nice to have the post a little more readable -- but please don't tag us until you've completed the upgrades. – Prune Jun 22 '21 at 22:23
  • When converting arrays of `dtype` object to float, you'll get this error if the object elements vary in size. Check the elements of the object dtype array carefully. – hpaulj Jun 22 '21 at 22:53
  • You can't use `series.astype(float)` on an array with list contents, even if the lists are all the same length. You can always do `df['targets'].apply(pd.Series)`. – Michael Delgado Jun 22 '21 at 23:35
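A sketch of the `apply(pd.Series)` suggestion on a hypothetical two-row column of string lists: each list is spread into its own numbered columns, after which every cell is a scalar and `astype(float)` succeeds.

```python
import pandas as pd

# Hypothetical list column of number strings (values made up for illustration).
df = pd.DataFrame({"targets": [["5.8", "6.2", "6.3"], ["6.0", "6.2", "7.0"]]})

# apply(pd.Series) expands each list into columns 0, 1, 2, ...;
# every cell is now a single string, so the float cast works column by column.
wide = df["targets"].apply(pd.Series).astype(float)
print(wide.shape)   # (2, 3)
print(wide.dtypes)  # float64 for every column
```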

1 Answer


I fixed it by converting every element of each list to float with a list comprehension:

train_data["targets"] = [list(map(float, target)) for target in train_data["targets"]]
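A sketch of this fix on hypothetical data shaped like the question's `str.split(',')` output, followed by one assumed extra step: stacking the lists into a 2-D float array of shape (n_samples, n_labels), which is the form many multi-label models accept. Whether the asker's model wants exactly this shape and dtype is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the output of str.split(','):
# lists of strings, some with a leading space.
train_data = pd.DataFrame({
    "text": ["first text", "second text"],
    "targets": [["5.8", " 6.2", " 6.3"], ["6.0", " 6.2", " 7.0"]],
})

# The answer's fix: float() each element; float() ignores surrounding whitespace.
train_data["targets"] = [list(map(float, target)) for target in train_data["targets"]]

# Assumed extra step: stack the equal-length lists into a 2-D numeric array.
y = np.array(train_data["targets"].tolist(), dtype=np.float32)
print(y.shape)  # (2, 3)
print(y.dtype)  # float32
```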