Ignore index column when running fit_transform from sklearn?

Question

New to Python here. I'm ultimately trying to create an MDS plot, but I'm running into issues applying MDS to my data. Here's some mock data; my actual data set is much larger:

import pandas as pd
from sklearn.manifold import MDS

df = pd.DataFrame({'Gene1': ['43.5', '14.7', '0', '33.9', '89.7'],
                   'Gene2': ['54.5', '3.7', '77.8', '21.9', '8.7'],
                   'Gene3': ['9.5', '0', '65', '1.5', '87.4'],
                   'Tissue': ['--', 'root', 'leaf', 'leaf', 'seed']})
df.set_index('Tissue')

The index for my data is the Tissue column, which describes tissue types for each gene. Here's how I'm trying to apply the MDS:

mds = MDS(2,random_state=0)
df_2d = mds.fit_transform(df)

I get the error `could not convert string to float: '--'`. How can I ignore the index column and run the MDS on only the gene columns? Or should I remove the Tissue column and add it back in after running MDS on the gene columns?

I think the problem is in your `set_index` call. Check what the dataframe looks like after that call. Most pandas methods return a new dataframe instead of modifying the original, so `Tissue` is still a regular column when you pass `df` to `fit_transform`. You need to assign the output back to your variable (`df = df.set_index(...)`) or use the `inplace` parameter (`df.set_index(..., inplace=True)`). https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.set_index.html — Marmaduke, Nov 08 '21 at 17:04
(though note that [`inplace` is not recommended](https://stackoverflow.com/a/60020384/13138364)) — tdy, Nov 08 '21 at 17:12
Ah yes, this fixes the issue. After reassigning df to the set_index output, the MDS runs. — millie0725, Nov 08 '21 at 17:14
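For anyone landing here later, the fix from the comments can be sketched roughly as below, using the mock data from the question. The explicit `astype(float)` cast is an added assumption (the mock values are strings, and casting up front avoids relying on scikit-learn to convert them), and the names `genes`, `coords`, `mds_df`, `MDS1` and `MDS2` are only illustrative.

```python
import pandas as pd
from sklearn.manifold import MDS

df = pd.DataFrame({'Gene1': ['43.5', '14.7', '0', '33.9', '89.7'],
                   'Gene2': ['54.5', '3.7', '77.8', '21.9', '8.7'],
                   'Gene3': ['9.5', '0', '65', '1.5', '87.4'],
                   'Tissue': ['--', 'root', 'leaf', 'leaf', 'seed']})

# set_index returns a new DataFrame; without assignment, df is unchanged
# and 'Tissue' is still an ordinary column holding strings like '--'.
df.set_index('Tissue')
still_a_column = 'Tissue' in df.columns

# Keep the returned frame, then cast the string values to floats
# (assumption: every gene column holds values that parse as numbers).
genes = df.set_index('Tissue').astype(float)

# The index is not part of the data passed to fit_transform.
mds = MDS(n_components=2, random_state=0)
coords = mds.fit_transform(genes)  # NumPy array, one row per sample

# Reattach the tissue labels for plotting (column names are illustrative).
mds_df = pd.DataFrame(coords, index=genes.index, columns=['MDS1', 'MDS2'])
```

Because the tissue labels live in the index, there is no need to drop the Tissue column and add it back by hand; reusing `genes.index` on the result keeps each row of coordinates matched to its tissue.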
