I have a large data frame with MANY columns. I want to normalize a few columns which are all numeric, and then plot two using regression. I thought the code below would do it for me.
from sklearn import preprocessing
# Create x, where x the 'scores' column's values as floats
modDF = df[['WeightedAvg','Score','Co','Score', 'PeerGroup', 'TimeT', 'Ter', 'Spread']].values.astype(float)
# Create a minimum and maximum processor object
min_max_scaler = preprocessing.MinMaxScaler()
# Create an object to transform the data to fit minmax processor
x_scaled = min_max_scaler.fit_transform(modDF)
# Run the normalizer on the dataframe
df_normalized = pd.DataFrame(x_scaled)
import seaborn as sns
import matplotlib.pyplot as plt
sns.regplot(x="WeightedAvg", y="Spread", data=modDF)
However, I am getting the following error: IndexError: only integers, slices (
:), ellipsis (
...), numpy.newaxis (
None) and integer or boolean arrays are valid indices
I did a regression without normalizing, using sns.regplot
and it worked, but it looked weird, so I want to see it with normalization applied. I know how the regression works. I just don't know how the regression works.