I'm carrying out a feature importance analysis on some weather data to predict solar radiation. How can I get the feature names to appear in the printed summary and on the plot instead of a number (e.g. Feature 1)?

# linear regression feature importance
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# define dataset
df = pd.read_csv(r'D:\ABC Masters Diss\Penarth 1 year.csv')
print(df)
print(df.head())

df.info()

#Features
X = df[["Cloud Coverage", "Probability of Precipitation", "Dry Bulb Temp", "Dew Point Temp", "Relative Humidity", "Apparent Temp", "Part of Day"]]

#Target
Y = df["Solar Radiation"]

# define the model
model = LinearRegression()

# fit the model
model.fit(X, Y)

# get importance
importance = model.coef_

# summarize feature importance
for i,v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i,v))
    
# plot feature importance
plt.figure(figsize=(10,10))
plt.bar(range(len(importance)), importance)
plt.xlabel('Features', fontsize=15)
plt.ylabel('Importance', fontsize=15)
plt.show()
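
One possible approach, sketched below: since `model.coef_` follows the column order of `X`, the coefficients can be paired with `X.columns` and the names used both in the printout and as bar labels. The CSV is not available here, so this sketch uses a synthetic DataFrame with the same column names (the random data, the made-up target formula, and the `Agg` backend line are assumptions for illustration only, not part of the original script).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to view the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the CSV (assumption: the real columns are all numeric)
feature_names = ["Cloud Coverage", "Probability of Precipitation", "Dry Bulb Temp",
                 "Dew Point Temp", "Relative Humidity", "Apparent Temp", "Part of Day"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, len(feature_names))), columns=feature_names)
# Made-up target so the example has something to fit
Y = 3.0 * X["Dry Bulb Temp"] - 2.0 * X["Cloud Coverage"] + rng.normal(scale=0.1, size=200)

model = LinearRegression()
model.fit(X, Y)

# Pair each coefficient with its column name; coef_ is in the same order as X.columns
importance = pd.Series(model.coef_, index=X.columns)

# Summarize using names instead of positions
for name, score in importance.items():
    print('Feature: %s, Score: %.5f' % (name, score))

# Plot with the feature names on the x-axis
fig, ax = plt.subplots(figsize=(10, 10))
ax.bar(list(importance.index), importance.values)
ax.set_xlabel('Features', fontsize=15)
ax.set_ylabel('Importance', fontsize=15)
ax.tick_params(axis='x', labelrotation=45)  # rotate the long names so they do not overlap
fig.tight_layout()
# In an interactive session, finish with plt.show()
```

With the real data, the same idea should carry over by keeping `X = df[[...]]` as in the script above and building `importance` from `model.coef_` and `X.columns`.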
desertnaut