I'm new to Python, and the related questions I've read didn't make much sense to me. Here is my issue. I want to use Python to run a multiple regression, and I am trying statsmodels. As a first step, I want to make a scatter plot.
Sample of my data:
ID order V1 V2 E1 E2 E3 M
103 1 ECA TEXT 7 3 5 7
105 1 ECA TEXT 3 7 4 5
107 1 ECA TEXT 7 7 7 4
109 1 ECA TEXT 6 6 6 3
I want to run a multiple regression with E1-E3 as my independent variables (IVs) and the mean of the M scores as my dependent variable (DV).
This is how I loaded my data.
myRegressionData = pd.read_csv('C:/Users/user/Desktop/Folder 1/Python/Regression data file.csv')
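To reproduce the problem without the CSV file, the sample rows above can be parsed straight from a string. This sketch assumes the columns are separated by whitespace exactly as shown; the real file has many more columns (146 according to `info()`).

```python
import io
import pandas as pd

# The sample rows from the question, pasted as text.
sample = """ID order V1 V2 E1 E2 E3 M
103 1 ECA TEXT 7 3 5 7
105 1 ECA TEXT 3 7 4 5
107 1 ECA TEXT 7 7 7 4
109 1 ECA TEXT 6 6 6 3
"""

# sep=r"\s+" splits on any run of whitespace, matching the layout above.
myRegressionData = pd.read_csv(io.StringIO(sample), sep=r"\s+")
print(myRegressionData.dtypes)
```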
These are my x and y:
X_sk = myRegressionData[[col for col in myRegressionData.columns if col[:8] == 'E']]
Y = myRegressionData[['M{}'.format(ii) for ii in range(1, 19)]]
y = np.mean(Y, axis=1)
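One thing worth checking in the selection above: `col[:8]` is the first eight characters of each column name, so `col[:8] == 'E'` is only true for a column named exactly `E`, which would leave `X_sk` with no columns. The sketch below shows this on a small made-up frame and offers two alternatives, an explicit list and a regular expression via `DataFrame.filter`, assuming the predictors are named `E1`, `E2`, `E3` and the DV items `M1`, `M2`, and so on. The E values are from the sample rows; the M and `EngagingAA` values are invented (`EngagingAA` is included only because the `info()` output shows a column by that name, which a plain `startswith('E')` would also catch).

```python
import pandas as pd

# Small stand-in for myRegressionData (M1, M2 and EngagingAA values are invented).
df = pd.DataFrame({
    "ID": [103, 105, 107, 109],
    "E1": [7, 3, 7, 6],
    "E2": [3, 7, 7, 6],
    "E3": [5, 4, 7, 6],
    "M1": [7, 5, 4, 3],
    "M2": [6, 5, 5, 2],
    "EngagingAA": [1, 2, 3, 4],
})

# Original selection: the first eight characters never equal 'E' here.
wrong = df[[col for col in df.columns if col[:8] == "E"]]
print(list(wrong.columns))  # []

# Option 1: list the predictors explicitly.
X_sk = df[["E1", "E2", "E3"]]

# Option 2: match "E" followed only by digits, so EngagingAA is excluded.
X_sk_regex = df.filter(regex=r"^E\d+$")

# Row-wise mean of the M items: one DV value per participant.
y = df.filter(regex=r"^M\d+$").mean(axis=1)
print(y.tolist())  # [6.5, 5.0, 4.5, 2.5]
```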
and this is the code where I get the error:
myRegressionData.plot(kind='scatter',x = X_sk, y=np.mean(Y, axis=1))
returns
ValueError: Must pass DataFrame with boolean values only
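As far as I can tell, the error comes from how `DataFrame.plot(kind='scatter')` reads its arguments: `x` and `y` are meant to be column labels of the frame being plotted, not the data itself. Passing the DataFrame `X_sk` as `x` makes pandas index the frame with another DataFrame, which it interprets as a boolean mask, hence the message. A sketch that avoids this first stores the mean as a column (the name `M_mean` and the data are hypothetical) and then draws one scatter per predictor, since a 2-D scatter plot can only show one IV against the DV at a time. The `matplotlib.use("Agg")` line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so this sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame: E1-E3 from the sample rows, M1/M2 values invented.
df = pd.DataFrame({
    "E1": [7, 3, 7, 6],
    "E2": [3, 7, 7, 6],
    "E3": [5, 4, 7, 6],
    "M1": [7, 5, 4, 3],
    "M2": [6, 5, 5, 2],
})

# Store the DV as a real column so it can be referred to by name.
df["M_mean"] = df.filter(regex=r"^M\d+$").mean(axis=1)

# x and y are column labels of df; one panel per predictor, shared y-axis.
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, col in zip(axes, ["E1", "E2", "E3"]):
    df.plot(kind="scatter", x=col, y="M_mean", ax=ax)
fig.tight_layout()
fig.savefig("scatter_E_vs_M.png")  # or plt.show() in an interactive session
```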
myRegressionData.info()
returns
RangeIndex: 90 entries, 0 to 89
Columns: 146 entries, IDOpenEndedResponse to EngagingAA
dtypes: float64(10), int64(134), object(2)
memory usage: 102.7+ KB