So I'm super new to learning Python, currently a little over halfway through a heavy Udemy course and wanted to give myself a challenge to apply skills along the way. I scraped and concatenated a dataframe of 10 years worth of fantasy football drafts from the league I'm in and wanted to see if there's a way to predict a winner based off how each team drafts skill position players. I know there's a ton of variables throughout the season (injuries, trades, waiver wire pickups, etc) but I'm doing this just for fun and to hammer home the skills I'm learning about.
The problem I'm having is that the dataframe is a MultiIndex (I believe?) and grouped first by the year, team, pick number, draft choice, and either a 1 or 0 for win or lose. It looks like this:
This is the code I'm using to try and run the model.
from sklearn.model_selection import train_test_split
X = stackedDF.drop(['Win','Team'],axis=1)
y = stackedDF['Win']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3)
from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
predictions = logmodel.predict(X_test)
And I receive the following error in Jupyter:
ValueError: could not convert string to float: ' QB'
I'm guessing this means I'll need to convert each skill position into a number maybe by way of a dictionary? For example {'QB':'1','RB':'2'} etc...
Am I way off here? Hope this isn't a lame question, I'm still super new to this and am exciting to be learning Python. Thanks!