So I'm super new to learning Python, currently a little over halfway through a heavy Udemy course, and wanted to give myself a challenge to apply skills along the way. I scraped and concatenated a dataframe of 10 years' worth of fantasy football drafts from the league I'm in and wanted to see if there's a way to predict a winner based on how each team drafts skill position players. I know there are a ton of variables throughout the season (injuries, trades, waiver wire pickups, etc.), but I'm doing this just for fun and to hammer home the skills I'm learning.

The problem I'm having is that the dataframe has a MultiIndex (I believe?) and is grouped by year, team, pick number, and draft choice, with either a 1 or 0 for win or loss. It looks like this:

Database

This is the code I'm using to try and run the model.

from sklearn.model_selection import train_test_split
X = stackedDF.drop(['Win','Team'],axis=1)
y = stackedDF['Win']
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3)

from sklearn.linear_model import LogisticRegression
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
predictions = logmodel.predict(X_test)

And I receive the following error in Jupyter:

ValueError: could not convert string to float: ' QB'

I'm guessing this means I'll need to convert each skill position into a number, maybe by way of a dictionary? For example, {'QB':'1','RB':'2'}, etc.

Am I way off here? Hope this isn't a lame question. I'm still super new to this and am excited to be learning Python. Thanks!

  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune May 05 '21 at 00:09
  • Your posted code dies immediately on an undefined symbol. You neglected to include the full error message (traceback) and trace of the offending values. – Prune May 05 '21 at 00:10

1 Answer

Scikit-learn, like most other machine learning tools, expects numerical input, because it's ambiguous how an algorithm should treat a string. So to make your code work, the best option in this case is to one-hot encode the column (more details here). The general gist is that your dataframe will be expanded with an additional column for every position type, and each row holds a 1 in the column for the position that was drafted and a 0 in all the others.
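To make that gist concrete, here is a minimal sketch that builds the indicator columns by hand on a made-up three-row frame. The column name `Position` and the values are assumptions for illustration only, not taken from your data.

```python
import pandas as pd

# Hypothetical toy data; 'Position' is an assumed column name.
toy = pd.DataFrame({"Position": ["QB", "RB", "QB"]})

# Add one new 0/1 column per distinct position value.
for pos in sorted(toy["Position"].unique()):
    toy[pos] = (toy["Position"] == pos).astype(int)

# Drop the original string column so only numeric columns remain.
encoded = toy.drop(columns="Position")
print(encoded)
```

Each row ends up with exactly one 1, in the column named after its position.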

Doing it in pandas is simple. Use the get_dummies function and pass it your dataframe and the columns you need to encode:

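import pandas as pd

# stackedDF is the dataframe from the question.
# Replace 'Pick' with whichever column actually holds the position strings.
ff_data = pd.get_dummies(stackedDF, columns=['Pick'])

# And then you continue as before
X = ff_data.drop(['Win'], axis=1)
y = ff_data['Win']
# etc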
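As a rough end-to-end sketch, the same steps can be run on a small made-up frame. Everything about the data here is an assumption: the column names `Year`, `PickNumber` and `Position`, and all the values, are invented so the example runs on its own. The error in the question shows ' QB' with a leading space, so the sketch strips whitespace before encoding to avoid getting separate columns for ' QB' and 'QB'.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the draft data; names and values are made up.
ff_data = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "PickNumber": [1, 2, 3, 4, 1, 2, 3, 4],
    "Position": [" QB", " RB", " WR", " RB", " RB", " QB", " WR", " TE"],
    "Win": [1, 0, 0, 1, 0, 1, 0, 1],
})

# Remove stray leading/trailing spaces from the position strings.
ff_data["Position"] = ff_data["Position"].str.strip()

# Expand the string column into 0/1 indicator columns.
ff_data = pd.get_dummies(ff_data, columns=["Position"], dtype=int)

X = ff_data.drop(["Win"], axis=1)
y = ff_data["Win"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

logmodel = LogisticRegression(max_iter=1000)
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
```

With only eight invented rows the predictions mean nothing; the point is just that the fit no longer raises the ValueError once every column is numeric. If your real frame still has other string columns (Team, for example), they need to be dropped or encoded the same way.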
NotAName