I have a pandas dataframe in which values in a column are used as the group-by basis to create submodels.
import pandas as pd
from sklearn.linear_model import Ridge
data = pd.DataFrame({"Name": ["A", "A", "A", "B", "B", "B"], "Score": [90, 80, 90, 92, 87, 80], "Age": [10, 12, 14, 9, 11, 12], "Training": [0, 1, 2, 0, 1, 2]})
"Name" is used as the basis to create a submodel for each individual. I want to use the variables "Age" and "Training" to predict the "Score" of one individual "Name" (i.e. "A" and "B" in this case). That is, if I have "A" and know the "Age" and "Training" of "A", I would like to use "A", "Age", and "Training" to predict "Score". However, "A" should only be used to select the model that "A" belongs to, rather than any other model.
grouped_df = data.groupby('Name')
for key, item in grouped_df:
    Score = grouped_df['Score']
    # Select several columns with a list (double brackets)
    Y = grouped_df[['Age', 'Training']]
    Score_item = Score.get_group(key)
    Y_item = Y.get_group(key)
    model = Ridge(alpha=1.2)
    modelfit = model.fit(Y_item, Score_item)
    modelpred = model.predict(Y_item)
    modelscore = model.score(Y_item, Score_item)
    print(modelscore)
Up to here, I have built simple Ridge models for the sub-groups A and B.
My question is, with test data as below:
test_data = [u"A, 13, 0", u"B, 12, 1", u"A, 10, 0"]  # each element represents `Name`, `Age` and `Training`, in that order
How do I feed this data to the prediction models? So far I have:
line = test_data
# Split on commas and strip the surrounding spaces from each field
Name = [row.split(",")[0].strip() for row in line]
Age = [float(row.split(",")[1]) for row in line]
Training = [float(row.split(",")[2]) for row in line]
Y = pd.DataFrame({"Name": Name, "Age": Age, "Training": Training})
This gives me the pandas dataframe of the test data. However, I am not sure how to proceed further to feed the test data to the model. I highly appreciate your help. Thank you!!
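One possible way to go from a test dataframe to per-individual predictions is sketched below. It is only a sketch under assumptions: the names `models`, `features`, `test_df` and the `Predicted` column are made up for illustration, and it assumes every `Name` in the test data also appears in the training data.

```python
import pandas as pd
from sklearn.linear_model import Ridge

data = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Score": [90, 80, 90, 92, 87, 80],
    "Age": [10, 12, 14, 9, 11, 12],
    "Training": [0, 1, 2, 0, 1, 2],
})
features = ["Age", "Training"]

# Fit one Ridge model per Name and keep it in a dict (hypothetical name: models).
models = {}
for name, group in data.groupby("Name"):
    models[name] = Ridge(alpha=1.2).fit(group[features], group["Score"])

# Parse the raw test strings into a dataframe with numeric feature columns.
test_data = ["A, 13, 0", "B, 12, 1", "A, 10, 0"]
rows = [[field.strip() for field in line.split(",")] for line in test_data]
test_df = pd.DataFrame(rows, columns=["Name", "Age", "Training"])
test_df[features] = test_df[features].astype(float)

# Predict each group of test rows with the model fitted for that Name.
test_df["Predicted"] = float("nan")
for name, group in test_df.groupby("Name"):
    test_df.loc[group.index, "Predicted"] = models[name].predict(group[features])

print(test_df)
```

Grouping the test rows by `Name` means each model is called once with a 2D block of rows, so there is no need to loop over single samples.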
UPDATE
After I adopted Parfait's code, the code looks better now. However, I did not create another pandas dataframe for the test data (as I am not sure how to handle its rows). Instead, I feed in the test values by splitting the strings. I get the warning shown below. I searched and found a related post, "Preprocessing in scikit learn - single sample - Depreciation warning". I tried to reshape the test data, but it is a list, so it has no reshape attribute. I think I am misunderstanding something. I would highly appreciate it if you could let me know how to fix this. Thank you.
import pandas as pd
from sklearn.linear_model import Ridge
import numpy as np
data = pd.DataFrame({"Name": ["A", "A", "A", "B", "B", "B"], "Score": [90, 80, 90, 92, 87, 80], "Age": [10, 12, 14, 9, 11, 12], "Training": [0, 1, 2, 0, 1, 2]})
modeldict = {}  # INITIALIZE DICT
grouped_df = data.groupby('Name')
for key, item in grouped_df:
    Score = grouped_df['Score']
    # Select several columns with a list (double brackets)
    Y = grouped_df[['Age', 'Training']]
    Score_item = Score.get_group(key)
    Y_item = Y.get_group(key)
    model = Ridge(alpha=1.2)
    modelfit = model.fit(Y_item, Score_item)
    modelpred = model.predict(Y_item)
    modelscore = model.score(Y_item, Score_item)
    modeldict[key] = modelfit  # SAVE EACH FITTED MODEL TO DICT
line = [u"A, 13, 0", u"B, 12, 1", u"A, 10, 0"]
Name = [line[i].split(",")[0] for i in range(len(line))]
Age = [line[i].split(",")[1] for i in range(len(line))]
Training = [line[i].split(",")[2] for i in range(len(line))]
for i in range(len(line)):
    Name = line[i].split(",")[0]
    Age = line[i].split(",")[1]
    Training = line[i].split(",")[2]
    model = modeldict[Name]
    ip = [float(Age), float(Training)]  # 1D list holding a single sample
    score = model.predict(ip)
    print(score)
ERROR
/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
86.6666666667
/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.DeprecationWarning)
83.5320600273
/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.DeprecationWarning)
86.6666666667
/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.DeprecationWarning)
[ 86.66666667]
/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.DeprecationWarning)
[ 83.53206003]
/opt/conda/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample. DeprecationWarning)
[ 86.66666667]
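For reference, the warning text itself suggests `X.reshape(1, -1)` for a single sample. A plain Python list has no `reshape` method, so one option is to convert it to a NumPy array first. The following is a minimal sketch of that idea, not necessarily the only fix; the `predictions` list is a name added here for illustration, and the models are fitted on plain arrays so that predicting from an array stays consistent.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

data = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Score": [90, 80, 90, 92, 87, 80],
    "Age": [10, 12, 14, 9, 11, 12],
    "Training": [0, 1, 2, 0, 1, 2],
})

# One fitted Ridge model per Name, trained on plain NumPy arrays.
modeldict = {}
for key, item in data.groupby("Name"):
    X = item[["Age", "Training"]].to_numpy()
    modeldict[key] = Ridge(alpha=1.2).fit(X, item["Score"])

line = ["A, 13, 0", "B, 12, 1", "A, 10, 0"]
predictions = []  # hypothetical container, added for illustration
for row in line:
    name, age, training = [field.strip() for field in row.split(",")]
    # Convert the list to an array, then reshape to (1 sample, 2 features).
    ip = np.array([float(age), float(training)]).reshape(1, -1)
    score = modeldict[name].predict(ip)[0]  # predict returns an array; take the single value
    predictions.append(score)
    print(name, score)
```

Wrapping the list in another list, as in `model.predict([ip])`, should have the same effect, since that also gives a 2D input with one row.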