
I would like help building a linear regression model for each 'ClientID' group.

I have pulled together the functions below, but I am getting errors. That said, I am very new to Python, so I would be open to any other ideas you may have.

The data below is from my dataframe dfTotal3, and the two variables I am using are Steps (count of steps the client did in a day) and WeightDiff (change in weight since the last input).

def model(dfTotal3, target):
    y = dfTotal3[['Steps']].values
    X = dfTotal3[['WeightDiff']].values
    return np.squeeze(LinearRegression().fit(X, y).predict(target))

def group_predictions(df, target):
    target = dfWeightComp['DTWDG']
    return dfTotal3.groupby('ClientID').apply(model, target)

group_predictions(dfTotal3, dfTotal3['DTWDG'])

Error Output:

ValueError: Expected 2D array, got 1D array instead:
array=[-0.03231707 -0.03780488 -0.04512195 -0.04615385].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
jjsantoso
Aodhan
  • `X = X.reshape(1, -1)` is probably what you want before calling fit. – Chrispresso May 11 '23 at 19:27
  • Thank you for replying. I've thrown this in: `def model(dfTotal3, target): y = dfTotal3[['Steps']].values X = dfTotal3[['WeightDiff']].values X = X.reshape(1, -1) return np.squeeze(LinearRegression().fit(X, y).predict(target))` which returned the following error: `ValueError: Found input variables with inconsistent numbers of samples: [1, 9]` – Aodhan May 11 '23 at 19:35
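
A note on the `reshape(1, -1)` attempt above: scikit-learn treats rows as samples and columns as features, so reshaping to one row turns nine observations into a single sample with nine features, which no longer matches the nine values in `y`. The sketch below uses a made-up array `x` (hypothetical, not the asker's data) only to show the two shapes. Note also that `dfTotal3[['WeightDiff']].values` is already 2D with one column, so `X` did not need reshaping in the first place.

```python
import numpy as np

# Hypothetical stand-in for one client's nine WeightDiff values.
x = np.arange(9, dtype=float)

as_column = x.reshape(-1, 1)  # 9 samples, 1 feature: the shape fit() expects here
as_row = x.reshape(1, -1)     # 1 sample, 9 features: mismatches a 9-element y

print(as_column.shape)  # (9, 1)
print(as_row.shape)     # (1, 9)
```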

1 Answer


I think the error comes from:

return np.squeeze(LinearRegression().fit(X, y).predict(target))

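Specifically, I suspect the issue is with the .predict(target) call: predict expects a 2D input of shape (n_samples, n_features), but selecting a single column with single brackets gives a 1D Series.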

The suggested fix is to change your target from:

target = dfWeightComp['DTWDG']

to:

target = dfWeightComp[['DTWDG']]
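
To see the difference between the two selections, here is a minimal sketch. The `dfWeightComp` frame below is a hypothetical stand-in with made-up `DTWDG` values, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for dfWeightComp with made-up DTWDG values.
dfWeightComp = pd.DataFrame({"DTWDG": [-0.03, -0.04, -0.05, -0.05]})

as_series = dfWeightComp["DTWDG"]    # single brackets: 1D Series
as_frame = dfWeightComp[["DTWDG"]]   # double brackets: 2D one-column DataFrame

print(as_series.shape)  # (4,)
print(as_frame.shape)   # (4, 1)
```

The 2D form has one row per sample and one column per feature, which is the layout `predict` expects.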

Simon
  • Thanks Simon. I do get a warning; is this something I should be worried about? `UserWarning: X has feature names, but LinearRegression was fitted without feature names warnings.warn(` – Aodhan May 11 '23 at 19:37
  • It means that when you defined your independent variable for training the model (`X = dfTotal3[['WeightDiff']].values`) you used only the values from the DataFrame, whereas for your `target` variable you used the whole column, including its name. For consistency you can change your target to `target = dfWeightComp[['DTWDG']].values` or, conversely, remove the `.values` from your `X` and `y` – Simon May 11 '23 at 19:40
  • Really helpful Simon, thanks (apologies if it was a stupid question!) – Aodhan May 11 '23 at 19:50
  • Hi, the output looks like the following: `ClientID AB00001 [22378.285131195335, 21297.06927617608, 21315.... AB00002 [20483.8305898491, 18159.833358735963, 18199.3... AB00003 [22623.58584474885, 19964.634731382826, 20009.... AB00005 [15635.423214285713, 14758.849920443217, 14773...` I was expecting one number per ClientID. I am also struggling to access the lists, as the result has no column name. Any help would be greatly appreciated – Aodhan May 12 '23 at 11:47
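
On the last comment: `predict` returns one prediction per row of its input, so passing the whole `DTWDG` column yields a list per client that is as long as that column. If the goal is one number per client, one option is to predict at a single value and collect the results into a DataFrame with named columns. The sketch below is only an illustration under assumptions: the data is made up, and the helper `predict_steps`, the column name `PredictedSteps`, and the target value `-0.04` are hypothetical choices, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data for illustration only.
dfTotal3 = pd.DataFrame({
    "ClientID": ["AB00001"] * 4 + ["AB00002"] * 4,
    "WeightDiff": [-0.5, -0.2, 0.1, 0.3, -0.4, -0.1, 0.0, 0.2],
    "Steps": [12000, 10000, 8000, 7000, 15000, 12000, 11000, 9000],
})

def predict_steps(group, weight_diff):
    """Fit Steps ~ WeightDiff for one client and predict at a single value."""
    # Plain arrays are used for both fit and predict, so no feature-name warning.
    X = group[["WeightDiff"]].to_numpy()   # shape (n, 1)
    y = group["Steps"].to_numpy()          # shape (n,)
    model = LinearRegression().fit(X, y)
    # One input row in, one prediction out.
    return float(model.predict(np.array([[weight_diff]]))[0])

# Hypothetical single target value; replace with whatever value is of interest.
rows = [
    {"ClientID": client_id, "PredictedSteps": predict_steps(group, -0.04)}
    for client_id, group in dfTotal3.groupby("ClientID")
]
result = pd.DataFrame(rows)
print(result)
```

If you prefer to keep the `groupby(...).apply(...)` version, the unnamed Series it returns can be given a column name with `.rename("PredictedSteps").reset_index()`.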