Determine weights in multivariate weighted linear regression

Question

I have a dataset containing insurance pricing and coverage information. The first column is the policy identifier; the remaining columns hold the premium, limit, deductible, and further details encoded as dummy variables (state and coverage).

Identifier	Price	Limit	Deductible	Peril1	Peril2	Peril3	Peril4	Peril5	Peril6	State1
POL1	250.0	100000	500.0	1	1	1	0	0	1	1
POL1	625.0	100000	1000.0	1	1	1	0	0	1	1
POL1	1650.0	500000	1000.0	1	1	1	0	0	1	1
POL1	2500.0	1000000	1000.0	1	1	1	0	0	1	1
POL1	4375.0	2000000	2000.0	1	1	1	0	0	1	1
POL2	25.0	50000	500.0	0	0	1	1	0	0	1
POL3	60.25	25000	500.0	1	1	1	1	1	1	1
POL3	73.25	50000	500.0	1	1	1	1	1	1	1

Moreover, as can be seen from the sample data frame, several rows can refer to the same insurance product. In the original data frame, up to 40 rows may refer to a single policy, while other policies are described by a single row.

I am trying to fit a multivariate regression:

reg <- lm(log(Premium) ~ Limit + Deductible + Peril1 + Peril2 + Peril3 + Peril4 + Peril5 + Peril6 + State1 + State2 + State3 + State4, data = df)

Fitting the regression showed that the residuals do not follow a normal distribution, so I decided to take the log() of the dependent variable. The data also contain several outliers and show heteroscedasticity.

For these reasons I thought WLS regression could solve my problem, because it can help me assign an appropriate weight to each error term. To understand the theory and mechanics behind WLS, I tried a simple weighted regression as explained here:

wt <- 1 / lm(abs(reg$residuals) ~ reg$fitted.values)$fitted.values^2
wls_model <- lm(log(Premium) ~ Limit + Deductible + Peril1 + Peril2 + Peril3 + Peril4 + Peril5 + Peril6 + State1 + State2 + State3 + State4, data = df, weights = wt)

Looking at the results, however, I don’t think this is the right approach to my problem, especially since many rows end up not being considered in the fit.
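To see what the residual-based weights do, and to check how many rows each fit actually uses, the same recipe can be tried on simulated data. The following is a minimal sketch, not the original data: the data frame `sim`, its columns `x`, `y`, and `wt`, and the noise model are all assumptions made for illustration.

```r
set.seed(1)

# Simulated data (assumption): the error spread grows with x,
# which produces heteroscedastic residuals.
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
sim <- data.frame(x = x, y = y)

# Step 1: ordinary least squares fit.
ols <- lm(y ~ x, data = sim)

# Step 2: regress the absolute residuals on the fitted values
# to get a rough estimate of the error standard deviation.
sd_fit <- lm(abs(residuals(ols)) ~ fitted(ols))

# Step 3: weights are the inverse of the squared estimated spread.
sim$wt <- 1 / fitted(sd_fit)^2

# Step 4: refit with the weights stored as a column of the data frame,
# so they stay aligned with the rows.
wls <- lm(y ~ x, data = sim, weights = wt)

# Compare how many observations each fit used.
c(rows = nrow(sim), ols = nobs(ols), wls = nobs(wls))
```

If `nobs()` differs between the two fits on the real data, comparing it with `nrow(df)` may help locate where rows are lost, for example through missing values or zero weights.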

From my understanding, since the weights argument of lm should be a vector, I could assign a specific weight to each policy. For instance, each row pertaining to POL1 would get a weight of 1/5. Despite having read the documentation and relevant posts, and having searched for packages that could make this easier, it is not clear to me how to implement WLS in my case.
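The per-policy weighting idea described above can be sketched with base R's `ave()`. This is only an illustration under assumptions: it uses the sample rows shown in the question, the column name `Price` from that table, a helper column `w` that is not in the original data, and a deliberately reduced formula (only `Limit` and `Deductible`), since eight sample rows cannot support the full set of dummies.

```r
# Sample rows copied from the table in the question.
df <- data.frame(
  Identifier = c("POL1", "POL1", "POL1", "POL1", "POL1", "POL2", "POL3", "POL3"),
  Price      = c(250, 625, 1650, 2500, 4375, 25, 60.25, 73.25),
  Limit      = c(100000, 100000, 500000, 1000000, 2000000, 50000, 25000, 50000),
  Deductible = c(500, 1000, 1000, 1000, 2000, 500, 500, 500)
)

# Number of rows per policy, repeated on every row of that policy.
n_per_policy <- ave(rep(1, nrow(df)), df$Identifier, FUN = sum)

# Hypothetical weight column: the rows of one policy share a total weight of 1,
# so each POL1 row gets 1/5, the POL2 row gets 1, and each POL3 row gets 1/2.
df$w <- 1 / n_per_policy

# Reduced formula for illustration only (assumption, not the full model).
fit_w <- lm(log(Price) ~ Limit + Deductible, data = df, weights = w)
coef(fit_w)
```

With these weights each policy contributes a total weight of 1 regardless of how many rows describe it. Whether that addresses the heteroscedasticity is a separate question; it only balances the influence of policies.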

[Please do not upload images of code/data/errors.](//meta.stackoverflow.com/q/285551) Please provide example data easily usable by others. — Anonymous coward, Dec 30 '22 at 17:37
Why not use a robust regression model that calculates the weights for you, like `wtmod <- MASS::rlm(log(Premium) ~ Limit + Deductible + Peril1 + Peril2 + Peril3 + Peril4 + Peril5 + Peril6 + State1+ State2 + State3 + State4, data=df, method="MM")`, then you can see the weights with `wtmod$w`. — DaveArmstrong, Dec 30 '22 at 17:43
