
I'm trying to create a multiple linear regression model with this data:

  bweight gestwks            hyp sex
1    2974 38.5200004577637     0 female
2    3270 NA                   0 male
3    2620 38.150001525878899   0 female
4    3751 39.799999237060497   0 male
5    3200 38.889999389648402   1 male
6    3673 40.970001220703097   0 female

To include the string values "male" and "female" in the model, I convert them to the integers 1 and 0, like this:

male = 1*(sex == "male")

So, creating the linear model, where babyweight is the outcome variable:

lm2 = lm(bweight ~ gestwks + hyp + male)

But when I look at the parameters of the model, I get this (only part of the output is shown here):

Call:
lm(formula = bweight ~ gestwks + 
    hyp + male)

Coefficients:
              (Intercept)  gestwks26.950000762939499
                  864.000                   -236.000
gestwks27.329999923706101    gestwks27.9899997711182
                    7.363                    146.469
gestwks28.040000915527301    gestwks30.5200004577637
                  184.469                    760.469
gestwks30.649999618530298  gestwks30.709999084472699
                  900.000                   -141.531

I expected to get only one coefficient for gestwks, not one for each of its values. What am I doing wrong?

Slim Shady
  • Probably converting `gestwks` to numeric. – jay.sf Mar 08 '20 at 15:22
  • @jay.sf what do you mean? – Slim Shady Mar 08 '20 at 15:24
  • I suggest this reading: https://stackoverflow.com/q/3418128/6574038 – jay.sf Mar 08 '20 at 15:33
  • You could check how you create or import the data. For some reason `gestwks` is stored as a `factor` and **not** as numeric! That's why you get an estimate for *each* unique value of `gestwks`. If you want to improve your question, here is some information on [asking a question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). The MRE will make it easier for others to find and test an answer to your question. That way you can help others to help you! – dario Mar 08 '20 at 15:48
  • The link jay.sf posted in the comment above should explain part of the problem. But rather than recasting the value as numeric, it's probably safer to fix the import. For more help and support, it would be great if you added a minimal reproducible example in an edit to your question. – dario Mar 08 '20 at 15:50

2 Answers


Before conducting any analysis, always explore your variables carefully. Pay attention to ranges and distributions for continuous variables and frequencies for categorical ones. Do this after importing the data.

In this case, the gestwks variable is not actually numeric. If you had looked at the output of str(my_data), where my_data is the name of your data frame, then you would have seen the potential problem with that variable. You probably need to revise the command to import the data. If it is correct, then you'll need to convert the variable into a numeric one using the appropriate command. Read the Warning in the help page of as.numeric.*
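As a minimal sketch of that check, the snippet below builds a toy data frame and inspects its column types. The data frame `my_data` and its rounded values are made up for illustration (they only mimic the question's columns), and `gestwks` is deliberately stored as character to imitate a faulty import.

```r
# Toy data (hypothetical): gestwks is stored as text on purpose.
my_data <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673),
  gestwks = c("38.52", NA, "38.15", "39.80", "38.89", "40.97"),
  hyp     = c(0, 0, 0, 0, 1, 0),
  sex     = c("female", "male", "female", "male", "male", "female"),
  stringsAsFactors = FALSE
)

# str() prints one line per column with its type; gestwks appears as chr.
str(my_data)

# The same information as a named character vector.
col_classes <- sapply(my_data, class)
print(col_classes)
```

If `gestwks` is listed as `chr` or `Factor` rather than `num`, `lm()` will treat it as categorical and estimate one coefficient per distinct value.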

Data management is a key part of your analysis.

Look carefully at gestwks for strange looking values. table can help if there aren't too many records, or look at the first and last few sorted values.
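One possible way to hunt for such values is sketched below. The vector `gestwks_raw` is invented for illustration; the two malformed entries ("38,15" and "n/a") are assumptions about what could go wrong in an import, not values taken from the question's data.

```r
# Hypothetical raw column with two entries that are not valid numbers.
gestwks_raw <- c("38.52", NA, "38,15", "39.80", "n/a", "40.97")

# Frequency table of every distinct value, including missing ones.
print(table(gestwks_raw, useNA = "ifany"))

# First and last few sorted values; odd entries tend to sit at the ends.
print(head(sort(gestwks_raw)))
print(tail(sort(gestwks_raw)))

# Entries that are present but cannot be parsed as numbers.
converted  <- suppressWarnings(as.numeric(gestwks_raw))
bad_values <- unique(gestwks_raw[is.na(converted) & !is.na(gestwks_raw)])
print(bad_values)
```

Any entry reported in `bad_values` would be enough to make an import function read the whole column as text.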

* For a factor f, as.numeric(levels(f))[f] or as.numeric(as.character(f)) is the recommended command.
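The reason for that recommendation can be seen in a small sketch. The factor `f` below is a made-up example: calling `as.numeric()` directly on a factor returns its internal integer codes, while the two recommended forms recover the values written in the labels.

```r
# Made-up factor whose labels look like numbers.
f <- factor(c("38.52", "40.97", "38.15", "38.52"))

# Direct conversion returns the level codes, not the labelled values.
codes <- as.numeric(f)

# Both recommended forms return the values encoded in the labels.
via_levels <- as.numeric(levels(f))[f]
via_char   <- as.numeric(as.character(f))

print(codes)
print(via_levels)
print(via_char)
```

If the column is already of type character rather than factor, plain `as.numeric()` on it is sufficient.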

Edward
  • So these commands are supposed to convert my `gestwks` variable into a numeric vector. So if I check the output of str(my_data), then gestwks is supposed to be of type "num"? Because now it's of type "chr"... – Slim Shady Mar 08 '20 at 16:23
  • Oh okay, never mind, I got what I just asked. Thanks for the answer! – Slim Shady Mar 08 '20 at 16:26
  • How did you get the data? Show your R command to import it. That may be the cause of the "chr". If you're happy with the import, then yes, use one of those two commands to convert it to "numeric". – Edward Mar 08 '20 at 16:29

gestwks is not stored as a number (it is a factor or a character vector), so you need to convert it, for example with as.numeric(as.character(gestwks)), before you regress on it.
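A rough sketch of the effect, using a toy data frame (`my_data`, with rounded values chosen only for illustration) in which `gestwks` starts out as character:

```r
# Toy data (hypothetical); gestwks is text, as in the question.
my_data <- data.frame(
  bweight = c(2974, 3270, 2620, 3751, 3200, 3673),
  gestwks = c("38.52", NA, "38.15", "39.80", "38.89", "40.97"),
  hyp     = c(0, 0, 0, 0, 1, 0),
  sex     = c("female", "male", "female", "male", "male", "female"),
  stringsAsFactors = FALSE
)
my_data$male <- 1 * (my_data$sex == "male")

# With gestwks as text, lm() creates one dummy variable per distinct value.
fit_chr <- lm(bweight ~ gestwks + hyp + male, data = my_data)
print(names(coef(fit_chr)))

# After conversion, gestwks gets a single slope.
my_data$gestwks <- as.numeric(as.character(my_data$gestwks))
fit_num <- lm(bweight ~ gestwks + hyp + male, data = my_data)
print(names(coef(fit_num)))
```

Passing `data = my_data` also keeps the model tied to the data frame rather than to loose vectors in the workspace.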

fteufel