A factor in Multiple Regression model shows NA

Question

I want to run a multiple regression analysis using three factors (Location, Trainer, Savings), but the factor Location shows NA. Data:

Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location,profit,savings,Month)
data$Location <- dummy(data$Location)
data$Month <- dummy(data$Month)

summary(lm(profit~savings+Month+Location, data=data))

LocationLocationBo                             NA         NA      NA       NA   
LocationLocationFreetown                       NA         NA      NA       NA   
LocationLocationKailahun                       NA         NA      NA       NA   
LocationLocationKono                           NA         NA      NA       NA

It is a categorical variable with 4 levels, and I am not sure whether I am handling it correctly. Can someone clarify what is wrong with it?

Pretty hard to answer without any data: could you [share some of them to make your code reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? — s__, Feb 07 '22 at 16:15
hello, thank you for your feedback. I have added data sample on the post — hika, Feb 08 '22 at 06:18

cucumber95 · Answer 1 · 2022-02-08T14:07:47.773

If I understand correctly, your predictors are all categorical, so if you want to include them in a linear regression model, you should first build dummy variables (coded 0-1), because otherwise the regression will mistake your data for metric. Here is an article that explains it very well: https://www.statology.org/dummy-variables-in-r/
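To make the 0-1 coding concrete, here is a minimal sketch using base R's model.matrix(). The vector `loc` is purely illustrative and is not the asker's data.

```r
# Illustrative categorical vector (hypothetical, not the data from the question)
loc <- factor(c("Kono", "Kono", "Bo", "Freetown"))

# One 0/1 indicator column per level; "- 1" removes the intercept
# so that no level is dropped as the reference category
dummies <- model.matrix(~ loc - 1)
print(dummies)
```

Each row has exactly one 1, in the column of the level that row belongs to.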

If the dependent variable (profit) is categorical too, a logistic regression model might be the better choice.

EDIT: After example data was added

Method 1: Basic without additional packages

This method is useful and easy when you only have a few dummy variables.

Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location,profit,savings,Month)

data$Location.Bo <- 0
data$Location.Bo[data$Location == "Bo"] <- 1
data$Location.Freetown <- 0
data$Location.Freetown[data$Location == "Freetown"] <- 1
data$Location.Kailahun <- 0
data$Location.Kailahun[data$Location == "Kailahun"] <- 1
data$Location.Kono <- 0
data$Location.Kono[data$Location == "Kono"] <- 1

data$Month.May <- 0
data$Month.May[data$Month == "May"] <- 1
data$Month.June <- 0
data$Month.June[data$Month == "June"] <- 1
data$Month.July <- 0
data$Month.July[data$Month == "July"] <- 1
data$Month.August <- 0
data$Month.August[data$Month == "August"] <- 1
data$Month.September <- 0
data$Month.September[data$Month == "September"] <- 1

summary(lm(profit~savings+., data=data[,c(2,3,5:13)]))
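The repeated assignments in Method 1 could also be written as a loop over the distinct values. The following is only a sketch in base R; `add_dummies` is a hypothetical helper (not part of any package), and it assumes the same `Location.<level>` naming pattern as above.

```r
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location, profit, savings, Month)

# Hypothetical helper: add one 0/1 column per distinct value of `column`
add_dummies <- function(df, column) {
  for (lev in unique(df[[column]])) {
    df[[paste(column, lev, sep = ".")]] <- as.integer(df[[column]] == lev)
  }
  df
}

data <- add_dummies(data, "Location")  # adds columns 5 to 8
data <- add_dummies(data, "Month")     # adds columns 9 to 13

# Columns 2 and 3 are profit and savings; 5 to 13 are the nine dummy columns
fit <- lm(profit ~ ., data = data[, c(2, 3, 5:13)])
summary(fit)
```

Because the dummies of each variable add up to 1 in every row, a model with an intercept cannot estimate all of them, so lm() reports NA for one dummy in each set. That is expected and does not indicate a coding error.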

Method 2: Using the package fastDummies

install.packages("fastDummies")
library(fastDummies)

Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location,profit,savings,Month)

data_new <- dummy_cols(data, select_columns = c("Location", "Month"))

summary(lm(profit~savings+., data=data_new[,c(2,3,5:13)]))

Explanation: the . in the formula is shorthand for all remaining columns of the data frame passed to lm(), so you need to restrict the data to the columns you want in the model. I did this by specifying the column indices; you can also write out the column names, but that is usually more work.
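For comparison, writing the predictors by name avoids the column indices altogether. The sketch below also assumes that the categorical columns are passed to lm() directly as character vectors; lm() then treats them as factors and does the 0-1 coding internally, using one level of each variable as the reference category. It is meant as an illustration, not as a replacement for the two methods above.

```r
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location, profit, savings, Month)

# Predictors are named explicitly, so no column subsetting is needed.
# Character columns are converted to factors by lm(); the first level
# (alphabetically) of each becomes the reference and gets no coefficient.
fit <- lm(profit ~ savings + Location + Month, data = data)
summary(fit)
```

The remaining coefficients are then read as differences from the reference level of each variable.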

Hello, yes, the factor Location is categorical. I have made dummies for it (LocationBo, LocationKono, ...), but when I add the dummies to the lm model it says there is no object such as LocationBo or LocationKono. — hika, Feb 07 '22 at 16:32
As s__ already mentioned above it is very hard to figure out the solution without a reproducible example. Can you share a reduced dataset containing the relevant variables? — cucumber95, Feb 07 '22 at 17:08
Hello, thank you very much for the response. I have added sample data to the post — hika, Feb 08 '22 at 06:18
I think the error was in the way the dummies were created. I will edit my solution and show you two methods how to solve it — cucumber95, Feb 08 '22 at 10:54
If my solution solved your problem, I'd be very happy if you accepted my answer :) — cucumber95, Feb 10 '22 at 12:49
