
I want to run a multiple regression analysis using three factors (Location, Trainer, Savings), but the coefficients for the factor Location are all NA. Data:

Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location,profit,savings,Month)
# dummy() is not base R; it presumably comes from an add-on package such as dummies
data$Location <- dummy(data$Location)
data$Month <- dummy(data$Month)

summary(lm(profit~savings+Month+Location, data=data))

LocationLocationBo                             NA         NA      NA       NA   
LocationLocationFreetown                       NA         NA      NA       NA   
LocationLocationKailahun                       NA         NA      NA       NA   
LocationLocationKono                           NA         NA      NA       NA   

It is a categorical variable with 4 levels, and I am not sure I am handling it correctly. Can someone clarify what is wrong?

hika
  • Pretty hard to answer without any data: could you [share some of them to make your code reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – s__ Feb 07 '22 at 16:15
  • Hello, thank you for your feedback. I have added sample data to the post – hika Feb 08 '22 at 06:18

1 Answer


If I understand correctly, your predictors are categorical. To include them in a linear regression model, they need to be represented as dummy variables (coded 0/1); otherwise the regression may treat category codes as if they were metric (numeric) values. Here is an article that explains it very well: https://www.statology.org/dummy-variables-in-r/
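For reference, base R's lm() can also create these dummies itself: when a predictor is a factor (or a character column, which lm() converts to a factor), it is expanded into 0/1 treatment-coded columns, one per level except the first, which becomes the reference level. A minimal sketch with the question's data; model.matrix() shows the columns lm() actually used:

```r
# Example data from the question
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location, profit, savings, Month)

# Store the categorical predictors as factors; lm() then builds 0/1 dummies
# for every level except the first (reference) level of each factor
data$Location <- factor(data$Location)
data$Month <- factor(data$Month)

fit <- lm(profit ~ savings + Location + Month, data = data)
summary(fit)

# The dummy columns lm() generated (reference levels Bo and August are absent)
head(model.matrix(fit))
```

With this data the model is full rank, so no coefficient should be NA; the intercept refers to Location Bo in August, the alphabetically first levels.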

If the dependent variable (profit) were categorical too, a logistic regression model might be the better choice.

EDIT: after the example data was added

Method 1: Base R, without additional packages

This method is simple and works well when you only have a few dummy variables.

Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location,profit,savings,Month)

data$Location.Bo <- 0
data$Location.Bo[data$Location == "Bo"] <- 1
data$Location.Freetown <- 0
data$Location.Freetown[data$Location == "Freetown"] <- 1
data$Location.Kailahun <- 0
data$Location.Kailahun[data$Location == "Kailahun"] <- 1
data$Location.Kono <- 0
data$Location.Kono[data$Location == "Kono"] <- 1

data$Month.May <- 0
data$Month.May[data$Month == "May"] <- 1
data$Month.June <- 0
data$Month.June[data$Month == "June"] <- 1
data$Month.July <- 0
data$Month.July[data$Month == "July"] <- 1
data$Month.August <- 0
data$Month.August[data$Month == "August"] <- 1
data$Month.September <- 0
data$Month.September[data$Month == "September"] <- 1

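# Leave out one dummy per factor as the reference level (here Location.Bo and
# Month.May, columns 5 and 9); keeping all of them next to the intercept makes
# the dummies perfectly collinear and lm() reports NA for the redundant ones.
# '.' stands for all other columns of the data frame passed to lm().
summary(lm(profit ~ ., data = data[, c(2, 3, 6:8, 10:13)]))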
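Why one dummy per factor has to be left out when the model has an intercept: a full set of dummies sums to 1 in every row, which is exactly the intercept column, so the columns are perfectly collinear (the "dummy variable trap"). lm() then cannot estimate all of them and reports NA for the redundant one, the same kind of NA seen in the question's output. A sketch using only Location and savings; the name loc_dummies is just illustrative:

```r
# Example data from the question (Location and savings only, for brevity)
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
data <- data.frame(Location, profit, savings)

# One 0/1 column for EVERY level of Location, no reference level left out
for (lvl in sort(unique(Location))) {
  data[[paste0("Location.", lvl)]] <- as.integer(data$Location == lvl)
}
loc_dummies <- grep("^Location\\.", names(data), value = TRUE)

# Each row has exactly one 1, so the four dummies add up to the intercept
all(rowSums(data[loc_dummies]) == 1)

# lm() drops the aliased column (the last one, Location.Kono) and reports NA
full <- lm(profit ~ ., data = data[c("profit", "savings", loc_dummies)])
coef(full)

# Leaving out the first dummy (Location.Bo) makes every coefficient estimable
reduced <- lm(profit ~ ., data = data[c("profit", "savings", loc_dummies[-1])])
coef(reduced)
```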

Method 2: Using the package fastDummies

install.packages("fastDummies")
library(fastDummies)

Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location,profit,savings,Month)

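# remove_first_dummy = TRUE leaves out the first level of each factor as the
# reference, so the dummies are not perfectly collinear with the intercept
data_new <- dummy_cols(data, select_columns = c("Location", "Month"),
                       remove_first_dummy = TRUE)

# Columns 5:11 are the 3 Location and 4 Month dummies
summary(lm(profit ~ ., data = data_new[, c(2, 3, 5:11)]))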
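The level whose dummy is left out becomes the reference level, and the remaining coefficients are differences from it. If you let lm() build the dummies from a factor, base R's relevel() lets you choose that reference; a sketch, where picking Freetown is purely illustrative:

```r
# Example data from the question (Location and savings only, for brevity)
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
data <- data.frame(Location, profit, savings)

# As a factor, the levels are sorted alphabetically, so Bo is the default reference
data$Location <- factor(data$Location)
levels(data$Location)

# relevel() moves the chosen level to the front, making it the reference
data$Location <- relevel(data$Location, ref = "Freetown")
fit <- lm(profit ~ savings + Location, data = data)

# Location coefficients are now differences relative to Freetown (savings held fixed)
names(coef(fit))
```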

Explanation: the . in a formula means "all other columns of the data frame passed to lm()", so you have to pass only the columns you want in the model (profit, savings and the dummies, not the original Location and Month text columns). I did this by specifying the column indices; you can also use the column names, but that is usually more typing.
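A sketch of the column-name alternative mentioned above, using base R only: the dummies are built by hand for every level except the alphabetically first one, and the columns passed to lm() are chosen by name with setdiff() rather than by index. The names keep, fit_by_name and fit_factor are just illustrative. The last line compares the result with a model where lm() builds the dummies from factors; the fitted values should agree.

```r
# Example data from the question
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
savings <- c(80,60,440,900,2000,5500,100,200,900,1500,2000,3000,5000,9000,400,1200)
Month <- c(rep("May",3),rep("June",4),rep("July",3),rep("August",3),rep("September",3))
data <- data.frame(Location, profit, savings, Month)

# 0/1 dummies for every level except the alphabetically first one,
# which serves as the reference level (Bo and August here)
for (lvl in sort(unique(Location))[-1]) {
  data[[paste0("Location.", lvl)]] <- as.integer(data$Location == lvl)
}
for (lvl in sort(unique(Month))[-1]) {
  data[[paste0("Month.", lvl)]] <- as.integer(data$Month == lvl)
}

# Select by name: drop the original text columns so that '.' expands to
# savings plus the dummy columns
keep <- setdiff(names(data), c("Location", "Month"))
fit_by_name <- lm(profit ~ ., data = data[keep])
summary(fit_by_name)

# For comparison: let lm() build the dummies from factors
fit_factor <- lm(profit ~ savings + factor(Location) + factor(Month), data = data)
all.equal(unname(fitted(fit_by_name)), unname(fitted(fit_factor)))
```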

cucumber95
  • Hello, yes, the factor Location is categorical. I have turned it into dummies (LocationBo, LocationKono, ...), but when I add the dummies to the lm model it says there is no object such as LocationBo or LocationKono. – hika Feb 07 '22 at 16:32
  • As s__ already mentioned above it is very hard to figure out the solution without a reproducible example. Can you share a reduced dataset containing the relevant variables? – cucumber95 Feb 07 '22 at 17:08
  • Hello, thank you very much for the response. I have added sample data to the post – hika Feb 08 '22 at 06:18
  • I think the error was in the way the dummies were created. I will edit my solution and show you two ways to solve it – cucumber95 Feb 08 '22 at 10:54
  • If my solution solved your problem, I'd be very happy if you accepted my answer :) – cucumber95 Feb 10 '22 at 12:49
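The "no object such as LocationBo" error from the comments can be reproduced in base R, assuming the dummies were stored the way the question's code appears to do it (a whole dummy matrix assigned to data$Location). The individual dummies then live inside that one column instead of being separate columns of data, so lm() cannot find LocationBo by name; used as a whole, the matrix column gets coefficient names like LocationLocationBo, as in the question's output. model.matrix() stands in for dummy() here, and msg is an illustrative name:

```r
# Example data from the question (Location and profit only, for brevity)
Location <- c(rep("Kono",4),rep("Kailahun",4),rep("Bo",4),rep("Freetown",4))
profit <- c(100,800,900,550,4500,3000,2000,1000,10,350,150,300,800,500,1500,1250)
data <- data.frame(Location, profit)

# Store a whole 0/1 dummy matrix in the single column 'Location'
# (model.matrix() used as a base R stand-in for dummy())
data$Location <- model.matrix(~ Location - 1, data = data)

names(data)              # only "Location" and "profit"
colnames(data$Location)  # the dummies live inside that one column

# A formula term such as LocationBo is therefore not found
msg <- tryCatch(lm(profit ~ LocationBo, data = data),
                error = function(e) conditionMessage(e))
msg

# The matrix column as a whole works; coefficient names get the variable name
# as prefix, and with an intercept the last dummy is aliased (NA)
fit <- lm(profit ~ Location, data = data)
coef(fit)
```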