
I've run into the following problem.

My csv data look like this:

[Image: CSV data]

I want to use propensity score matching and compare different methods to see which works best for my data. However, I get an error about the data and I cannot figure out why:

Error: Missing and non-finite values are not allowed in the covariates. Covariates with missingness or non-finite values: pat_gender, pat_race, pat_ethnicity

I checked and I don't have any missing values. I don't understand what "non-finite" means here. I tried replacing the character values in pat_gender with numbers (e.g. Male to 1, Female to 0), but I still get the same error. I have attached my file in case it helps.

library(MatchIt)
library(dplyr)
library(optmatch)
 
mydata<- read.csv("C:/Users/Desktop/prp_for_psm_pq.csv")

set.seed(1234)

match.itzs <- matchit(cohort_flag ~ pat_age + pat_gender + pt_hist_in_months + pt_visit_count + pat_race + pat_ethnicity, data = mydata, ratio=1)

df.matchzs <- match.data(match.itzs)[1:ncol(cohort_initial)]

prp_cohort_psm_zs_test <- df.matchzs
Leonardo
  • PSM (Propensity Score Matching) computes a PS (Propensity Score) for each patient, i.e. the probability of being part of the reference group given the explanatory variables. To compute such a score (through a GLM, Generalized Linear Model), you need a value for every explanatory variable. It seems that you have NA values in your table. – Yacine Hajji Nov 21 '22 at 09:53
  • PS: why do you specify `[1:ncol(cohort_initial)]` when extracting your matched data frame? – Yacine Hajji Nov 21 '22 at 09:55
  • Hello, please check [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to know how to make a reproducible example. In particular, you should avoid screenshots of your data and use `dput()` on your data instead. You can also take a look at the [reprex package](https://reprex.tidyverse.org/) that is a good way to check that your example is reproducible. Doing this is a bit of work but it will be much easier to help you with this. Also, it is often a great way to spot errors by yourself – bretauv Nov 21 '22 at 10:03
  • Hi @YacineHajji, thanks. I checked in my csv and don't have missing or NA values. – Leonardo Nov 21 '22 at 10:31
  • I'm fairly certain this is an issue with reading your CSV into R. Try running `which(!is.finite(mydata$pat_gender))`. This will tell you which observations are missing. It might be something like you have what looks like an empty row in your CSV but it actually contains invisible text (like a space). – Noah Nov 22 '22 at 18:49
  • It turns out this is a bug in version 4.5.0 of `MatchIt`. See [here](https://github.com/kosukeimai/MatchIt/issues/138). The problem was with character variables. Making them factors solves the problem. – Noah Dec 01 '22 at 16:50
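
The diagnosis in the comments can be checked with a minimal sketch like the one below. The data frame `toy` is made up for illustration and only borrows two column names from the question. `is.finite()` returns `FALSE` for every element of a character vector, so a character column looks entirely "non-finite" even when nothing is missing. Whether `MatchIt` 4.5.0 performs exactly this check internally is an assumption here; the linked issue has the details.

```r
# Toy data (hypothetical values); stringsAsFactors = FALSE mimics
# the default behaviour of read.csv() in R >= 4.0
toy <- data.frame(
  pat_age    = c(34, 51, 47),
  pat_gender = c("Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Count missing values per column: none here
na_counts <- colSums(is.na(toy))

# Inspect column classes: pat_gender is read as "character"
col_classes <- sapply(toy, class)

# is.finite() is FALSE for every element of a character vector,
# so every row is reported as "non-finite"
nonfinite_gender <- which(!is.finite(toy$pat_gender))

print(na_counts)
print(col_classes)
print(nonfinite_gender)
```

If `which(!is.finite(...))` returns every row index for a column that has no `NA` values, the column type rather than the data is the likely culprit.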

1 Answer


I got the code to run by setting the variables to factors, so something like this:

mydata <- mydata %>% mutate(pat_gender = factor(pat_gender))
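
The error message lists three covariates, so the same conversion is presumably needed for `pat_race` and `pat_ethnicity` as well. Here is a minimal base-R sketch that converts every character column at once. The small `mydata` below is a made-up stand-in for the real CSV, and its values are hypothetical.

```r
# Hypothetical stand-in for the data read from the CSV
mydata <- data.frame(
  cohort_flag   = c(1, 0, 1, 0),
  pat_gender    = c("Male", "Female", "Female", "Male"),
  pat_race      = c("A", "B", "A", "C"),
  pat_ethnicity = c("X", "Y", "X", "X"),
  stringsAsFactors = FALSE
)

# Find all character columns and convert each one to a factor
chr_cols <- names(mydata)[sapply(mydata, is.character)]
mydata[chr_cols] <- lapply(mydata[chr_cols], factor)

# Confirm the new column types
str(mydata)
```

Alternatively, `read.csv(..., stringsAsFactors = TRUE)` reads text columns as factors from the start, which avoids the separate conversion step.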
cgvoller
Jacqueline