I am using the Synth package to demonstrate the divergence in development between Djibouti and a synthetic model of Djibouti if it didn't have international intervention.

Despite reading several similar questions and trying the answers offered there, I am still struggling with this error:

unit.variable not found as numeric variable in foo

I have tried several different dataprep() strategies and still cannot run the code.

ddSMI <- as.data.frame(ddSMI) %>%   
   mutate(LifeYrs = as.numeric(LifeYrs),
          PedYrs = as.numeric(PedYrs),
          Health.Index.Total = as.numeric(Health.Index.Total),
          Income.Index.Total = as.numeric(Income.Index.Total),
          SchoolMean = as.numeric(SchoolMean),
          Cno = as.numeric(Cno))

I am trying to produce a synthetic control model and have been using different iterations of this code. Although I have successfully changed the class to numeric, I still get the same error. Here is the head of my data as a reproducible example:

head(ddSMI)
# A tibble: 6 x 8
   Year   Cno Country PedYrs LifeYrs          
  <dbl> <dbl> <chr>   <chr>  <chr>            
1  2000     1 Algeria 6.31   69.5999999999999…
2  2001     1 Algeria 6.23   69.2             
3  2002     1 Algeria 6.28   69.5             
4  2003     1 Algeria 6.32   71.0999999999999…
5  2004     1 Algeria 6.36   71.4000000000000…
6  2005     1 Algeria 6.39   71.7             
# … with 3 more variables: SchoolMean <chr>,
#   Health Index Total <chr>,
#   Income Index Total <chr>

Please see the code below.

dataprep.out <- dataprep(foo = ddSMI,
                         predictors = c("LifeYrs", "PedYrs", "Health.Index.Total", "Income.Index.Total", "SchoolMean"),
                         predictors.op = "mean", # the operator
                         time.predictors.prior = 2007:2008, # the entire time frame from the beginning to the end
                         special.predictors = list(
                           list("HDI Rank", 2000:2020, "mean"),
                           list("LifeYrs", seq(2007,2008,2), "mean"),
                           list("PedYrs", seq(2007,2008,2), "mean"),
                           list("Health Index Total", seq(2007, 2008, 2), "mean"),
                           list("Income Index Total", seq(2007,2008, 2), "mean"),
                           list("School Mean", seq(2007, 2008, 2), "mean")),
                         dependent = "HDI Rank", #dv
                         unit.variable = "Cno", #identifying unit numbers
                         unit.names.variable = "Country", #identifying unit names
                         time.variable = "Year", #time period
                         treatment.identifier = 5,#the treated case
                         controls.identifier = c(2:4, 6:15), # the control cases; all others except number 5
                         time.optimize.ssr = 2007:2008,#the time-period over which to optimize
                         time.plot = 2000:2020)#the entire time period before/after the treatment

Here is a helpful resource on the Synth package that I used to guide my troubleshooting: "Synth: An R Package for Synthetic Control Methods in Comparative Case Studies"

My data is in the same format and yet...can't get it to run! It would be immensely appreciated if anyone can crack this!

Alix Ziff
  • I don't know the `synth` package but your error message says something about `unit.variable not found as numeric...` and the column `unit.variable` defined in your data.frame is a character var. Perhaps those things are linked? – Martin Gal May 25 '21 at 23:20
  • Thank you @MartinGal! Prior to the posted code, I cleaned the data to ensure that unit.variable is numeric: `ddSMI <- as.data.frame(ddSMI) %>% mutate(LifeYrs = as.numeric(LifeYrs), PedYrs = as.numeric(PedYrs), Health.Index.Total = as.numeric(Health.Index.Total), Income.Index.Total = as.numeric(Income.Index.Total), SchoolMean = as.numeric(SchoolMean), Cno = as.numeric(Cno))` When I check, it says `> class(ddSMI$Cno) [1] "numeric"`, so this is where I am stuck. – Alix Ziff May 26 '21 at 13:09
  • Does `Cno` contain any `NA`'s? – Martin Gal May 26 '21 at 13:19
  • thank you @MartinGal for continuing to help me with this! No, the `Cno` variable is just an identifier for each country so each observation for Ethiopia has a 1 in the `Cno` column, each observation of Zambia has a 12 in the `Cno` column, etc. – Alix Ziff May 26 '21 at 13:22
  • Since I'm not able to help you further, perhaps you can add some sample data to create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Martin Gal May 26 '21 at 13:35
  • thanks again @MartinGal for trying--I included some reprex, so hopefully that will help! – Alix Ziff May 28 '21 at 13:31

1 Answer

I had a similar error, although it had nothing to do with the unit variable being numeric (it is basically the first error message in the code: see here).

Make sure your object is a data frame, and only a data frame. I would recommend comparing your data structure against the "synth.data" example that is provided with the package. Given that your output shows your object is also a tibble (tbl_df), this might be the reason for the error.

is.data.frame(synth.data)
[1] TRUE
class(synth.data)
[1] "data.frame"
is.data.frame(DATA)
[1] TRUE
class(DATA)
[1] "tbl_df"     "tbl"        "data.frame"
DATA <- as.data.frame(DATA)
class(DATA)
[1] "data.frame"
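
A minimal sketch of why the extra tibble classes can matter. It assumes the tibble package is installed, uses toy data rather than the asker's, and assumes (I have not quoted the source here) that the check inside dataprep() inspects a single column pulled out with `[`:

```r
# Toy data, hypothetical; only the column names mirror the question.
library(tibble)  # assumption: the tibble package is available

tbl <- tibble(Cno = c(1, 1, 2), Year = c(2000, 2001, 2000))
df  <- as.data.frame(tbl)

# Single-column `[` indexing behaves differently for the two classes:
# a tibble stays a one-column tibble, a plain data.frame drops to a vector.
class(tbl[, "Cno"])  # "tbl_df" "tbl" "data.frame"
mode(tbl[, "Cno"])   # "list"
mode(df[, "Cno"])    # "numeric"
```

So a column can report "numeric" from class(DATA$Cno) and still fail a check of this kind while the object is a tibble, which would be consistent with the error going away after as.data.frame().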
Nick Bombaij