
I am trying to use the depmixS4 package to fit an HMM, but I get this error message: `Error in lm.wfit(x = as.matrix(object@x[!nas, ]), y = as.matrix(object@y[!nas, : missing or negative weights not allowed`

    ```
    mod2 <- depmix(response = norm_attributed_orders ~ 1, data = dfout2, nstates = 2,            
    family = gaussian())
    mod2 <- fit(mod2)
    summary(mod2)
    ```

I checked that there are no NA values in the dataset `dfout2`.

$ norm_attributed_orders: num [1:26307776] 0 0 0 0 0 0 0 0 0 0 ... (because I normalized the data into range [0, 1])

The "norm_attributed_orders" data like: 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.01923077 0.00000000 0.11538462 0.11538462 0.07692308 0.21153846 0.11538462 0.23076923 0.23076923 0.03846154 0.15384615 0.11538462 0.28846154 0.17307692

Han Rinne
  • I can't see that there's no NAs from the code above, can you do `colSums(is.na(dfout2))` . it should be all zeros if there's no NAs – StupidWolf Jul 12 '21 at 07:24
  • I tried any(is.na(dfout2)). Return FALSE. There are not all zeros, but majority of some series are zeros. – Han Rinne Jul 12 '21 at 17:49
  • I am pretty sure it's not the zeros.. I ran a simulated dataset with 99% zeros on the dependent and it still works. can you paste the output of `dput(head(norm_attributed_orders,20))` – StupidWolf Jul 12 '21 at 18:06
  • see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example , help us help yourself.. otherwise there's no way to find out whats wrong – StupidWolf Jul 12 '21 at 18:06
  • added sample data in the question. – Han Rinne Jul 12 '21 at 18:30
  • Actually, the input data is a panel data with 411059 response time series data (each length = 53). there are 143058 (35%) has more than 90% zeros. – Han Rinne Jul 12 '21 at 18:42
  • is it very hard to do a `dput()` ? the problem lies in the structure of your data. from the limited example above, i only see numbers and if you try your code on any dataset, it works – StupidWolf Jul 12 '21 at 19:04
  • what is `class(dfout2$norm_attributed_orders)` – StupidWolf Jul 12 '21 at 19:05
  • class(dfout2$norm_attributed_orders) is 'numeric'. here is all other columns type: $ week_index : num [1:26307776] 1 2 3 4 5 6 7 8 9 10 ... $ advertiser_id : num [1:26307776] 8845 8845 8845 8845 8845 $ attributed_orders : num [1:26307776] 0 0 0 0 0 0 0 0 0 0 ... $ norm_attributed_orders : num [1:26307776] 0 0 0 0 0 0 0 0 0 0 ... $ has_order : num [1:26307776] 0 0 0 0 0 0 0 0 0 0 ... – Han Rinne Jul 12 '21 at 19:21
  • do you have Inf values? do `any(is.infinite(x))` – StupidWolf Jul 12 '21 at 19:44
  • there is no Inf values, because i did normalization using max_min range. I modified the code to mod3 <- depmix(norm_attributed_orders ~ 1, data = dfout, nstates = 3, ntimes = ntimevector, family = gaussian()) specifying the length of each response series. But still have error: Error in glm.fit(x = object@x, y = object@y, weights = w, family = object@family, : NAs in V(mu) – Han Rinne Jul 12 '21 at 20:03

1 Answer


The problem is most likely that, because of the large number of zeroes, one of the estimated states will start fitting these perfectly (meaning its state-conditional distribution will become a Normal with a mean of 0 and a variance approaching 0). This will lead to estimation issues and the reported error. Effectively, the model is not well-specified for this data (such a large number of exact 0's should not be expected in a Normal distribution).
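To make the degeneracy concrete, here is a minimal base R sketch (not depmixS4 code, and the values of `y` and `sds` are made up for illustration). It evaluates the Gaussian log-likelihood of a run of exact zeros for a state with mean 0 and an ever smaller standard deviation:

```r
# 100 exact zeros, standing in for a series with no orders (illustrative data)
y <- rep(0, 100)

# Candidate standard deviations for a state whose mean is 0
sds <- c(1, 0.1, 0.01, 0.001, 1e-6)

# Gaussian log-likelihood of y under each candidate sd
loglik <- sapply(sds, function(s) sum(dnorm(y, mean = 0, sd = s, log = TRUE)))

# The log-likelihood keeps increasing as sd shrinks, so the optimiser
# is rewarded for pushing the variance of that state towards 0
print(data.frame(sd = sds, loglik = loglik))
```

There is no finite maximum here, which is why a state that captures only zeros tends to end up with a variance that is numerically 0.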

One approach would be to set a lower bound on the standard deviation of the normal (to e.g. .0001, or something like that). See ?fit for setting bounds on parameters of depmix models.

Another approach could be to model the zeroes with a separate process (as in, e.g., a zero-inflated Poisson model). You could define a binary indicator variable with value 1 when `norm_attributed_orders` is 0, i.e.

dfout2$zero <- as.numeric(dfout2$norm_attributed_orders == 0)

You could then replace the zeroes in norm_attributed_orders by NA, and try running a model such as

mod2 <- depmix(response = list(norm_attributed_orders ~ 1, zero ~ 1),
    data = dfout2, nstates = 2,
    family = list(gaussian(), binomial()))
mod2 <- fit(mod2)
summary(mod2)

This is a bit of a hack, and will change the likelihood targeted by the model. But it might give reasonable results. If there are also a large number of exact 1's in the data, you could define another indicator variable for those.
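The data preparation for this second approach could look roughly like the sketch below. It uses a small made-up data frame `toy` in place of `dfout2`, and it assumes that `advertiser_id` identifies each series and that the rows are already sorted by advertiser and week. The name `ntimes_vec` is only illustrative; it is the kind of vector you would pass to the `ntimes` argument of `depmix` for panel data.

```r
# Toy stand-in for dfout2: two advertisers with three weeks each (assumed layout)
toy <- data.frame(
  advertiser_id = c(1, 1, 1, 2, 2, 2),
  week_index = c(1, 2, 3, 1, 2, 3),
  norm_attributed_orders = c(0, 0.25, 1, 0, 0, 0.5)
)

# Indicator for exact zeros, computed before the zeros are overwritten
toy$zero <- as.numeric(toy$norm_attributed_orders == 0)

# Treat the zeros as missing in the Gaussian response
toy$norm_attributed_orders[toy$zero == 1] <- NA

# Length of each advertiser's series, in row order
ntimes_vec <- rle(toy$advertiser_id)$lengths

print(toy)
print(ntimes_vec)
```

The indicator has to be created before the zeros are replaced, otherwise the comparison with 0 would return NA for those rows.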

mspeek