I have a dataset that looks like this (mock example):
SW_I fault X locomotive A 10 faults 100 days
SW_I fault X locomotive B 20 faults 200 days
SW_I fault X locomotive C 30 faults 300 days
SW_I fault Y locomotive D 90 faults 100 days
SW_I fault Y locomotive E 10 faults 100 days
I need the “censored” rows to be filled in so that the data look like this:
SW_I fault X locomotive A 10 faults 100 days
SW_I fault X locomotive B 20 faults 200 days
SW_I fault X locomotive C 30 faults 300 days
SW_I fault X locomotive D 0 faults 100 days
SW_I fault X locomotive E 0 faults 100 days
SW_I fault Y locomotive A 0 faults 100 days
SW_I fault Y locomotive B 0 faults 200 days
SW_I fault Y locomotive C 0 faults 300 days
SW_I fault Y locomotive D 90 faults 100 days
SW_I fault Y locomotive E 10 faults 100 days
What is the best way to do this with data.table (the dataset I have is large)? I can build a list of unique locomotives for each SW_n, then subset by fault, and append new rows for all the locomotives missing from the resulting subset, with number of faults = 0 and the number of days unchanged.
I wonder, however, if there is a cleverer way to do this, with some sort of a merge of two copies of the same table, one with the actual number of faults, and another with zeroes.
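To make the merge idea concrete, here is a minimal sketch on the mock data. The column names (sw, fault, loco, n_faults, days) are placeholders I made up for this example, and it assumes that the number of days depends only on the SW version and the locomotive, not on the fault. I have not checked how it scales on the real dataset.

```r
library(data.table)

# Mock data; column names are hypothetical placeholders.
dt <- data.table(
  sw       = "SW_I",
  fault    = c("X", "X", "X", "Y", "Y"),
  loco     = c("A", "B", "C", "D", "E"),
  n_faults = c(10L, 20L, 30L, 90L, 10L),
  days     = c(100L, 200L, 300L, 100L, 100L)
)

# Days observed per locomotive within each SW version
# (assumes days is the same for every fault of a given locomotive).
exposure <- unique(dt[, .(sw, loco, days)])

# Every fault seen under each SW version.
faults <- unique(dt[, .(sw, fault)])

# All fault x locomotive combinations within each SW version.
grid <- merge(faults, exposure, by = "sw", allow.cartesian = TRUE)

# Left-join the observed counts onto the grid; unmatched rows get NA.
full <- merge(grid, dt[, .(sw, fault, loco, n_faults)],
              by = c("sw", "fault", "loco"), all.x = TRUE)

# Combinations that never appeared in the data are recorded as zero faults.
full[is.na(n_faults), n_faults := 0L]
setcolorder(full, c("sw", "fault", "loco", "n_faults", "days"))
print(full)
```

The grid is built per SW version so that a locomotive only gets zero rows for faults that exist under the SW it actually ran.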
P.S. I am not trying to impute missing data. I am trying to definitively show that the censored data are zeroes.