I'm working with a time series data set that has no explicit NAs. By that I mean that where a row should hold missing data, the time column simply jumps to the next value and the time stamp of the missing observation is never recorded. For example:

time - value

1 - 50kg

2 - 60kg

4- 45kg

The NA is never recorded, but the gap in the time column implies that data is missing. Is there a good package to handle this kind of missing data? I've tried 'naniar', but it doesn't work when there are no NAs.

I'm looking for a package that identifies these gaps and imputes the missing data.

Thank you!

r2evans
  • Welcome to SO, Luis Barbosa! A good first question. A minor request: the Stack tag-recommendation system is imperfect, please verify the tags it recommends as relevant. In this case, the [tag:rstudio] tag says *"DO NOT use this tag for general R programming problems"*, not to be used unless the question is explicitly about the IDE; and the [tag:cran] tag is meant for questions about the *"central repository for R distributions"*, not about general R questions. I've removed both. Thanks! – r2evans Feb 01 '22 at 18:23
  • Please read the info at the top of the [tag:r] tag page and note that it asks to provide reproducible input using `dput` so that anyone else can copy and paste it from the question to their session. – G. Grothendieck Feb 01 '22 at 19:14

2 Answers

1

One way to deal with this is to create a frame of "all times" (even the missing ones) and then merge it back in.

dat <- data.frame(time = c(1L, 2L, 4L), value = c(50, 60, 45))
dat
#   time value
# 1    1    50
# 2    2    60
# 3    4    45
times <- data.frame(time = seq(min(dat$time), max(dat$time), by = 1))
times
#   time
# 1    1
# 2    2
# 3    3
# 4    4
merge(dat, times, by = "time", all = TRUE)
#   time value
# 1    1    50
# 2    2    60
# 3    3    NA
# 4    4    45

I included a more verbose call to seq with by= in case your real data has a slightly different structure. For instance, if the times are POSIXt, you may want to change that to by="1 day" or by="1 hour". Either way, that argument is where you control the expected spacing between time stamps.
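As a rough sketch of the POSIXt case, the same approach might look like the following. The hourly time stamps, the UTC time zone, and the name `full` are made up for illustration; adjust `by=` to whatever spacing your real data is supposed to have.

```r
# Hypothetical hourly readings; the 02:00 observation was never recorded
dat <- data.frame(
  time = as.POSIXct(c("2022-02-01 00:00:00",
                      "2022-02-01 01:00:00",
                      "2022-02-01 03:00:00"), tz = "UTC"),
  value = c(50, 60, 45)
)

# Every time stamp that should exist between the first and last reading
times <- data.frame(time = seq(min(dat$time), max(dat$time), by = "1 hour"))

# Full outer join: time stamps with no reading get value = NA
full <- merge(dat, times, by = "time", all = TRUE)
full
```

The row for 02:00 should now be present with an NA in value, which packages that expect explicit NAs can then work with.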

For more information about merges, see How to join (merge) data frames (inner, outer, left, right) and What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?.

(This gets a little more complicated if the interval between rows is inconsistent, or if the real time variables are not aligned perfectly on an integer-like component.)
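Once the merge has made the gaps explicit, imputation can be done in many ways. As one minimal sketch (not the only option, and assuming linear interpolation is acceptable for your data), base R's approx can fill the NA rows. The names `full` and `value_filled` are just illustrative.

```r
dat <- data.frame(time = c(1L, 2L, 4L), value = c(50, 60, 45))
times <- data.frame(time = seq(min(dat$time), max(dat$time), by = 1))
full <- merge(dat, times, by = "time", all = TRUE)

# approx() drops the NA pairs and interpolates linearly
# between the neighbouring observed points
full$value_filled <- approx(full$time, full$value, xout = full$time)$y
full
```

Here the missing value at time 3 is filled with the midpoint of the readings at times 2 and 4.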

r2evans
0

It isn't clear what you have: a text file containing literally the text shown, a data frame, or something else. We will therefore assume the input is Lines, shown in the Note at the end, which is copied verbatim from the question except that we added one more row.

Read it into a zoo series using read.zoo, since zoo objects can represent irregularly spaced series. (read.zoo can also read files and data frames.) Next, convert that to a ts series. A ts series can only represent regularly spaced data, so the conversion fills the empty spots with NAs. Then use na.locf (last observation carried forward), na.approx (linear interpolation) or na.spline (spline interpolation) to fill in the NAs. Leave the result as a ts series, or convert it back to a zoo series using as.zoo or to a data frame using fortify.zoo.

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, sep = "-", strip.white = TRUE,
  comment.char = "k")

tt <- as.ts(z)
na.approx(tt)
## Time Series:
## Start = 1 
## End = 5 
## Frequency = 1 
## [1] 50.0 60.0 52.5 45.0 46.0
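To compare two of the fill methods and get a data frame back, something along these lines should work. This is only a sketch: it builds the series directly with zoo() rather than read.zoo, converts back to zoo before filling, and the column names time and value are chosen here for illustration.

```r
library(zoo)

# Same data as Lines, built directly: times 1, 2, 4, 5 (time 3 is missing)
z <- zoo(c(50, 60, 45, 46), order.by = c(1, 2, 4, 5))

# Going through ts makes the gap at time 3 an explicit NA
zr <- as.zoo(as.ts(z))

locf <- na.locf(zr)     # carry the last observation forward
lin  <- na.approx(zr)   # linear interpolation

# Back to a plain data frame with illustrative column names
out <- setNames(fortify.zoo(lin), c("time", "value"))
out
```

With na.locf the gap takes the previous reading (60), whereas na.approx uses the midpoint of its neighbours (52.5), so which one is appropriate depends on how the measured quantity behaves between observations.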

Note

Lines <- "time - value
1 - 50kg
2 - 60kg
4- 45kg
5 - 46kg"
G. Grothendieck