
I have multiple time series datasets to analyze, and I have to preprocess them first. One troublesome task is identifying irregularly missing rows and inserting rows of NAs in their place.

So far, I did this with:

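```r
library(DataCombine)  # provides InsertRow()

# Build a row of NAs (one per column of the data frame) and insert it as row 1
new <- rep(NA, length(dati))
dati <- InsertRow(dati, NewRow = new, RowNum = 1)
```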
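For reference, the same manual step can also be done in base R without the DataCombine package. This is a minimal sketch: `insert_na_row()` is a hypothetical helper name, and the toy `dati` below is made up, not taken from the real data.

```r
# Hypothetical helper: return `df` with a row of NAs inserted before row `pos`.
insert_na_row <- function(df, pos) {
  # Row indices with an NA index spliced in before position `pos`
  idx <- append(seq_len(nrow(df)), NA_integer_, after = pos - 1)
  out <- df[idx, , drop = FALSE]  # an NA row index yields a row of NAs
  rownames(out) <- NULL           # renumber rows 1..n
  out
}

# Made-up toy data, just to show the call
dati <- data.frame(time = c("11:30", "14:00", "16:30"), value = c(5, 3, 7))
dati <- insert_na_row(dati, 1)
```

Indexing a data frame with an `NA` row index returns a row of `NA`s with the same column types, so no `NewRow` vector has to be built by hand. It still requires knowing the position, which is the part that needs automating.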

Does anyone have an idea how I could neatly insert those missing values without identifying the rows visually? I did some research, but only found solutions for irregularly missing rows in continuous, evenly spaced time series (e.g. 00:01, 00:02, 00:03, ...). The problem is that my timestamps (6 per day) are rather irregular in hour and minute:

1. Signal: around 9 am
2. Signal: around 11:30 am
3. Signal: around 2 pm
4. Signal: around 4:30 pm
5. Signal: around 7 pm
6. Signal: around 9:30 pm

The only fixed rule is that each day must have 6 signals/rows in the order described above, but some of them are missing. I added two pictures showing the data I have and the result I need.

Here is the data (output of `dput()`).

Original data:

```r
structure(c(1559399495, 1559410907, 1559417625, 1559459908, 1559469538,
  1559478830, 1559486650, 1559495990, 1559504661, 1559554718, 1559563310,
  1559572971, 1559583383, 1559590366, 1559632394, 1559640716, 1559649675,
  1559658794, 1559668671, 1559676814, 1559720350, 1559736095, 1559745212,
  1559756054, 1559763779, 1559804530, 1559813489, 1559823971, 1559832774,
  1559840465), class = c("POSIXct", "POSIXt"), tzone = "")
```

Desired data:

```r
structure(c(NA, NA, NA, 1559399495, 1559410907, 1559417625, 1559459908,
  1559469538, 1559478830, 1559486650, 1559495990, 1559504661, NA,
  1559554718, 1559563310, 1559572971, 1559583383, 1559590366, 1559632394,
  1559640716, 1559649675, 1559658794, 1559668671, 1559676814, 1559720350,
  NA, 1559736095, 1559745212, 1559756054, 1559763779),
  class = c("POSIXct", "POSIXt"), tzone = "")
```
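One way to automate this, offered as a sketch under stated assumptions rather than a definitive solution: assign each stamp to the nearest of the six nominal signal times, build the complete grid of days × six slots, and look up each grid cell with `match()`. Assumptions: the data carry `tzone = ""` (local time), and the stamps only line up with the listed signal times when read at UTC+2, so `Europe/Berlin` is assumed here; nearest-time assignment also assumes every stamp is closer to its own signal than to a neighbouring one.

```r
# Original timestamps, copied from the dput() output in the question
original <- structure(c(1559399495, 1559410907, 1559417625, 1559459908,
  1559469538, 1559478830, 1559486650, 1559495990, 1559504661, 1559554718,
  1559563310, 1559572971, 1559583383, 1559590366, 1559632394, 1559640716,
  1559649675, 1559658794, 1559668671, 1559676814, 1559720350, 1559736095,
  1559745212, 1559756054, 1559763779, 1559804530, 1559813489, 1559823971,
  1559832774, 1559840465), class = c("POSIXct", "POSIXt"), tzone = "")

tz <- "Europe/Berlin"                      # assumption: recording time zone
targets <- c(9, 11.5, 14, 16.5, 19, 21.5)  # nominal signal times, hours after midnight

# Local calendar day and fractional hour of each stamp
lt <- as.POSIXlt(original, tz = tz)
hours <- lt$hour + lt$min / 60 + lt$sec / 3600
day <- as.Date(original, tz = tz)

# Assign each stamp to the nearest nominal signal time (slot 1-6)
slot <- vapply(hours, function(h) which.min(abs(h - targets)), integer(1))
key <- paste(format(day), slot)
if (anyDuplicated(key)) stop("two stamps fell into the same slot on one day")

# Complete grid: every day in the range, six slots per day, in order
days <- seq(min(day), max(day), by = "day")
grid_key <- paste(format(rep(days, each = 6)), rep(1:6, times = length(days)))

# Look up each grid cell; cells without a stamp become NA timestamps
filled <- original[match(grid_key, key)]
```

With the question's data, the first 30 values of `filled` should match the desired output; the six extra values belong to the last day (June 6, local time), whose 9:30 pm signal is missing and therefore comes out as `NA`. Note that nearest-time assignment never rejects a stamp, so an unusually late or early signal would be filed silently.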

Original data: [image: original data]

Desired data: [image: desired data]

  • Please provide some sample data using `dput()` and your expected output. – tmfmnk Dec 07 '19 at 15:07
  • I am really new here. How does the `dput()` function work? – Björn Butter Dec 07 '19 at 15:56
  • Please look at https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – tmfmnk Dec 07 '19 at 16:22
  • It may be easier to just write a small sample data frame by hand that is similar to yours. For example, here is a data frame with 3 columns: day, time, value. Pretend that each day needs to have 3 rows; day 2 has only 2. `data.frame(day = c(1,1,1,2,2,3,3,3), time = c('10:30', '13:00', '5:15', '9:00', '11:30', '2:00', '17:00', '23:15'), value = c(5,3,7,5,1,4,3,9))` – sam Dec 07 '19 at 17:42
  • Run `dput(head(dati, 20))` in R and paste its output into your question. Also, please read the instructions at the top of the `r` tag page. – G. Grothendieck Dec 07 '19 at 18:00
  • Would a reasonable algorithm be: 1) if a time stamp is within h hours of 9 am, adopt it as the 9 am value; otherwise set the 9 am value to missing. 2) If a time stamp is within h hours of 11:30 am, adopt it as the 11:30 am value; otherwise set the 11:30 am value to missing, and so on? You could set h to 1 hour initially and see if that works. You also need to check that all time stamps on a day were assigned; if not, check and revise the algorithm. – Tony Ladson Dec 08 '19 at 05:40
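The tolerance-window algorithm from the last comment can be sketched as below. This is only an illustration: `assign_slots()` is a hypothetical helper, the time zone `Europe/Berlin` is an assumption (the data's `tzone` is empty, i.e. local time), `h = 1` is the comment's suggested starting value, and the four timestamps are made up to exercise both checks.

```r
# Hypothetical helper: adopt a stamp for a slot only if it lies within h hours
# of that slot's nominal time; otherwise leave it unassigned (NA).
assign_slots <- function(stamps, tz = "Europe/Berlin", h = 1,
                         targets = c(9, 11.5, 14, 16.5, 19, 21.5)) {
  lt <- as.POSIXlt(stamps, tz = tz)
  hours <- lt$hour + lt$min / 60 + lt$sec / 3600
  vapply(hours, function(x) {
    d <- abs(x - targets)
    if (min(d) <= h) which.min(d) else NA_integer_
  }, integer(1))
}

# Made-up stamps: 12:45 is more than 1 h from every slot, and 14:01/14:33 collide
stamps <- as.POSIXct(c("2019-06-02 09:18", "2019-06-02 12:45",
                       "2019-06-02 14:01", "2019-06-02 14:33"),
                     format = "%Y-%m-%d %H:%M", tz = "Europe/Berlin")
slot <- assign_slots(stamps, tz = "Europe/Berlin", h = 1)

# The checks the comment asks for: stamps that were not assigned at all,
# and stamps that landed in a slot already taken on the same day
unassigned <- which(is.na(slot))
key <- paste(format(as.Date(stamps, tz = "Europe/Berlin")), slot)
collisions <- which(!is.na(slot) & duplicated(key))
```

Any index in `unassigned` or `collisions` is a case where, as the comment says, the rule (for example the value of `h`) needs revisiting before the NA rows are inserted.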
