
First of all, I apologize for any confusion caused by my English. I will explain my question as well as I can; if anything is unclear, please add a comment and I will provide more detail.

The data set used to draw the plot looks like this (the image shows only part of it).
The output of dput is included below.

Dataset(part)

This is movement data captured by a linear accelerator, with timestamps. I use ggplot2 to draw a line plot of it for my report. Here is my code:

......
#Convert timestamp format
time <- gsub(":", ".", x)
time <- strptime(time, format = "%H.%M.%OS")
time <- as.POSIXct(time)
df["time"] <- time

# Person B Plot
p <- ggplot(df, aes(x = time)) +
  scale_x_datetime(name = "Time", labels = date_format("%H:%M:%OS")) +
  ylab("PCA") +
  geom_hline(aes(yintercept = 0)) +
  scale_colour_manual("", values = c("PCA_A" = "hotpink3", "PCA_B" = "steelblue3", "Correlation" = "chocolate")) +
  geom_line(aes(y = PCA_b, group = 1, colour = "PCA_B"), size = 0) +
  # theme(text = element_text(size = 23), plot.title = element_text(hjust = 0.5)) +
  ggtitle("PCA_Two")

Because the timestamps are stored in the CSV file as strings, I have to convert them to POSIXct before I can use scale_x_datetime to show the time on the x axis. With that conversion in place, I get a strange plot.
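A minimal sketch of the string-to-POSIXct conversion described above. The sample strings, their "HH:MM:SS:mmm" layout, and the fixed date are assumptions made up for illustration; the real CSV values are not shown in the post. Fixing the date and time zone explicitly avoids depending on the current day, and `%OS` reads the fractional part of the seconds on input.

```r
# Hypothetical sample strings; the real CSV layout is an assumption
x <- c("21:12:20:556", "21:12:20:577", "21:12:20:615")

# Same idea as in the question: turn every ":" into "."
dotted <- gsub(":", ".", x, fixed = TRUE)

# Parse with an explicit (made-up) date and time zone;
# %OS keeps the fractional seconds when reading
time <- as.POSIXct(paste("2020-04-21", dotted),
                   format = "%Y-%m-%d %H.%M.%OS", tz = "UTC")

# Spacing between consecutive samples, in seconds
step <- diff(as.numeric(time))
print(round(step, 3))
```

Looking at `step` right after the conversion is a cheap way to see whether the samples are evenly spaced in time before any plotting is done.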

Strange Plot

There is a break between two of the points. If I remove the first five lines (the timestamp conversion) and the scale_x_datetime call from the code above, the plot looks fine and the curve is smooth, but the x axis no longer shows the time correctly.

Fine Plot but not perfect

Why does this happen, and how can I fix it?

---------- update 20/4/2020

I used dput(df[20:50,]) to output part of my data set; I hope it is helpful. Thanks to @chemdork123 for the help.

Here is a short description of the data structure. The data frame used for the plots has four columns: time, PCA_a, PCA_b, and cor. I draw three line plots, all with time (the timestamp) on the x axis. In this post I only show the "time - PCA_b" plot, but all three plots have the same issue: the break, at the same location. (The NA values in the "cor" column are not a bug; I put them there on purpose.)

structure(list(time = structure(c(1587503540.556, 1587503540.577, 
1587503540.615, 1587503540.637, 1587503540.675, 1587503540.696, 
1587503540.716, 1587503540.756, 1587503540.776, 1587503540.817, 
1587503540.837, 1587503540.876, 1587503540.893, 1587503540.915, 
1587503540.937, 1587503540.976, 1587503540.997, 1587503541.018, 
1587503541.059, 1587503541.078, 1587503541.117, 1587503541.138, 
1587503541.18, 1587503541.201, 1587503541.24, 1587503541.26, 
1587503541.3, 1587503541.339, 1587503541.358, 1587503541.4, 1587503541.423
), class = c("POSIXct", "POSIXt"), tzone = ""), PCA_a = c(1.56737319252217, 
2.04606254627585, 2.49366222484302, 2.88101522283612, 3.18379411504211, 
3.38503090762478, 3.47436865063648, 3.44747654856326, 3.30707775976109, 
3.06371801441373, 2.73437161733756, 2.33935677190782, 1.89968708587307, 
1.43586301558354, 0.967277030171067, 0.511214148600076, 0.0816220889456876, 
-0.311381715806983, -0.661355048674678, -0.965683235694069, -1.22624198074107, 
-1.44997061419577, -1.64740413737597, -1.82782646420492, -1.99421995781177, 
-2.14199256386341, -2.26073408401317, -2.33585157388011, -2.34937651266747, 
-2.28185734041769, -2.11603996134387), PCA_b = c(0.428589019048672, 
0.437715207869297, 0.44415836273225, 0.447676595545035, 0.448336071890988, 
0.446396459498192, 0.442205853553038, 0.43616876635858, 0.42877854629294, 
0.420603253124693, 0.412148862183822, 0.403676755189904, 0.395124979959946, 
0.386241966203463, 0.376849622459395, 0.367015680942488, 0.35712348581213, 
0.347977244142877, 0.340825041944267, 0.337103574812562, 0.338073413214583, 
0.344591707232845, 0.35695103029739, 0.374713701538921, 0.396660690638421, 
0.420888192551911, 0.445042523797771, 0.466693774961235, 0.483678597255532, 
0.494312865435414, 0.497599592736315), cor = c(0.787242026266416, 
NA, NA, NA, NA, NA, NA, NA, NA, 0.297936210131332, NA, NA, NA, 
NA, NA, NA, NA, NA, -0.074108818011257, NA, NA, NA, NA, NA, NA, 
NA, NA, -0.437523452157598, NA, NA, NA)), row.names = 20:50, class = "data.frame")

---------- update 21/4/2020

I found something very interesting. If the data set has fewer than 277 rows, the plot is perfect; otherwise, point No. 277 is shifted. I made a gist here with a 277-row dput. Can anyone test it? My plot looks like this: [image]

  • What does the raw data look like between the circles you've indicated? – teunbrand Apr 19 '20 at 22:49
  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Apr 19 '20 at 22:56
  • @teunbrand The raw data set is too big for me to locate the break point. It is movement data, so it should be smooth. – Samuel Three Apr 20 '20 at 00:26
  • Hi @Tung, thanks for your suggestions and the guides, but the raw data is a CSV file of 500+ lines, so I cannot share it in the post directly. – Samuel Three Apr 20 '20 at 00:35
  • Hi Samuel - you can post the portion of your dataset that is around the break in the data (since that is the particular interesting bit) by using `dput(df[X:Y, ])`, where "X" and "Y" represent the limits of the lines of the dataset you want to export. I'd think posting even 20 or 50 lines is possibly enough. If there are thousands in-between the important bits, you can use `dput(df[sample(X:Y, num), ])`, where "num" represents the number of lines you want to randomly sample between "X" and "Y". These are good ways to share portion of a large dataset so we can best help you. – chemdork123 Apr 20 '20 at 13:31
  • Hi @chemdork123 - Thanks a lot! I have updated the post; please tell me if anything I explained is still unclear. – Samuel Three Apr 21 '20 at 00:23
  • When I generate a simple plot from your data: `ggplot(df, aes(x=time, y=PCA_b)) + geom_line() + geom_point() + scale_x_datetime()`, everything is looking pretty much okay - no odd breaks. Try that plot (even without `scale_x_datetime()`) and see what you get. Perhaps some points are being mapped incorrectly to x or y due to the time conversion before you plotted, but the df posted seemed okay to me. – chemdork123 Apr 21 '20 at 13:54
  • Hi @chemdork123, I found something very interesting: the break point is at row No. 277. If the data set has fewer than 277 rows, everything is fine. If it has 277 rows or more, data point No. 277 is shifted on the plot. I have posted a 277-row dput in a GitHub gist and put the link in the post. Can you help test it? – Samuel Three Apr 21 '20 at 18:20
  • Well, it looks like from the data you posted, there is simply a gap between the time of 276 and 277. When you compare the two charts, the one with the gap has the time assigned and the one without the gap/break is where you *did not* apply the time coercion. If you don't feel like there was an actual gap (looks like the gap is pretty minor here from a time perspective), then you can reconfirm your timestamp conversion "worked". If you took the measurements on the instrument yourself, you can confirm the times and dates are accurate. – chemdork123 Apr 21 '20 at 18:41
  • The simple answer here may be that the instrument/computer had a "blip" in measuring. Depending on the instrument and software, sometimes writing data has to pause for a datapoint or two while the buffer is cleared and written... every program works a bit differently so it's hard to say. Confirm the times are correct, and then if they are, you just have to accept that there is a break. Finally: you can check to confirm the break by plotting `geom_point` instead of `geom_line` just to show where the data points are. You'll see the gap easily there. – chemdork123 Apr 21 '20 at 18:44
  • @chemdork123 That is awesome! You found the correct answer. I went back and checked the real timestamps in the raw CSV file. About one second of data is missing. It must be a failure of the instrument. Thanks a lot! – Samuel Three Apr 21 '20 at 22:29
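
The `dput(df[X:Y, ])` suggestion from the comments can be sketched as follows. The data frame here is a synthetic stand-in with made-up values, and `half_width` is a hypothetical window size; only the row number 277 comes from the discussion above.

```r
# Synthetic stand-in for the real data frame (values are made up)
set.seed(1)
df <- data.frame(
  time  = as.POSIXct("2020-04-21 21:12:20", tz = "UTC") +
          seq(0, by = 0.02, length.out = 500),
  PCA_b = rnorm(500)
)

break_row  <- 277  # row where the break was reported
half_width <- 5    # hypothetical number of rows to keep on each side

# Clamp the window to the valid row range before subsetting
rows <- seq(max(1, break_row - half_width),
            min(nrow(df), break_row + half_width))
window <- df[rows, ]

# dput() prints R code that recreates the subset exactly
dput(window)
```

Sharing a small window like this keeps the post short while still including the rows on both sides of the suspected break.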

1 Answer


Thanks to @chemdork123 for the help. I found that all the data for about one second around the break point is missing from the raw data set. It must be a failure of the research instrument.

The answer is so simple that it makes me look like a fool XD.
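
For anyone hitting the same symptom, here is a rough sketch of how such a gap could be found without scanning the CSV by eye. The timestamps are synthetic (one sample every 20 ms with one second dropped) and the threshold of five times the median step is an arbitrary assumption, not something taken from my real data.

```r
# Synthetic timestamps: one sample every 20 ms, with one second dropped
start <- as.POSIXct("2020-04-21 21:12:20", tz = "UTC")
ms <- seq(0, 5000, by = 20)
ms <- ms[ms < 2000 | ms >= 3000]   # remove the samples from 2.0 s up to 3.0 s
df <- data.frame(time = start + ms / 1000)

# Time between consecutive rows, in seconds
step <- diff(as.numeric(df$time))

# Flag steps much larger than the typical one (hypothetical cut-off)
threshold <- 5 * median(step)
gap_after <- which(step > threshold)

gaps <- data.frame(
  row_before  = gap_after,
  row_after   = gap_after + 1,
  gap_seconds = step[gap_after]
)
print(gaps)
```

When the x axis is the real time, a row pair flagged like this is drawn as one long straight segment; when the time conversion is skipped, the points are spaced evenly instead, which is why the curve looked smooth in that version of the plot.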

  • As long as you learn something, it was a worthwhile question to ask. From a fellow scientist here: the only bad question is one in which you failed to learn anything new. – chemdork123 Apr 22 '20 at 05:35