I have a large dataframe (86000 rows) consisting of several patients, each of which had several occasions of blood tests (just 3 tests: T1, T2 and T3) during their stay. Some of these patients were hospitalised for 3 days, some for 168 days.
This is just a fraction of the output from count function, which shows the big variation of the days spent in the hospital:
No Id Days
148 29757 111
149 30368 36
150 31062 29
151 31993 24
152 32198 51
153 32438 6
154 32836 74
155 32944 24
156 33467 39
157 36108 90
158 36849 6
159 37136 3
I used aggregate to calculate the means etc, but I would like to see a summary of who did improve or deteriorate during their stay.
I think that this would involve to extract at least the first and last tests, and take the difference (the lower, the better). But I couldn't find a way to do that.
I thought that an easier solution would be to transform the whole results to ordered data (according to the normal ranges of the tests) and see how many of them had abnormally low or high values. Unfortunately almost every patient has lows and highs.
Ideally, I would like to see the progress of several patients (or groups of paients) over time. But since they were hospitalised in different time-frames, the (over-simplified) result is like that:
As you can see, the first patient (red dots) started with mediocre values, quickly worsened (high values), and then improved (lower values). The progress of the second patient is not clear, since his/her stay was probably short.
Could someone suggest a starter (code or idea)? I checked some questions about multiple time-series plots with unequal observations, but they are not helpful in my case. An example (anonymised) dataset is here:
structure(list(Id = c("10200", "10200", "10200", "10200", "10200",
"10200", "10700", "10700", "10700", "10700", "10700", "10700",
"10700", "10700", "10700", "10700", "10700", "10700", "10700",
"10700", "10700", "10766", "10766", "10766", "10766", "10766",
"10766", "10766", "10766", "10766", "10766", "10766", "10766",
"10766", "10766", "10766", "10766", "10766", "10766", "10766"
), Date = structure(c(15068, 15068, 15068, 15069, 15069, 15069,
15072, 15072, 15072, 15072, 15072, 15072, 15073, 15073, 15073,
15075, 15075, 15075, 15078, 15078, 15078, 15073, 15074, 15074,
15075, 15075, 15075, 15075, 15076, 15076, 15076, 15078, 15078,
15078, 15081, 15082, 15083, 15084, 15085, 15085), class = "Date"),
Test = c("T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1",
"T2", "T2", "T3", "T3", "T1", "T2", "T3", "T1", "T2", "T3",
"T1", "T2", "T3", "T1", "T1", "T2", "T1", "T1", "T2", "T2",
"T1", "T2", "T3", "T1", "T2", "T3", "T1", "T1", "T2", "T1",
"T1", "T2"), Result = c(131, 4.53, 5.4, 108, 3.19, 3.7, 125,
NA, 1.26, NA, NA, 3.8, 125, 0.97, 4.2, 73, 0.84, 6.6, 48,
0.52, 4.8, 60, 75, 0.83, 52, 51, 0.62, 0.65, 40, 0.57, 4.1,
45, 0.54, 3.7, 96, 77, 1.04, 134, 144, 0.95)), .Names = c("Id",
"Date", "Test", "Result"), row.names = c(3L, 6L, 4L, 2L, 1L,
5L, 10L, 14L, 9L, 19L, 8L, 11L, 20L, 18L, 7L, 17L, 13L, 21L,
12L, 15L, 16L, 22L, 28L, 29L, 24L, 31L, 26L, 33L, 34L, 32L, 37L,
23L, 35L, 25L, 38L, 36L, 30L, 27L, 39L, 40L), class = "data.frame")