0

I have windspeed readings in 3 directions (u, v and w) at two locations (reference mast6 and sonic_f) for specific wind angles (e.g. 0º, 5º etc) I've imported and rbinded all these into a single data.frame.

I want to plot the data in a pairwise form but the reference mast wasn't recording for the first X seconds. I need to trim the sonic files to the first timestamp that the equivalent reference mast data shows for each Angle and each u v or w direction.

As an example, the start time for each location is as follows:

> aggregate(data=df,StartTime~MeasurementLocation+Angle,min)
         Location Angle StartTime
1     mast6_u      00    17602
2     mast6_v      00    17602
3     mast6_w      00    18602
4   sonic_f_u      00        2
5   sonic_f_v      00        2
6   sonic_f_w      00        2
7     mast6_u      05    13001
8     mast6_v      05    13001
9     mast6_w      05    13002
10  sonic_f_u      05        2
11  sonic_f_v      05        2
12  sonic_f_w      05        2

So for example, the start time for all data belonging to angle 00 must be at 18602. EDIT

The desired output would be individual files for each angle, with length of sonic_f_u v or w for Angle==00 to be the same length as the shortest data set from mast_u v or w for the same angle. E.g. for Angle 00, mast6_u v and w must all start at 18602 and so should sonic_f_u v and w.

HCAI
  • 2,213
  • 8
  • 33
  • 65
  • The desired output would be that the length of mast6_u for Angle==00 would be the same length as sonic_f_u for Angle==00. So to do this i need to cut away the first 17601 values in sonic_f_u – HCAI Sep 21 '18 at 13:09
  • 2
    Please share your data using `dput()` so others can help. See more here [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Tung Sep 22 '18 at 20:59
  • @Tong I can't figure out how to make the data, conceptually it's weird but is a nicely rbinded data.frame of lots of two colums of data. Imagine it's like two colums of excel with different lengths for each angle stacked ontop of each other. I ended up doing the splitting by hand. – HCAI Sep 24 '18 at 10:35
  • 2
    Not clear about your expected output, how 18602 is the output for first 6 entries. What would be the output for next 6 entries? Do you want `max` value for each `Angle`? – Ronak Shah Sep 28 '18 at 02:49

3 Answers3

4

Hard to beat data.table on this...

library(data.table)
setDT(df)[ , start_time_max := max( StartTime ), by = Angle][StartTime >= start_time_max, ][, start_time_max := NULL][]

#    Num Location Angle StartTime
# 1:   3  mast6_w     0     18602
# 2:   9  mast6_w     5     13002

benchmarks

microbenchmark::microbenchmark(
  data.table = setDT(df)[ , start_time_max := max( StartTime ), by = Angle][StartTime >= start_time_max, ][, start_time_max := NULL][],
  tidyverse = { left_join(df,
                          df %>%
                            group_by(Angle) %>%
                            summarise(max(StartTime)),
                          by = "Angle"
  ) %>%
      filter(StartTime == `max(StartTime)`) %>%
      select(-`max(StartTime)`) %>%
      arrange(Angle, Location) %>%
      unique()

    }, times = 100
)

# Unit: microseconds
#       expr      min       lq     mean   median       uq       max neval
# data.table  937.233 1050.057 1195.317 1196.169 1289.260  1808.488   100
# tidyverse  4694.900 4991.645 5409.146 5172.855 5341.563 24359.309   100
Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Thank you @Wimpel. I need to first calculate the minimum start time then do max(min(StartTime) to make sure all files start at the same time. The other thing is that the Times in the data.frame are not in order. e.g. I might have 1,2,3,4,4,5,5,6,100,7,101,8,102 etc. Can that be taken care of using data.tables? – HCAI Oct 03 '18 at 11:08
  • @HCAI data.table can take care of a lot. But I do not fully understand the question in your comment. My answer works on your sample data, and desired output. If you want to check if my answer works on your procuction-data: only one way to find out! – Wimpel Oct 03 '18 at 11:12
  • The problem is that I occasionally have repeated measurements for each location. So just because the values now start at the same time (because you've cut the rest away), I still have files with different lengths: aggregate(data=ds,Time~MeasurementLocation+Angle,NROW) MeasurementLocation Angle Time 1 mast6_u 0 15061 2 mast6_v 0 27061 3 mast6_w 0 15061 – HCAI Oct 03 '18 at 15:00
  • I think this will sort it out: arrange(Angle, Location) %>%distinct(Time,.keep_all = TRUE) – HCAI Oct 03 '18 at 15:02
2

I don't think I fully understand your use case, but this is how I've interpreted it: for each angle you want to find the maximum start time and filter the data frame for all wind directions at mast and sonic to have the same start time.

In tidyverse, you can do this by finding the maximum start time for each angle and joining this into the original data frame. Then use filter to remove non-matching start times:

left_join(df,
          df %>%
            group_by(Angle) %>%
            summarise(max(StartTime)),
          by = "Angle"
) %>%
  filter(StartTime == `max(StartTime)`) %>%
  select(-`max(StartTime)`) %>%
  arrange(Angle, Location) %>%
  unique()

# # A tibble: 2 x 4
#   Location Angle StartTime
#   <chr>    <int>     <int>
# 1 mast6_w      0     18602
# 2 mast6_w      5     13002

Use arrange() with the columns to sort on to remove duplicated rows and unique() to get rid of duplicates (you might need something more robust if the row columns aren't identical.

Paul
  • 2,877
  • 1
  • 12
  • 28
  • Hi @Paul thank you that looks fantastic. I've noticed that not all the readings are in order and some are duplicates due to measurement error. Can that be included in that great dplyr function? – HCAI Sep 28 '18 at 14:37
  • Yes, see with `arrange()` and `unique()`, see updated answer. – Paul Oct 01 '18 at 01:14
1

Suppose I have:

> dat
     Num  Location Angle StartTime
1    1   mast6_u     0     17602
2    2   mast6_v     0     17602
3    3   mast6_w     0     18602
4    4 sonic_f_u     0         2
5    5 sonic_f_v     0         2
6    6 sonic_f_w     0         2
7    7   mast6_u     5     13001
8    8   mast6_v     5     13001
9    9   mast6_w     5     13002
10  10 sonic_f_u     5         2
11  11 sonic_f_v     5         2
12  12 sonic_f_w     5         2

To convert all the 0 angles to the value at 0 for mast6_w you can do:

for (angle in c(0,5))
{
   dat[which(dat$Angle==0),4] <- dat[which(dat$Location=="mast6_w" & dat$Angle==0),4]
}

This sets the 4th column value for all 0 angles to the value at 0 and mast6_w. Then you end up with:

> dat
   Num  Location Angle StartTime
1    1   mast6_u     0     18602
2    2   mast6_v     0     18602
3    3   mast6_w     0     18602
4    4 sonic_f_u     0     18602
5    5 sonic_f_v     0     18602
6    6 sonic_f_w     0     18602
7    7   mast6_u     5     13001
8    8   mast6_v     5     13001
9    9   mast6_w     5     13002
10  10 sonic_f_u     5         2
11  11 sonic_f_v     5         2
12  12 sonic_f_w     5         2

For others, the structure here is:

structure(list(Num = 1:12, Location = c("mast6_u", "mast6_v", 
"mast6_w", "sonic_f_u", "sonic_f_v", "sonic_f_w", "mast6_u", 
"mast6_v", "mast6_w", "sonic_f_u", "sonic_f_v", "sonic_f_w"), 
    Angle = c(0L, 0L, 0L, 0L, 0L, 0L, 5L, 5L, 5L, 5L, 5L, 5L), 
    StartTime = c(17602L, 17602L, 18602L, 2L, 2L, 2L, 13001L, 
    13001L, 13002L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-12L))
mikorym
  • 33
  • 5
  • Thank you for your answer. I'm not finding that the for loop does anything to my data. Also mast6_w might not be the shortest dataset for each angle (sometimes there were glitches with mast6_u or mast6_v) – HCAI Sep 27 '18 at 09:59
  • Note that in my example I was hasty and the angles are not text: 0 vs. "00". The for loop in this example is unnecessary, the reason why I left this in is because for each angle you might want to do something different, since you said: "For example, the start time for all data belonging to angle 00 must be at 18602." – mikorym Sep 28 '18 at 07:10
  • The answer by @Paul seems to have guessed further at what you are looking for. – mikorym Sep 28 '18 at 07:13