I have a time-series dataframe with columns TimeStamp, Type and Value. Type indicates whether a row is a peak or a valley. I want to:

  • Group all data by consecutive types
  • For groups of "peak" type, select the highest value
  • For groups of "valley" type, select the lowest value
  • Filter the dataframe by these highest/lowest values

Expectation: I would have a dataframe in which each row alternates between the highest peak and the lowest valley.

The only way I know how to do this is by using a for loop and then adding consecutive values into a vector and then getting the max, then shoving this in a new dataframe and so on.

For those who know Python, this is what I did there (I need to port my code to R though):

segmentation['min_v'] = segmentation.groupby( segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum() ).price.transform(min)
segmentation['max_p'] = segmentation.groupby( segmentation.pv_type.ne(segmentation.pv_type.shift()).cumsum() ).price.transform(max)
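
The grouping key in the two lines above is a run id that increases by one each time pv_type changes from the previous row. The per-run min or max is then written back onto every row of that run. A minimal plain-Python sketch of the same idea, without pandas, is shown here. The helper names `run_ids` and `transform_by_run` are made up for illustration, and the values are the ones from the sample data set in the EDIT.

```python
from itertools import groupby


def run_ids(labels):
    """Return one run id per element; the id increments whenever the label changes.

    Intended to mirror pv_type.ne(pv_type.shift()).cumsum(), so ids start at 1.
    """
    ids = []
    current = 0
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        if label != previous:
            current += 1
            previous = label
        ids.append(current)
    return ids


def transform_by_run(labels, values, func):
    """Apply func to each run's values and repeat the result on every row of the run."""
    out = []
    ids = run_ids(labels)
    for _, grp in groupby(zip(ids, values), key=lambda pair: pair[0]):
        vals = [v for _, v in grp]
        out.extend([func(vals)] * len(vals))
    return out


# Illustrative data: same values as the sample data set in the EDIT
pv_type = ["peak", "peak", "valley", "peak", "valley", "valley", "valley"]
price = [1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2]

ids = run_ids(pv_type)                          # [1, 1, 2, 3, 4, 4, 4]
min_v = transform_by_run(pv_type, price, min)   # per-run minimum on every row
max_p = transform_by_run(pv_type, price, max)   # per-run maximum on every row
```

The output has the same number of rows as the input, which is what `transform` does. Collapsing to one row per run is a separate step.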

EDIT

Sample data set:

types <- c('peak', 'peak', 'valley', 'peak', 'valley', 'valley', 'valley')
values <- c(1.01,   1.00,    0.4,     1.2,     0.3,      0.1,      0.2)
segmentation <- data.frame(types, values)
segmentation

expectedTypes <- c('peak', 'valley', 'peak', 'valley')
expectedValues <- c(1.00, 0.4, 1.2, 0.1 )
expectedResult <- data.frame(expectedTypes, expectedValues)
expectedResult

I don't know a better way to generate the data.

Fred Johnson
  • 2
    [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code & libraries used, and a clear explanation of what you're trying to do and what hasn't worked. – camille Jul 03 '19 at 16:44
  • "The only way I know how to do this is by using a for loop": I am new to R. I am happy to use a library, but I don't know any. I have added a sample data set. – Fred Johnson Jul 03 '19 at 18:06

1 Answer

With R, one implementation using dplyr is to take the cumulative sum of the logical comparison between 'pv_type' and its lag as a grouping column, and then compute the min and max of 'price' within each group as two new columns:

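library(dplyr)
segmentation %>%
       # new group id each time pv_type differs from the previous row
       group_by(pv_type_group = cumsum(pv_type != lag(pv_type,
                 default = first(pv_type)))) %>%
       # per-group min/max, repeated on every row of the group
       mutate(min_v = min(price), max_p = max(price))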

Update

With the OP's example, the expected output is summarised (one row per group), so we use summarise instead of mutate. Also, rleid (from data.table) is used instead of the logical cumulative sum to build the group id.

library(dplyr)
library(data.table)
segmentation %>% 
    # run-length id: one group per run of consecutive equal types
    group_by(grp = rleid(types)) %>% 
    summarise(types = first(types), expectedvalues = min(values)) %>%
    ungroup() %>%
    select(-grp)
# A tibble: 4 x 2
#  types  expectedvalues
# <fct>           <dbl>
#1 peak              1  
#2 valley            0.4
#3 peak              1.2
#4 valley            0.1
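
Note that `min(values)` is applied to every run above, which reproduces the expected values in the question (the first peak run gives 1). If each peak run should yield its maximum and each valley run its minimum instead, as the wording of the question describes, the per-run rule could be sketched as follows. This is plain Python for illustration only, not dplyr code, and `summarise_runs` is a hypothetical helper name.

```python
from itertools import groupby


def summarise_runs(types, values):
    """Return one (type, value) row per run of consecutive equal types.

    Peak runs keep their maximum, valley runs keep their minimum.
    """
    rows = []
    for kind, grp in groupby(zip(types, values), key=lambda pair: pair[0]):
        vals = [v for _, v in grp]
        rows.append((kind, max(vals) if kind == "peak" else min(vals)))
    return rows


# Sample data from the question
types = ["peak", "peak", "valley", "peak", "valley", "valley", "valley"]
values = [1.01, 1.00, 0.4, 1.2, 0.3, 0.1, 0.2]

result = summarise_runs(types, values)
# [('peak', 1.01), ('valley', 0.4), ('peak', 1.2), ('valley', 0.1)]
```

With the sample data this differs from the summarise output above only in the first row (1.01 rather than 1).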
akrun
  • What does %>% do/mean? – Fred Johnson Jul 03 '19 at 16:19
  • @user2330270 It is the chain (pipe) operator, which passes the lhs output on for further processing – akrun Jul 03 '19 at 16:20
  • There are many people who don't have a grasp of both languages at the same time. Downvoting discourages people from responding to code conversion questions and thereby reduces their value. It is true that the OP didn't provide a reproducible example, but the code conversion doesn't really require that – akrun Jul 03 '19 at 17:15
  • ok thanks, having a test of your answer now – Fred Johnson Jul 03 '19 at 18:08
  • @user2330270 I updated the answer based on your example/expected – akrun Jul 03 '19 at 19:19
  • The first one didn't work but the second one does, thanks a lot. Time to try to understand it :D – Fred Johnson Jul 03 '19 at 21:10
  • 1
    @user2330270 The second one is summarised output. In the Python, you were `transform`ing and creating a new column. Also, the column names were different in the example – akrun Jul 03 '19 at 21:10