I am new to dplyr and trying to improve my syntax. I have the following data frame:
testdf5<- data.frame(
stringsAsFactors = FALSE,
col1=c('aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'cc','cc','cc'),
MyLength=c('500', '500', '600', '500', '600', '600', '700','700','600'),
col3=c('0.5', '0.5', '0.5', '0.5', '0.5', '0.5', '0.5','0.7','0.7'),
POS=c(
500, 1000, 2000,
400, 500, 600,
10000, 10500, 11000))
I want to:
1) group the rows by col1, Mylength and col3;
2) for each group, I want the minimum and the maximum POS
This is the result I want:
col1 MyLength col3 MinPos MaxPOS
aa 500 0.5 500 1000
aa 600 0.5 2000 2000
bb 500 0.5 400 400
bb 600 0.5 500 600
cc 600 0.7 11000 11000
cc 700 0.5 10000 10000
cc 700 0.7 10500 10500
This is my code, which works:
testdf6<- testdf5 %>%
#needs '.dots' to read a character vector
dplyr::group_by(.dots=c('col1', 'MyLength', 'col3')) %>%
dplyr::filter(POS==min(POS)) ##get min(POS)
colnames(testdf6)[4] <- 'MinPos'
testdf7<- testdf5 %>%
#needs '.dots' to read a character vector
dplyr::group_by(.dots=c('col1', 'MyLength', 'col3')) %>%
dplyr::filter(POS==max(POS)) ##Get max(POS)
#
colnames(testdf7)[4] <- 'MaxPos'
#Now merge
testdf8<- merge(testdf6, testdf7, by = c('col1', 'MyLength', 'col3'))
I am basically doing the same operation twice, and I was wondering if there is a cleaner way, as I am trying to improve my syntax. I look forward to your feedback.