grouping of dataframe in R

Question

I have the data.frame below. I want to add a column that groups my data by column 1 (INPUTRATE), so that each consecutive run of the same INPUTRATE gets its own group number: the first run (rows 1 to 2: 470, 470) is group 1, the second run (row 3: 450) is group 2, the third run (row 4: 470) is group 3, the fourth run (rows 5 to 7: 460, 460, 460) is group 4, and so on, as shown in the last column of the OUTPUT below.

INPUTRATE   TEMP1     TEMP2
 470       355.4972 407.2139
 470       363.2138 414.4102
 450       370.8389 414.6563
 470       381.3884 413.6328
 460       386.9973 401.3242
 460       385.2969 388.0488
 460       390.3884 384.6963

OUTPUT

INPUTRATE   TEMP1    TEMP2  group
 470       355.4972 407.2139    1
 470       363.2138 414.4102    1
 450       370.8389 414.6563    2
 470       381.3884 413.6328    3
 460       386.9973 401.3242    4
 460       385.2969 388.0488    4
 460       390.3884 384.6963    4

Then, based on the number of elements in each group, I have to create a data frame containing the group that has the highest number of rows with the same INPUTRATE.

akrun · Accepted Answer · 2015-12-28T10:03:51.500

1

We can try

library(data.table)
setDT(df1)[, group := .GRP, by = INPUTRATE]

or using match

df1$group <- with(df1, match(INPUTRATE, unique(INPUTRATE)))
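To see what the two snippets above produce on the question's data, here is a small base R check. The vector `x` is just the INPUTRATE column from the question, typed in by hand, and `grp_by_value` is a name chosen for this illustration. Both approaches number the distinct values in order of first appearance, so the second run of 470 falls back into group 1 instead of starting a new group:

```r
# INPUTRATE column from the question, entered by hand for illustration
x <- c(470, 470, 450, 470, 460, 460, 460)

# Distinct values in order of first appearance: 470 450 460
unique(x)

# Position of each value within unique(x): one id per distinct value
grp_by_value <- match(x, unique(x))
grp_by_value  # 1 1 2 1 3 3 3
```

Row 4 gets group 1 here, whereas the desired output in the question has it in group 3.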

Update

If the goal is to start a new group whenever 'INPUTRATE' changes from one row to the next,

setDT(df1)[, group := rleid(INPUTRATE)]
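If data.table is not available, a rough base R equivalent of rleid is to flag every position where the value differs from the previous one and take the cumulative sum of those flags. This is only a sketch: the helper name `run_id` is made up for illustration, it assumes the vector has no missing values, and `x` is the question's INPUTRATE column typed in by hand.

```r
# Hypothetical helper: run ids for a vector without missing values
run_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  # TRUE at the first element and wherever the value differs from its predecessor
  starts <- c(TRUE, x[-1L] != x[-length(x)])
  # Each TRUE bumps the counter, so every run gets its own id
  cumsum(starts)
}

x <- c(470, 470, 450, 470, 460, 460, 460)
run_id(x)  # 1 1 2 3 4 4 4
```

The row order is never touched, which matches the requirement of reading the data frame in its original order.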

edited Dec 28 '15 at 10:03

answered Dec 28 '15 at 06:19

akrun


Ronak Shah · Answer 2 · 2015-12-28T10:12:32.993

0

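You can try cumsum along with duplicated. Note that this starts a new group only at the first occurrence of each INPUTRATE value, so it gives the desired result only when each value forms a single contiguous block, as in the output below.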

data.frame(df, group = cumsum(!duplicated(df$INPUTRATE)))

#INPUTRATE   TEMP1    TEMP2  group
# 470      355.4972  407.2139    1
# 470      363.2138  414.4102    1
# 470      370.8389  414.6563    1
# 470      381.3884  413.6328    1
# 460      386.9973  401.3242    2
# 460      385.2969  388.0488    2
# 460      390.3884  384.6963    2

EDIT

As per the update, if you want a new group on every change of INPUTRATE, then you can use the rle function.

r <- rle(df$INPUTRATE)
# repeat each run's index once per row in that run and store it as a column
df$group <- rep(seq_along(r$lengths), r$lengths)
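Building on the rle idea, the second part of the question (keeping only the largest group) might be handled as follows. This is a sketch rather than the answerer's code: the data frame is rebuilt by hand from the question, and `longest` and `result` are names chosen here. It assumes "largest group" means the longest consecutive run, and ties go to the earliest run because which.max returns the first maximum.

```r
# Data from the question, entered by hand
df <- data.frame(
  INPUTRATE = c(470, 470, 450, 470, 460, 460, 460),
  TEMP1 = c(355.4972, 363.2138, 370.8389, 381.3884, 386.9973, 385.2969, 390.3884),
  TEMP2 = c(407.2139, 414.4102, 414.6563, 413.6328, 401.3242, 388.0488, 384.6963)
)

# Run-length encode INPUTRATE and expand the run numbers to one id per row
r <- rle(df$INPUTRATE)
df$group <- rep(seq_along(r$lengths), r$lengths)

# Index of the longest run (the first one if several runs tie)
longest <- which.max(r$lengths)

# Keep only the rows of that run; row order is unchanged
result <- df[df$group == longest, ]
result
```

On the question's data the run lengths are 2, 1, 1, 3, so `result` holds the three rows of group 4 (INPUTRATE 460).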

edited Dec 28 '15 at 10:12

answered Dec 28 '15 at 06:31

Ronak Shah


@akrun, @Ronak: this works for me. My question is: whenever INPUTRATE changes while reading the dataframe in order, just start a new group, without changing the ordering of the dataframe. – andy Dec 28 '15 at 10:00

score 0 · Answer 3 · answered Dec 28 '15 at 06:59

0

Not sure I understand the question. Are you just trying to filter by the INPUTRATE level with the most occurrences? If so, here's another answer.

counts <- table(df$INPUTRATE)

Then just subset based on the value with the highest count in the table (which.max picks the first one if there is a tie).

highest_value <- as.numeric(names(counts)[which.max(counts)])
df[df$INPUTRATE == highest_value, ]

answered Dec 28 '15 at 06:59

lmkirvan


Starting from the beginning, I have to read the dataframe in the same order and make different groups based on INPUTRATE. Whenever INPUTRATE changes during the continuous run, a new group with a different number should be introduced. For example:

INPUTRATE   TEMP1    TEMP2  group
 470       355.4972 407.2139    1
 470       363.2138 414.4102    1
 470       370.8389 414.6563    1
 470       381.3884 413.6328    2
 460       386.9973 401.3242    2
 450       385.2969 388.0488    3
 460       390.3884 384.6963    4
 460       390.3884 384.6963    4

– andy Dec 28 '15 at 09:21
