3

I've got data somewhat like this (of course with many more rows):

Age     Work Zone     SomeNumber
26      1          2.61
32      4          8.42
41      2          9.71
45      2          4.14
64      3          6.04
56      1          5.28
37      4          7.93

I want to get the maximum SomeNumber for each zone at or below each age. SomeNumber increases with age, so I expect that the highest SomeNumber in Zone 2 by an under-32-y/o is by a guy who's age 31, but it could in fact be a guy age 27.

To do this I've written a nested for loop:

for(i in zonelist){
  temp = data[data$zone==i,]
  for(j in 1:max(data$age)){
    temp.lessequal=c(temp.lessequal,max((temp[temp$Age<=j,])$SomeNumber))
  }
  #plot temp.lessequal or save it at this point
}

which of course is tremendously slow. How can I do this faster? I've looked at the order function to sort by two columns at once, but that doesn't let me take the max of each group.

J.Doe
  • 749
  • 1
  • 5
  • 9
  • @Sathish This question doesn't seem duplicate to me if I have understood it properly. May be I'm wrong. – MKR Jan 28 '18 at 08:49
  • My answer is an exact duplicate of one of the solutions in the link posted above – Sathish Jan 28 '18 at 08:53

3 Answers3

1

Data:

df1 <- read.table(text='Age Work_Zone  SomeNumber
26      1          2.61
                   32      4          8.42
                   41      2          9.71
                   45      2          4.14
                   64      3          6.04
                   56      1          5.28
                   37      4          7.93', 
                   header = TRUE)

Code:

df2 <- with( df1, df1[ Age <= 32, ] )  # extract rows with Age <= 32
# get maximum of someNumber by aggregating with work_zone and then merging with df1 to combine the age column
merge(aggregate(SomeNumber ~ Work_Zone, data = df2, max), df2) 
#   Work_Zone SomeNumber Age
# 1         1       2.61  26
# 2         4       8.42  32
Sathish
  • 12,453
  • 3
  • 41
  • 59
  • I don't think OP is hard-coding 32 anywhere but used it as example. He/she is looking for current row's *Age*. – Parfait Jan 28 '18 at 15:40
0

Using the library data.table you can select the rows which are less than required age, then output the max(somenumber) and their respective age for each Workzone ie group by workzone.

library(data.table)
setDT(df1)[Age<=32,.(max(SomeNumber),Age),by=Work_Zone]
   Work_Zone   V1 Age
1:         1 2.61  26
2:         4 8.42  32
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation would greatly improve its long-term value](//meta.stackexchange.com/q/114762/350567) by showing *why* this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please [edit] your answer to add some explanation, including the assumptions you've made. – iBug Jan 28 '18 at 13:19
0

It seems OP is looking for max value based on <= condition on a particular column (Age).

The use of sqldf comes very handy in such cases in order to explain the logic. One solution could be:

# Data 
df <- read.table(text = "Age     Work_Zone     SomeNumber
26      1          2.61
32      4          8.42
41      2          9.71
45      2          4.14
64      3          6.04
56      1          5.28
37      4          7.93", header = T, stringsAsFactors = F)

library(sqldf)
df3 <- sqldf("select df1.Work_Zone, df1.Age, max(df2.SomeNumber) from df df1 
       inner join df df2 on df1.Work_Zone = df2.Work_Zone 
       WHERE df2.Age <= df1.Age 
       GROUP BY df1.Work_Zone, df1.Age")

# Result:
#   Work_Zone Age max(df2.SomeNumber)
# 1         1  26                2.61
# 2         1  56                5.28
# 3         2  41                9.71
# 4         2  45                9.71
# 5         3  64                6.04
# 6         4  32                8.42
# 7         4  37                8.42
MKR
  • 19,739
  • 4
  • 23
  • 33