
I have what I think should be a simple problem: how to assign a categorical variable based on the numerical or date order of another column.

The data is sample point data over time. The same points have been measured multiple times over the course of a number of years. I want to assign the values T1, T2, T3 etc for each sample point, with T1 the first measurement, T2 the second and so on for each point.

If the data is for example:

df <- data.frame(Point = factor(c("A", "A", "B", "B", "C", "D", "E", "E", "E")), 
                            Date = c("20140404", "20161002", "20150217", "20170101", "20130508",
                                     "20130514", "20131024", "20150412", "20170210"),
                            Data = c(10, 5, 5, 3, 2, 7, 8, 5, 6))

The data frame would look like:

   Point     Date Data
1      A 20140404   10
2      A 20161002    5
3      B 20150217    5
4      B 20170101    3
5      C 20130508    2
6      D 20130514    7
7      E 20131024    8
8      E 20150412    5
9      E 20170210    6

And the end result would be:

  Point     Date Data Time
1     A 20140404   10   T1
2     A 20161002    5   T2
3     B 20150217    5   T1
4     B 20170101    3   T2
5     C 20130508    2   T1
6     D 20130514    7   T1
7     E 20131024    8   T1
8     E 20150412    5   T2
9     E 20170210    6   T3

I'm sure this can be accomplished using a for loop, where:

for (i in df$Point) {
  # df$Time <- ...
}

But I get stuck at how to get R to add T1 for the lowest df$Date, T2 for the next and so on.

Any help appreciated.

Alison Bennett
  • So you just want a group counter like `ave(df$Data, df$Point, FUN=seq_along)` ? – thelatemail Feb 14 '17 at 03:43
  • Essentially duplicates http://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame if data is sorted appropriately. – thelatemail Feb 14 '17 at 03:47
  • Thanks @thelatemail, yes that is basically it but I was hoping for a solution that didn't require presorting the data. Which is why I was thinking of a loop. ie. loop through the data finding the lowest value and assign it T1, then loop through again finding the next lowest value and so on. – Alison Bennett Feb 14 '17 at 04:18
  • Without sorting, you could just do `df$Date <- as.Date(df$Date, format="%Y%m%d"); ave(as.numeric(df$Date), df$Point, FUN=order)` – thelatemail Feb 14 '17 at 04:27
  • Thanks @thelatemail - but it's just another way of sorting the data isn't it? That code doesn't add T1 etc.... – Alison Bennett Feb 14 '17 at 05:27
  • It's not sorting anything. Paste a "T" on the front and you've got your intended result. – thelatemail Feb 14 '17 at 05:49

1 Answer


You could do:

df$Time <- paste0("T", ave(df$Data, df$Point, FUN=seq_along))

Output:

print(df)

  Point     Date Data Time 
1     A 20140404   10   T1
2     A 20161002    5   T2
3     B 20150217    5   T1
4     B 20170101    3   T2
5     C 20130508    2   T1
6     D 20130514    7   T1
7     E 20131024    8   T1
8     E 20150412    5   T2
9     E 20170210    6   T3

This assumes that your Date column is already sorted within each Point (as it is in your example).
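If the rows are not already in date order within each point, one option is to rank the dates within each group instead of counting rows. The following is a minimal sketch, not the only way to do it: it reuses the question's data with the rows deliberately shuffled, and the helper name `date_num` and the choice of `ties.method = "first"` are assumptions made for illustration.

```r
# Same sample data as in the question, but with the rows shuffled
df <- data.frame(
  Point = factor(c("E", "A", "B", "E", "C", "A", "D", "B", "E")),
  Date  = c("20170210", "20161002", "20150217", "20131024", "20130508",
            "20140404", "20130514", "20170101", "20150412"),
  Data  = c(6, 5, 5, 8, 2, 10, 7, 3, 5)
)

# Parse the yyyymmdd strings and convert to numbers so they can be ranked
date_num <- as.numeric(as.Date(df$Date, format = "%Y%m%d"))

# Within each Point, rank() gives 1 to the earliest date, 2 to the next, ...
# ties.method = "first" (an assumption) breaks ties by row order
df$Time <- paste0("T", ave(date_num, df$Point,
                           FUN = function(x) rank(x, ties.method = "first")))

print(df)
```

Unlike `order`, which returns the permutation that would sort a vector, `rank` returns each element's position in the sorted order, which is what the T1, T2, ... labels need.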

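The ave function applies a function FUN (in this case seq_along) to the subsets of its first argument defined by the level combinations of the grouping factors, and returns the results in the original row order. seq_along generates the sequence 1, 2, ..., n for a vector of length n.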
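To see what ave and seq_along do together, here is a small sketch on made-up vectors (`g` and `x` are hypothetical names, not part of the question's data):

```r
g <- c("A", "A", "B", "A", "B")   # grouping variable
x <- c(10, 20, 30, 40, 50)        # values; seq_along only uses how many there are per group

# Number the elements within each group, keeping the original order
counter <- ave(x, g, FUN = seq_along)
print(counter)
# 1 2 1 3 2
```

The counter follows the order in which rows appear within each group, not the values in `x`, which is why the one-liner above relies on the dates already being sorted.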

For more information, see the R help pages:

  • ?ave
  • ?seq_along
lizzie