0

I have a huge dataframe and need to add a new variable based on the value of three others. The new variable has to be numeric and depends on the variables "Compartment", "Plot" and "Date". In every compartment, I will number the dates for plot x say 1:10 (if ten dates), the dates of plot y 11:20 (of also ten dates), those of plot z 21:25 (if 5 dates) and so on. Normally the dates are the same for every plot withing each compartment, but exceptions occur.

So I need a single numeric value for each plot-date combination and they need to be in chronological order for every plot.

This post: R code: how to generate variable based on multiple conditions from other variables gives a solution on how to create a variable based on conditions of other variables, but if I have to retype this for every combination in every df, it'll take me days and a massive amount of code.

Is there a generic way of solving this? With a loop or something? Thus far I couldn't think of anything better then splitting the df in df's per plot and linking the new variable to date with ifelse (in ifelse in ifelse ...). And linking them again afterwards. But this is impossible for the amount of data I have.

I did already split the large df per compartment though, should that help for certain solutions.

Dummy code (NOTE all compartments have different plot names in the real data and dates sometimes differ between compartments and even plots, as does the no. of observations per combo):

# Dataframe
Comp <- rep(c("A","B","C"), each=20)
Date <- rep(rep(c("2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"), times=4),times=3)
Plot <- rep(rep(c("P1", "P2", "P3", "P4"), each=5),times=3)
df <- data.frame(Comp, Date, Plot)

# Expected result
         Comp       Date Plot T
1     A 2018-01-01   P1  1
2     A 2018-01-02   P1  2
3     A 2018-01-03   P1  3
4     A 2018-01-04   P1  4
5     A 2018-01-05   P1  5
6     A 2018-01-01   P2  6
7     A 2018-01-02   P2  7
8     A 2018-01-03   P2  8
9     A 2018-01-04   P2  9
10    A 2018-01-05   P2  10
11    A 2018-01-01   P3  11
12    A 2018-01-02   P3  12
13    A 2018-01-03   P3  13
14    A 2018-01-04   P3  14
15    A 2018-01-05   P3  15
16    A 2018-01-01   P4  16
17    A 2018-01-02   P4  17
18    A 2018-01-03   P4  18
19    A 2018-01-04   P4  19
20    A 2018-01-05   P4  20
21    B 2018-01-01   P1  1
22    B 2018-01-02   P1  2
23    B 2018-01-03   P1  3
24    B 2018-01-04   P1  4
25    B 2018-01-05   P1  5
26    B 2018-01-01   P2  6
27    B 2018-01-02   P2  7
28    B 2018-01-03   P2  8
29    B 2018-01-04   P2  9
30    B 2018-01-05   P2  10
31    B 2018-01-01   P3  11
32    B 2018-01-02   P3  12
33    B 2018-01-03   P3  13
34    B 2018-01-04   P3  14
35    B 2018-01-05   P3  15
36    B 2018-01-01   P4  16
37    B 2018-01-02   P4  17
38    B 2018-01-03   P4  18
39    B 2018-01-04   P4  19
40    B 2018-01-05   P4  20
41    C 2018-01-01   P1  1
42    C 2018-01-02   P1  2
43    C 2018-01-03   P1  3
44    C 2018-01-04   P1  4
45    C 2018-01-05   P1  5
46    C 2018-01-01   P2  6
47    C 2018-01-02   P2  7
48    C 2018-01-03   P2  8
49    C 2018-01-04   P2  9
50    C 2018-01-05   P2  10
51    C 2018-01-01   P3  11
52    C 2018-01-02   P3  12
53    C 2018-01-03   P3  13
54    C 2018-01-04   P3  14
55    C 2018-01-05   P3  15
56    C 2018-01-01   P4  16
57    C 2018-01-02   P4  17
58    C 2018-01-03   P4  18
59    C 2018-01-04   P4  19
60    C 2018-01-05   P4  20
Tingolfin
  • 825
  • 8
  • 28
  • Can you post the expected output for your example dataset? I am seeing several ways of doing this, for instance with `ave`. – Rui Barradas Mar 13 '19 at 11:52
  • I've added an example. also changed the dummy data a bit as they were in confusing order – Tingolfin Mar 13 '19 at 12:03
  • 1
    I believe this is a duplicate but don't have the time to search for it right now. Try `with(df, ave(as.integer(Comp), Comp, FUN = seq_along))`. – Rui Barradas Mar 13 '19 at 12:49
  • @RuiBarradas this works indeed, thank you! I've posted your solution in an answer so I can check it. If you prefer to do this yourself, feel free and I will check that as accepted answer. If this is indeed a duplicate, someone can refer to the other question. I haven't found anything related yet (but it's difficult to find ideal search terms here). – Tingolfin Mar 13 '19 at 13:10
  • I have now checked it and will close as dupe. Thanks anyway. – Rui Barradas Mar 13 '19 at 14:35

2 Answers2

0

When creating your df, use stringsAsFactors = FALSE so as to not deal with factors.

df <- data.frame(Comp, Date, Plot,stringsAsFactors=FALSE)
df$z=as.numeric(as.factor(paste(df$Date,df$Plot,sep="#")))

> head(df,25)
   Comp       Date Plot  z
1     A 2018-01-01   P1  1
2     A 2018-01-01   P2  2
3     A 2018-01-01   P3  3
4     A 2018-01-01   P4  4
5     A 2018-01-02   P1  5
6     A 2018-01-02   P2  6
7     A 2018-01-02   P3  7
8     A 2018-01-02   P4  8
9     A 2018-01-03   P1  9
10    A 2018-01-03   P2 10
11    A 2018-01-03   P3 11
12    A 2018-01-03   P4 12
13    A 2018-01-04   P1 13
14    A 2018-01-04   P2 14
15    A 2018-01-04   P3 15
16    A 2018-01-04   P4 16
17    A 2018-01-05   P1 17
18    A 2018-01-05   P2 18
19    A 2018-01-05   P3 19
20    A 2018-01-05   P4 20
21    B 2018-01-01   P1  1
22    B 2018-01-01   P2  2
23    B 2018-01-01   P3  3
24    B 2018-01-01   P4  4
25    B 2018-01-02   P1  5

First we generate a new variable which pastes Date and Plot columns with a random (the rarer the better) separator (#). We then take advantage of the as.numeric(as.factor() combination, which first groups the new variable as a factor and then assigns a number to each level.

boski
  • 2,437
  • 1
  • 14
  • 30
  • I get all unique numbers when I try this, in an order in which I don't see the logic. Also the numbers had to count up per plot and then per date, but that's my mistake as I made the dummy dataset wrong in the first place. This has been edited now. Rui Barradas' answer works, so I can work on. Thanks anyway! – Tingolfin Mar 13 '19 at 13:09
0

@Rui Barradas had the answer with a very simple line of code:

df$new <- with(df, ave(as.integer(Comp), Comp, FUN = seq_along))
Tingolfin
  • 825
  • 8
  • 28