I am trying to calculate the relative age of animals. For each animal in my dataset, I need to subtract its first recorded year from each subsequent year. Because an animal can have multiple reproductive events in one year, every event after the first one in a given year must get the same age as the first event of that year.

Update:

The dataset looks more like this:

  Year ID Age
1 1975  6  -1
2 1975  6  -1
3 1976  6  -1
4 1977  6  -1
6 1975  9  -1
8 1978  9  -1

And I need it to look like this:

  Year ID Age
1 1975  6   0
2 1975  6   0
3 1976  6   1
4 1977  6   2
6 1975  9   0
8 1978  9   3

Apologies for any initial confusion about what I needed to accomplish.

Any help would be greatly appreciated.

smci
Constantin
  • Read about the split-apply-combine paradigm; it applies whenever you have data with multiple rows per ID (i.e. long-form data). – smci Jan 26 '18 at 21:12
  • Will do, thank you for the help. – Constantin Jan 26 '18 at 21:18
  • ...and look at the introductions to the `dplyr` and/or `data.table` packages (and later on, the `tidyverse`). This is one of the biggest paradigms in R. It's incredibly powerful. – smci Jan 26 '18 at 21:19
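As a rough illustration of split-apply-combine in base R, here is a sketch that uses only `split()`, `lapply()` and `unsplit()`, with the question's sample data typed in by hand (the row names 1, 2, 3, 4, 6, 8 are dropped). Each step of the paradigm is a separate line:

```r
# Sample data from the question, entered by hand
animals <- data.frame(
  Year = c(1975, 1975, 1976, 1977, 1975, 1978),
  ID   = c(6, 6, 6, 6, 9, 9)
)

# Split: one vector of years per animal ID
years_by_id <- split(animals$Year, animals$ID)

# Apply: years elapsed since each animal's first recorded year;
# rows that share a year get the same value automatically
ages_by_id <- lapply(years_by_id, function(y) y - min(y))

# Combine: put the per-ID results back into the original row order
animals$Age <- unsplit(ages_by_id, animals$ID)

print(animals$Age)
# [1] 0 0 1 2 0 3
```

Packages like `dplyr` and `data.table` wrap these three steps into a single grouped expression.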

2 Answers

Things done "by group" are usually easiest to do with dplyr or data.table:

library(dplyr)
your_data %>%
  group_by(ID) %>%               # group by ID
  mutate(Age = Year - min(Year)) # add new column

or

library(data.table)
setDT(your_data) # convert to data table

# add new column (Age :=), computed by group (by = ID)
your_data[, Age := Year - min(Year), by = ID]

In base R, ave is probably the easiest way to add a groupwise column to existing data:

your_data$Age = with(your_data, ave(Year, ID, FUN = function(x) x - min(x)))

but the syntax isn't as nice as the options above.
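A minimal runnable check of the ave approach, assuming the sample data is built inline with `data.frame()` rather than read from text. The signature is `ave(x, ..., FUN = mean)`: every unnamed argument after `x` is treated as a grouping variable, so the function has to be passed by name as `FUN =`, otherwise it ends up as an extra grouping variable and the call fails.

```r
# Sample data from the question, built inline
your_data <- data.frame(
  Year = c(1975, 1975, 1976, 1977, 1975, 1978),
  ID   = c(6, 6, 6, 6, 9, 9)
)

# ID is the grouping variable; FUN must be named so ave() does not
# mistake the function for a second grouping variable
your_data$Age <- with(your_data, ave(Year, ID, FUN = function(x) x - min(x)))

print(your_data$Age)
# [1] 0 0 1 2 0 3
```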


You can test on this data:

# The header has one fewer field than the rows, so the first column becomes row names
your_data = read.table(text = "  Year ID Age
1 1975  6  -1
2 1975  6  -1
3 1976  6  -1
4 1977  6  -1
6 1975  9  -1
8 1978  9  -1", header = TRUE)
Gregor Thomas
  • I get this error message when I run `library(dplyr); your_data %>% group_by(ID) %>% mutate(Age = Year - min(Year))`: Error in mutate_impl(.data, dots) : Column `Age` must be length 84 (the group size) or one, not 7664 – Constantin Jan 26 '18 at 20:43
  • Well, it works on the sample data. (You can get that to work, right? Try the data at the bottom of my answer.) So what's different about your data? Are the column names spelled the same way? Are you sure `Age` is capitalized? Do you have another object called `Age` in your workspace that has length 7664? – Gregor Thomas Jan 26 '18 at 20:51
  • And please don't tell me you used `attach`. If so, `detach()` immediately and never use `attach()` again. – Gregor Thomas Jan 26 '18 at 20:51
  • Worked perfectly; sorry for the confusion. Thank you very much. – Constantin Jan 26 '18 at 21:04
  • Also, what is your aversion to attach()? – Constantin Jan 26 '18 at 21:05
  • `attach` causes more problems than it solves. It makes it possible for columns to get out of sync with each other, and out of sync with the data frame. I wrote a little bit more [in my answer here](https://stackoverflow.com/a/42284422/903061). Even if you look at the help page `?attach`, the *Good Practice* section essentially says *"`with` is usually better - and don't use it inside functions!"*. – Gregor Thomas Jan 26 '18 at 21:21
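A small sketch of the out-of-sync problem described in the comment above; the data frame `herd` is a made-up example, not from the question. `attach()` places a copy of the columns on the search path, so later changes to the data frame are not visible through the attached names, while `with()` reads the current columns each time it is called:

```r
# Hypothetical example data
herd <- data.frame(Year = c(1975, 1976, 1977))

attach(herd)                 # puts a copy of herd's columns on the search path
herd$Year <- herd$Year + 1   # now modify the data frame itself

attached_year <- Year        # found on the search path: the stale copy
print(attached_year)         # [1] 1975 1976 1977
print(herd$Year)             # [1] 1976 1977 1978

detach(herd)                 # remove the copy from the search path

# with() evaluates against the data frame as it is right now
print(with(herd, Year))      # [1] 1976 1977 1978
```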
If you're trying to figure out the relative age based on one initial birth year, 1975 (which it seems like you are), then you can just make a new column called "RelativeAge" and set it equal to Year minus 1975:

data$RelativeAge = data$Year - 1975

Then just drop the original "Age" column, or rename columns as necessary.

Sean
  • Welcome to the site! The OP's example isn't super clear, but I do think they want to do this operation separately for each ID value in the data frame, and different IDs may start in different years. – Gregor Thomas Jan 26 '18 at 20:40
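To illustrate the difference, here is a sketch with hypothetical data: IDs 6 and 9 follow the question, while ID 12, first recorded in 1980, is invented for this example. A fixed 1975 baseline and a per-ID baseline agree only for animals whose first year is 1975:

```r
# Hypothetical data: ID 12 (first seen in 1980) is NOT from the original post
animals <- data.frame(
  Year = c(1975, 1976, 1975, 1978, 1980, 1982),
  ID   = c(6, 6, 9, 9, 12, 12)
)

# Fixed baseline: years since 1975 for every animal
animals$FixedAge <- animals$Year - 1975

# Per-animal baseline: years since each ID's own first recorded year
animals$GroupAge <- with(animals, ave(Year, ID, FUN = function(x) x - min(x)))

print(animals$FixedAge)  # [1] 0 1 0 3 5 7
print(animals$GroupAge)  # [1] 0 1 0 3 0 2
```

The two columns differ only for ID 12, which is why the grouped versions in the other answer generalize beyond a single shared start year.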