
I have a data frame containing panel data with patent and economic information for the years 2012-2020. The data variables are:

investment_year is a time-invariant variable: the year in which a company received its initial investment;

patent_applications is the number of patents a company filed in a given year. Company A, for example, filed five patents in 2018, two in 2019, and so on.

company_name    investment_year        year       patent_applications
A                    2018               2020             7
A                    2018               2019             2
A                    2018               2018             5
.                     .                   .              .
.                     .                   .              . 
.                     .                   .              .
A                    2018               2012             4 
B                    2015               2020             10
B                    2015               2019             3
B                    2015               2018             7
.                      .                  .              .
.                      .                  .              .
.                      .                  .              .

I would like to create a variable that contains the number of applications at t+2, where t is the investment year.

For example, for Company A the number of applications at t+2 (e.g. patent_applications_t2) would be 7, since its investment year (2018) + 2 is 2020.

I tried the line of code below, but it does not produce the correct result.

df$patent_applications_t2 <- df$patent_applications[df$Year == df$Investment_Year + 2]
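The line above has two separate problems. First, `df$Year` and `df$Investment_Year` do not match the lower-case column names `year` and `investment_year`, and because R column names are case-sensitive, both return `NULL`. Second, even with the names corrected, the logical subset keeps only the matching rows (at most one per company). The result is therefore shorter than the data frame, so it is either recycled onto the wrong rows or rejected with a length error. A minimal base-R sketch of a per-row lookup instead, assuming one row per company and year and the column names from the table (`target_key` and `row_key` are hypothetical helper names):

```r
# Sample panel built from the rows visible in the question's table
# (assumes one row per company and year)
df <- data.frame(
  company_name        = c("A", "A", "A", "A", "B", "B", "B"),
  investment_year     = c(2018, 2018, 2018, 2018, 2015, 2015, 2015),
  year                = c(2020, 2019, 2018, 2012, 2020, 2019, 2018),
  patent_applications = c(7, 2, 5, 4, 10, 3, 7)
)

# Key of the row each observation should read from: same company, year = investment_year + 2
target_key <- paste(df$company_name, df$investment_year + 2, sep = "|")
# Key identifying each existing row
row_key <- paste(df$company_name, df$year, sep = "|")

# match() returns the position of the first row whose key equals the target, or NA if none does
df$patent_applications_t2 <- df$patent_applications[match(target_key, row_key)]

df
```

With only the rows visible in the question, company B has no 2017 row, so its t+2 value comes out as `NA`. With the full 2012-2020 panel it would pick up B's 2017 count.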
Rfanatic
  • Welcome to Stack Overflow. It's easier to help you if you make your question reproducible by including data to enable testing and verification of possible solutions. [Link for guidance on asking questions](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Peter Nov 25 '21 at 14:01
  • See the answer here: https://stackoverflow.com/questions/69989671/how-to-divide-a-value-by-the-same-value-in-the-previous-year-in-r-return-calcu (and use `lag(x, 2)`). – Helix123 Nov 25 '21 at 14:19

1 Answer


There is probably a better way to do this, but here is what I came up with:

library(tidyverse)

tbl <- tribble(
  ~company_name, ~investment_year, ~year, ~patent_applications,
  "A",           2018,             2020,  7,
  "A",           2018,             2019,  2,
  "A",           2018,             2018,  5,
  "A",           2018,             2012,  4,
  "B",           2015,             2020,  10,
  "B",           2015,             2019,  3,
  "B",           2015,             2018,  7
)

tbl %>%
  group_by(company_name) %>%
  arrange(investment_year, year) %>%
  # flag the investment year (t) and the year after it (t+1)
  mutate(t2 = ifelse(year - investment_year <= 1 & year - investment_year >= 0, 1, 0)) %>%
  # running total of applications over the flagged years; 0 on unflagged rows
  mutate(cumulative_application = t2 * cumsum(patent_applications * t2)) %>%
  ungroup() %>%
  arrange(company_name) %>%
  select(company_name, investment_year, year, patent_applications, cumulative_application)

You get this result:

# A tibble: 7 x 5
  company_name investment_year  year patent_applications cumulative_application
  <chr>                  <dbl> <dbl>               <dbl>                  <dbl>
1 A                       2018  2012                   4                      0
2 A                       2018  2018                   5                      5
3 A                       2018  2019                   2                      7
4 A                       2018  2020                   7                      0
5 B                       2015  2018                   7                      0
6 B                       2015  2019                   3                      0
7 B                       2015  2020                  10                      0

I chose to show the cumulative number of applications over the investment year and the year after it, but you could just as easily keep only the second of those entries.
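If only the count in year t+2 is needed rather than a running total, a sketch in the same tidyverse style could pick it out per company with a grouped subset. This assumes at most one row per company and year. `first()` returns `NA` when a company has no row for t+2, which is the case for company B in this sample because its 2017 row is not in `tbl`.

```r
library(dplyr)
library(tibble)

# Same sample rows as in the answer above
tbl <- tribble(
  ~company_name, ~investment_year, ~year, ~patent_applications,
  "A",           2018,             2020,  7,
  "A",           2018,             2019,  2,
  "A",           2018,             2018,  5,
  "A",           2018,             2012,  4,
  "B",           2015,             2020,  10,
  "B",           2015,             2019,  3,
  "B",           2015,             2018,  7
)

tbl_t2 <- tbl %>%
  group_by(company_name) %>%
  # Within each company, take applications from the row where year == investment_year + 2;
  # first() yields NA when no such row exists, and the value is repeated on every row of the group
  mutate(patent_applications_t2 = first(patent_applications[year == investment_year + 2])) %>%
  ungroup()

tbl_t2
```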

Another (probably better) solution would be to write a function using within(). Hope this helps a bit.
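As one hedged guess at how the within() idea might look, here is a base-R sketch rather than a definitive version. Inside `within()`, `ave()` applies a function to each company's row indices and spreads the result back over that company's rows. It again assumes one row per company and year, and it returns `NA` when the t+2 year is missing.

```r
# Sample panel built from the rows visible in the question's table
df <- data.frame(
  company_name        = c("A", "A", "A", "A", "B", "B", "B"),
  investment_year     = c(2018, 2018, 2018, 2018, 2015, 2015, 2015),
  year                = c(2020, 2019, 2018, 2012, 2020, 2019, 2018),
  patent_applications = c(7, 2, 5, 4, 10, 3, 7)
)

df <- within(df, {
  # ave() splits the row indices by company, applies FUN to each group,
  # and writes the group's single result back onto all of its rows
  patent_applications_t2 <- ave(
    seq_along(year), company_name,
    FUN = function(i) {
      # indices within this company whose year equals investment_year + 2
      hit <- i[year[i] == investment_year[i] + 2]
      if (length(hit) == 0) NA_real_ else patent_applications[hit[1]]
    }
  )
})

df
```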