1

I am working with panel data and I am looking to summarize information for each entity (ID) in different periods of time.

I am fairly new to R, but I am certain this can be done rather easily.

My data is formatted like this:

mydata <- structure(list(company_id = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 
3, 3, 3, 3, 3), A = c(3, 3, 3, 3, 4, 4, 4, 4, 4, 6, 6, 6, 6, 
6, 6, 6), B = c(3, 6, 9, 12, 4, 8, 12, 16, 20, 6, 12, 18, 24, 
30, 36, 42)), .Names = c("company_id", "A", "B"), row.names = c(NA, 
-16L), class = c("tbl_df", "tbl", "data.frame"))

# A tibble: 16 x 2
   company_id     A
        <dbl> <dbl>
 1          1     3
 2          1     3
 3          1     3
 4          1     3
 5          2     4
 6          2     4
 7          2     4
 8          2     4
 9          2     4
10          3     6
11          3     6
12          3     6
13          3     6
14          3     6
15          3     6
16          3     6

My desired output would be to add a column B such that:

╔════════════╦═══╦════╗
║ company_id ║ A ║ B  ║
╠════════════╬═══╬════╣
║          1 ║ 3 ║  3 ║
║          1 ║ 3 ║  6 ║
║          1 ║ 3 ║  9 ║
║          1 ║ 3 ║ 12 ║
║          2 ║ 4 ║  4 ║
║          2 ║ 4 ║  8 ║
║          2 ║ 4 ║ 12 ║
║          2 ║ 4 ║ 16 ║
║          2 ║ 4 ║ 20 ║
║          3 ║ 6 ║  6 ║
║          3 ║ 6 ║ 12 ║
║          3 ║ 6 ║ 18 ║
║          3 ║ 6 ║ 24 ║
║          3 ║ 6 ║ 30 ║
║          3 ║ 6 ║ 36 ║
║          3 ║ 6 ║ 42 ║
╚════════════╩═══╩════╝

What is the most efficient way of doing this?

Cettt
  • 11,460
  • 7
  • 35
  • 58
DBE7
  • 766
  • 2
  • 9
  • 23

2 Answers2

2

you can use the dplyr package:

library(dplyr)

my_data %>% 
  group_by(company_id) %>%
  mutate(B = cumsum(A))
Cettt
  • 11,460
  • 7
  • 35
  • 58
2

Use ave and cumsum. No packages are used.

transform(my_data, B = ave(A, company_id, FUN = cumsum))
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341