So I got this dataframe (dataframe 1):
# A tibble: 6 x 27
# Groups: authority_dic, Full.Name [6]
authority_dic Full.Name Entity `4_2020` `3_2020` `7_2020` `12_2019` `8_2020` `5_2019` `1_2019` `4_2019` `6_2019` `8_2019` `9_2020` `10_2019` `11_2020` `9_2019`
<chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 pro A SALES 23 22 6 3 15 8 7 2 5 8 13 12 7 6
2 pro B WERNE 16 16 7 12 4 11 4 12 13 9 8 9 7 11
3 pro C EQT C 5 16 1 2 3 0 6 2 5 1 2 5 3 5
4 pro D T-MOB 7 7 16 2 8 6 4 10 4 2 6 3 1 2
5 leader A INSPE 2 1 1 15 0 9 1 7 8 4 2 8 2 6
6 pro F AXON 4 4 1 1 0 14 0 12 1 0 0 1 1 0
# ... with 10 more variables: `1_2020` <int>, `11_2019` <int>, `10_2020` <int>, `2_2020` <int>, `2_2019` <int>, `3_2019` <int>, `12_2020` <int>, `7_2019` <int>,
# `6_2020` <int>, `5_2020` <int>
And this dataframe (dataframe 2):
# A tibble: 6 x 25
# Groups: Full.Name [6]
Full.Name `1_2019` `1_2020` `2_2019` `2_2020` `3_2019` `3_2020` `4_2019` `4_2020` `5_2019` `5_2020` `6_2019` `6_2020` `7_2019` `7_2020` `8_2019` `8_2020` `9_2019`
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 A 556 220 577 213 628 102 608 58 547 321 429 233 371 370 338 334 409
2 B 0 48 0 0 0 0 14 45 0 0 83 0 0 16 0 0 0
3 C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 D 0 0 0 0 0 110 0 63 60 125 23 0 46 32 0 47 18
5 E 73 163 68 113 122 0 70 71 135 124 62 101 28 21 98 11 109
6 F 96 135 57 32 149 173 211 145 61 46 190 308 32 159 210 168 0
# ... with 7 more variables: `9_2020` <int>, `10_2019` <int>, `10_2020` <int>, `11_2019` <int>, `11_2020` <int>, `12_2019` <int>, `12_2020` <int>
Now I want to create another dataframe (dataframe 3) and it should contain all names of the column "full name" and divide all columns values of dataframe 1 "x_20xx" through the matching value of dataframe 2. It's kind of tricky, because unfortunately, both dataframes got the exact same column names for "x_20xx". Maybe we need to change that first in order to calculate the values for the new dataframe.
This is an example of what I want: Obviously with all the "x_20xx" columns.
Full.Name `1_2019_freq`
<chr> <int>
1 A (value_1_2019 df1) / (value_1_2019 df2)
2 B (value_1_2019 df1) / (value_1_2019 df2)
3 C (value_1_2019 df1) / (value_1_2019 df2)
4 D (value_1_2019 df1) / (value_1_2019 df2)
5 E (value_1_2019 df1) / (value_1_2019 df2)
6 F (value_1_2019 df1) / (value_1_2019 df2)
It's hard to explain, but I hope you get my point. It's a simple calculation, but I am not able to group all "A" and "B" [...] and sum their columns so that there is only one row containing "A" with all the other months but summed values for each month. After that, we only need to create another new dataframe and divide the values through the matching values of df2.
Any help is appreciated.
Many thanks for considering my request.
My dput()
output:
Dataframe 1:
structure(list(authority_dic = c("pro", "pro", "pro", "pro",
"leader", "pro", "know", "pro", "pro", "pro"), Full.Name = c("Marc R. Benioff",
"Derek J. Leathers", "Robert J. McNally", "G. Michael Sievert",
"Paul J. Sarvadi", "Patrick W. Smith", "John J. Legere", "John J. Legere",
"Patrick K. Decker", "Michael J. Saylor"), Entity = c("SALESFORCE.COM INC",
"WERNER ENTERPRISES INC", "EQT CORP (Rapidan Energy)", "T-MOBILE US INC",
"INSPERITY INC", "AXON ENTERPRISE INC", "T-MOBILE US INC", "T-MOBILE US INC",
"XYLEM INC", "MICROSTRATEGY INC"), `4_2020` = c(23L, 16L, 5L,
7L, 2L, 4L, 9L, 4L, 2L, 2L), `3_2020` = c(22L, 16L, 16L, 7L,
1L, 4L, 8L, 2L, 6L, 0L), `7_2020` = c(6L, 7L, 1L, 16L, 1L, 1L,
3L, 1L, 3L, 0L), `12_2019` = c(3L, 12L, 2L, 2L, 15L, 1L, 4L,
3L, 3L, 0L), `8_2020` = c(15L, 4L, 3L, 8L, 0L, 0L, 4L, 2L, 2L,
0L), `5_2019` = c(8L, 11L, 0L, 6L, 9L, 14L, 7L, 6L, 9L, 0L),
`1_2019` = c(7L, 4L, 6L, 4L, 1L, 0L, 13L, 4L, 0L, 2L), `4_2019` = c(2L,
12L, 2L, 10L, 7L, 12L, 2L, 13L, 2L, 0L), `6_2019` = c(5L,
13L, 5L, 4L, 8L, 1L, 6L, 4L, 6L, 0L), `8_2019` = c(8L, 9L,
1L, 2L, 4L, 0L, 9L, 8L, 13L, 0L), `9_2020` = c(13L, 8L, 2L,
6L, 2L, 0L, 5L, 2L, 0L, 3L), `10_2019` = c(12L, 9L, 5L, 3L,
8L, 1L, 10L, 7L, 9L, 0L), `11_2020` = c(7L, 7L, 3L, 1L, 2L,
1L, 4L, 0L, 3L, 12L), `9_2019` = c(6L, 11L, 5L, 2L, 6L, 0L,
11L, 6L, 6L, 0L), `1_2020` = c(6L, 9L, 1L, 4L, 11L, 5L, 4L,
5L, 3L, 0L), `11_2019` = c(10L, 11L, 0L, 5L, 11L, 1L, 6L,
4L, 4L, 0L), `10_2020` = c(10L, 7L, 3L, 5L, 3L, 0L, 5L, 1L,
1L, 7L), `2_2020` = c(7L, 4L, 3L, 4L, 4L, 0L, 10L, 7L, 5L,
0L), `2_2019` = c(7L, 5L, 5L, 1L, 6L, 0L, 6L, 3L, 0L, 0L),
`3_2019` = c(4L, 9L, 1L, 6L, 6L, 3L, 6L, 10L, 4L, 0L), `12_2020` = c(3L,
10L, 3L, 5L, 0L, 2L, 2L, 1L, 3L, 10L), `7_2019` = c(7L, 9L,
4L, 2L, 6L, 0L, 3L, 8L, 1L, 0L), `6_2020` = c(3L, 3L, 1L,
3L, 1L, 2L, 2L, 2L, 8L, 0L), `5_2020` = c(2L, 7L, 0L, 7L,
4L, 2L, 3L, 4L, 1L, 0L)), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
authority_dic = c("know", "leader", "pro", "pro", "pro",
"pro", "pro", "pro", "pro", "pro"), Full.Name = c("John J. Legere",
"Paul J. Sarvadi", "Derek J. Leathers", "G. Michael Sievert",
"John J. Legere", "Marc R. Benioff", "Michael J. Saylor",
"Patrick K. Decker", "Patrick W. Smith", "Robert J. McNally"
), .rows = structure(list(7L, 5L, 2L, 4L, 8L, 1L, 10L, 9L,
6L, 3L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE))
Dataframe 2:
structure(list(Full.Name = c("A. Patrick Beharelle", "Aaron P. Graft",
"Aaron P. Jagdfeld", "Adam H. Schechter", "Adam P. Symson", "Adena T. Friedman",
"John J. Legere", "Albert Bourla, D.V.M., DVM, Ph.D.",
"Andres Ricardo Gluski Weilert", "Andrew Anagnost"), `1_2019` = c(556L,
0L, 0L, 0L, 73L, 96L, 0L, 0L, 40L, 222L), `1_2020` = c(220L,
48L, 0L, 0L, 163L, 135L, 0L, 209L, 0L, 110L), `2_2019` = c(577L,
0L, 0L, 0L, 68L, 57L, 0L, 0L, 98L, 215L), `2_2020` = c(213L,
0L, 0L, 0L, 113L, 32L, 0L, 66L, 0L, 133L), `3_2019` = c(628L,
0L, 0L, 0L, 122L, 149L, 0L, 0L, 8L, 278L), `3_2020` = c(102L,
0L, 0L, 110L, 0L, 173L, 0L, 243L, 0L, 120L), `4_2019` = c(608L,
14L, 0L, 0L, 70L, 211L, 0L, 0L, 18L, 200L), `4_2020` = c(58L,
45L, 0L, 63L, 71L, 145L, 0L, 315L, 0L, 111L), `5_2019` = c(547L,
0L, 0L, 60L, 135L, 61L, 0L, 0L, 68L, 101L), `5_2020` = c(321L,
0L, 0L, 125L, 124L, 46L, 0L, 100L, 0L, 166L), `6_2019` = c(429L,
83L, 0L, 23L, 62L, 190L, 0L, 0L, 150L, 242L), `6_2020` = c(233L,
0L, 0L, 0L, 101L, 308L, 112L, 128L, 0L, 201L), `7_2019` = c(371L,
0L, 0L, 46L, 28L, 32L, 0L, 111L, 0L, 333L), `7_2020` = c(370L,
16L, 0L, 32L, 21L, 159L, 0L, 199L, 43L, 15L), `8_2019` = c(338L,
0L, 0L, 0L, 98L, 210L, 0L, 56L, 0L, 212L), `8_2020` = c(334L,
0L, 0L, 47L, 11L, 168L, 0L, 207L, 0L, 40L), `9_2019` = c(409L,
0L, 0L, 18L, 109L, 0L, 0L, 73L, 0L, 387L), `9_2020` = c(406L,
91L, 5L, 0L, 40L, 164L, 23L, 183L, 65L, 28L), `10_2019` = c(330L,
35L, 0L, 0L, 63L, 119L, 0L, 192L, 34L, 88L), `10_2020` = c(331L,
0L, 0L, 0L, 25L, 185L, 71L, 451L, 43L, 29L), `11_2019` = c(448L,
33L, 0L, 0L, 131L, 108L, 0L, 41L, 35L, 196L), `11_2020` = c(559L,
0L, 0L, 44L, 83L, 86L, 49L, 803L, 0L, 132L), `12_2019` = c(300L,
0L, 4L, 0L, 72L, 167L, 0L, 155L, 68L, 93L), `12_2020` = c(122L,
0L, 0L, 0L, 0L, 54L, 0L, 790L, 0L, 1L)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
Full.Name = c("A. Patrick Beharelle", "Aaron P. Graft", "Aaron P. Jagdfeld",
"Adam H. Schechter", "Adam P. Symson", "Adena T. Friedman",
"Alan David Schnitzer", "Albert Bourla, D.V.M., DVM, Ph.D.",
"Andres Ricardo Gluski Weilert", "Andrew Anagnost"), .rows = structure(list(
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE))
Take "John J. Legere" as an example.