How to join combining table values without unique values added to the bottom in R code? Full_join is adding new values to the bottom

Question

I need a chart of accounts to stay in order when new accounts are added or dropped in future years. This is because in Accounting the accounts are sorted by type (for example Asset, Liability Equity) but it is not explicit in the dataset. This is an example of the code that is putting new "Accounts" from Year2 and Year3 at the bottom.

XYZCompany_Consolidated <- XYZCompany_Year1 %>%
  full_join(XYZCompany_Year2 by = "Account") %>%
  full_join(XYZCompany_Year3, by = "Account")

Example: This picture is just to give a simplified example. The highlight in orange is where the new accounts are going and to the right is the code i'm using, and the green is what I'm trying to achieve

pic of multiple tables

Please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Martin Gal, Jul 20 '21 at 21:32
Welcome to SO, AmyCorona! Please don't post images of data, most will not spend the time to transcribe your data into something usable. The gold-standard is to paste the output from `dput(x)` for each frame, where `x` is a sufficient sample to get the point across. Some good links for making this fully reproducible include MartinGal's link plus [mcve] and https://stackoverflow.com/tags/r/info. Thanks! — r2evans, Jul 20 '21 at 21:33

Martin Gal · Answer 1 · 2021-07-20T22:23:18.197

Perhaps I'm overthinking this problem but I find it hard to solve. Let's define some data first:

df_year1 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto", 
"JaneDoe"), Year_1 = c(100, 1000, 20, 300, 500)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), spec = structure(list(
    cols = list(Account = structure(list(), class = c("collector_character", 
    "collector")), Year_1 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

df_year2 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto", 
"Laptop", "JaneDoe"), Year_2 = c(80, 1200, 50, 300, 500, 0)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), spec = structure(list(
    cols = list(Account = structure(list(), class = c("collector_character", 
    "collector")), Year_2 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

df_year3 <- structure(list(Account = c("Cash", "Accounts", "Loan1", "Auto", 
"Rent", "JaneDoe"), Year_3 = c(80, 1200, 50, 300, 1000, 0)), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), spec = structure(list(
    cols = list(Account = structure(list(), class = c("collector_character", 
    "collector")), Year_3 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Those are similar to the data shown in the OP's picture, e.g. df_year1 looks like

# A tibble: 5 x 2
  Account  Year_1
  <chr>     <dbl>
1 Cash        100
2 Accounts   1000
3 Loan1        20
4 Auto        300
5 JaneDoe     500

Next we transform those data a little bit, namely

library(dplyr)
library(tidyr)

df_y1 <- df_year1 %>% 
  mutate(Year = 1, 
         no = row_number()) %>% 
  rename(value = Year_1)

which returns

# A tibble: 5 x 4
  Account  value  Year    no
  <chr>    <dbl> <dbl> <int>
1 Cash       100     1     1
2 Accounts  1000     1     2
3 Loan1       20     1     3
4 Auto       300     1     4
5 JaneDoe    500     1     5

The new column no stores the account's original position, column Year stores the chart's year. All three data.frames are processed like this, so we get df_y1, df_y2, df_y3.

Finally we bind them together

bind_rows(df_y1, df_y2, df_y3) %>% 
  mutate(num_years = max(Year)) %>% 
  group_by(Account) %>% 
  mutate(rank = sum((num_years - n() + 1) * no), .keep = "unused") %>% 
  pivot_wider(names_from = Year) %>% 
  arrange(rank) %>% 
  select(-rank) %>%
  ungroup()

and calculate a rank for each account. The accounts are ordered by this rank. As a result, we get

# A tibble: 7 x 4
  Account  Year_1 Year_2 Year_3
  <chr>     <dbl>  <dbl>  <dbl>
1 Cash        100     80     80
2 Accounts   1000   1200   1200
3 Loan1        20     50     50
4 Auto        300    300    300
5 Laptop       NA    500     NA
6 Rent         NA     NA   1000
7 JaneDoe     500      0      0

Note

I believe, there are better approaches, but at least this works for the example data.
I'm not sure about the calculated rank's stability. Take care.

How to join combining table values without unique values added to the bottom in R code? Full_join is adding new values to the bottom

1 Answers1