0

I have two tables. The columns i am interested in table 1 is "Year" and "CompanyName". Table 2 has 3 columns including: "Year" and "CompanyName".

How can I join these two tables together? The problem I have is that table 1 has many columns that have for example the year value as "Year" = "2004" and "CompanyName" = "Adidas". e.g.

# There are many other columns
   Year   CompanyName   Spent
1  2004   Adidas        50
2  2004   Nike          34
3  2004   Adidas        45
4  2005   Reebok        33
5  2006   Reebok        11
6  2006   Adidas        47
7  2007   Nike          33
8  2007   Reebok        92
9  2007   Nike          01
10 2007   Adidas        23

#I want to join this to it
   Year  CompanyName   Loss
1  2004  Nike          23
2  2004  Adidas        22
3  2005  Reebok        633
4  2006  Reebok        2
5  2006  Adidas        09
6  2007  Reebok        22
7  2007  Nike          34

I want to join the tables so when ever Year is 2004 and CompanyName is Adidas a column is added for Loss with the value 23

Thank You!

  • `full_join(df1, df2, by = c("Year", "CompanyName"))` (or base R, `merge(df1, df2, by = c("Year", "CompanyName"), all = TRUE)`). – r2evans Sep 03 '20 at 14:02

1 Answers1

0

You can do that by

library(dplyr)

df3 <- df1 %>%
  left_join(df2, by = c("Year", "CompanyName"))

Just make sure you don't have duplications in df2 when it comes to year & company name. You can do so through dplyr::distinct(df2, Year, CompanyName, .keep_all = T), however that might lead to dropping some relevant information. If you're not certain about it, it might make sense to aggregate by those two dimensions:

df2 %>%
  group_by(Year, CompanyName) %>%
  summarise(Loss = sum(Loss))
alex_jwb90
  • 1,663
  • 1
  • 11
  • 20
  • I get this warning, I assume because the names of the companies do not come in the same order. How would I fix it? Warning message: Column `CompanyName` joining factors with different levels, coercing to character vector – user123231322 Sep 03 '20 at 14:13
  • Yeah, because of the different order the factor vectors have a different order of their levels. Its been fixed for you by dplyr by changing the column type to character. So, nothing you need to do, unless you want to change the column back to factor, which of course you can – alex_jwb90 Sep 03 '20 at 14:49