
I have multiple data frames, each with different column names. I would like to take a random sample of values from each column and match them against every column in the other data frames in my list. The goal is to identify which columns are linked, so that the data frames are easier to merge afterwards.

Does anyone know a way to do this in R?

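```r
# stringsAsFactors = FALSE keeps the ID columns as character in R < 4.0
sales <- data.frame(r1 = c(10, 10.5, 30.1), r2 = c("ID1","ID2","ID3"), stringsAsFactors = FALSE)
purchases <- data.frame(cost = c(29.9, 11.5, 33.1), ID = c("ID1","ID2","ID3"), product_id = c("X1", "X2", "X3"), stringsAsFactors = FALSE)
product <- data.frame(admin_ID = c("X1", "X2", "X3"), name = c("ID1","ID2","ID3"), stringsAsFactors = FALSE)
```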

From the data, you can see that sales:r2 = purchases:ID = product:name, and purchases:product_id = product:admin_ID.

The match should only be performed on character variables.
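A minimal base-R sketch of the sampling approach described above, offered as an illustration rather than a definitive solution. Assumptions: the data frames are passed as a named list; two columns count as linked when at least `min_share` (here a hypothetical 0.5) of the sampled values from one column appear in the other; and `find_linked_columns`, `n_sample` and `min_share` are illustrative names, not an existing package API. Only character columns are compared, per the constraint above.

```r
# Example data from the question (stringsAsFactors = FALSE keeps IDs as character)
sales <- data.frame(r1 = c(10, 10.5, 30.1), r2 = c("ID1", "ID2", "ID3"),
                    stringsAsFactors = FALSE)
purchases <- data.frame(cost = c(29.9, 11.5, 33.1), ID = c("ID1", "ID2", "ID3"),
                        product_id = c("X1", "X2", "X3"), stringsAsFactors = FALSE)
product <- data.frame(admin_ID = c("X1", "X2", "X3"), name = c("ID1", "ID2", "ID3"),
                      stringsAsFactors = FALSE)

# Hypothetical helper (not from any package): for every character column, sample
# up to n_sample unique non-NA values and compute the share of them found in each
# character column of the *other* data frames. Pairs with share >= min_share
# are reported as linked.
find_linked_columns <- function(dfs, n_sample = 100, min_share = 0.5) {
  cols <- list()            # character columns, named "table:column"
  tables <- character(0)    # table each collected column came from
  for (tbl in names(dfs)) {
    df <- dfs[[tbl]]
    is_chr <- vapply(df, is.character, logical(1))
    for (col in names(df)[is_chr]) {
      cols[[paste(tbl, col, sep = ":")]] <- df[[col]]
      tables <- c(tables, tbl)
    }
  }
  out <- list()
  for (i in seq_along(cols)) {
    vals <- unique(cols[[i]][!is.na(cols[[i]])])
    if (length(vals) == 0) next
    smp <- if (length(vals) > n_sample) sample(vals, n_sample) else vals
    for (j in seq_along(cols)) {
      if (tables[i] == tables[j]) next   # compare across tables only
      share <- mean(smp %in% cols[[j]])
      if (share >= min_share) {
        out[[length(out) + 1]] <- data.frame(from = names(cols)[i],
                                             to = names(cols)[j], share = share,
                                             stringsAsFactors = FALSE)
      }
    }
  }
  do.call(rbind, out)  # NULL when no links are found
}

set.seed(1)  # sampling only kicks in for columns with > n_sample unique values
links <- find_linked_columns(list(sales = sales, purchases = purchases,
                                  product = product))
print(links)
```

For the example data this should report sales:r2, purchases:ID and product:name as linked to one another, and purchases:product_id with product:admin_ID, each with a share of 1. Sampling only matters for large columns; for small tables, comparing all unique values is cheap and deterministic.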

Javier
  • Can you please provide an example of the output you're looking for? I can't tell what exactly you're looking to get. – Cyrus Mohammadian Dec 18 '18 at 07:54
  • How would you match [floating-point values](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal)? Should the match be between character variables only? – Rui Barradas Dec 18 '18 at 07:54
  • @RuiBarradas, it will only be a match between character variables! – Javier Dec 18 '18 at 07:57
  • @CyrusMohammadian hmm, I'm not too sure what the output would look like, but I'd like to easily see the relationships between columns that are linked to at least one column in a different table. – Javier Dec 18 '18 at 07:57
  • `anyDuplicated(sales$r2)` will show if `sales$r2` can be used as a primary key. – Cyrus Mohammadian Dec 18 '18 at 08:30
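Following up on the `anyDuplicated()` suggestion, here is a hedged sketch, assuming the links stated in the question are correct (they are hard-coded below rather than detected). It checks whether the referenced columns are unique, and therefore usable as join keys, and then merges the three tables with base `merge()`. `is_key` is a hypothetical helper name.

```r
# Example data from the question
sales <- data.frame(r1 = c(10, 10.5, 30.1), r2 = c("ID1", "ID2", "ID3"),
                    stringsAsFactors = FALSE)
purchases <- data.frame(cost = c(29.9, 11.5, 33.1), ID = c("ID1", "ID2", "ID3"),
                        product_id = c("X1", "X2", "X3"), stringsAsFactors = FALSE)
product <- data.frame(admin_ID = c("X1", "X2", "X3"), name = c("ID1", "ID2", "ID3"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: anyDuplicated() returns 0 when no value repeats, so a
# column with no duplicates and no NAs can act as a key for a join.
is_key <- function(x) anyDuplicated(x) == 0 && !anyNA(x)

key_check <- c(sales_r2 = is_key(sales$r2),
               product_admin_ID = is_key(product$admin_ID))
print(key_check)

# Merge along the links stated in the question; by.x / by.y handle the
# differing column names on each side.
merged <- merge(purchases, product, by.x = "product_id", by.y = "admin_ID")
merged <- merge(merged, sales, by.x = "ID", by.y = "r2")
print(merged)
```

If `is_key()` is FALSE for a column, that column is more likely a foreign key, and merging on it can duplicate rows on the other side.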
