I have two data frames: a more up-to-date inventory list (Product, Quantity) and an older inventory list.

I am trying to figure out the most efficient way to produce a list of all new products, meaning anything that is in the new list but not in the old one. This can be either a brand-new product or an increase in the "Quantity" of an existing Product.

I am also looking for the opposite: a list of all "lost" products, meaning products that were removed completely or whose quantity went down.

I tried doing some anti joins/inner joins with no luck. Anyone have any suggestions on an efficient way to do this in R?

Datasets:

n<-6
new <- data.frame(quantity=1:n, 
              product=rep(LETTERS[1:6], n/6)
)

n<-4
old <- data.frame(quantity=1:n, 
              product=rep(LETTERS[1:4], n/4)
)

Here, the result would be 5/E and 6/F as new products, and nothing in the lost products.

mrmiawmiaw
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Mar 29 '22 at 04:40
  • Maybe a function like [arsenal::comparedf](https://cran.r-project.org/web/packages/arsenal/vignettes/comparedf.html) can help. – MrFlick Mar 29 '22 at 04:57
  • A starting point might be using `setdiff`. Something like `setdiff(new$product, old$product)` would show you the items that are in the new list but not in the old list (note that the order of the arguments matters). – thehand0 Mar 29 '22 at 06:30

1 Answer

You really want set operations like setdiff, union, intersect, and setequal. Since you are looking for the "most efficient" way, you might consider the data.table versions: https://rdrr.io/cran/data.table/man/setops.html
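As a rough sketch of what the data.table route could look like (assuming the data.table package is installed; `new_dt`, `old_dt`, `added_rows` and `removed_rows` are names made up for this illustration), `fsetdiff` compares whole rows rather than a single column:

```r
library(data.table)

# The question's data, rebuilt as data.tables (illustrative names).
new_dt <- data.table(product = LETTERS[1:6], quantity = 1:6)
old_dt <- data.table(product = LETTERS[1:4], quantity = 1:4)

# Rows present in new_dt but not in old_dt. Because whole rows are
# compared, a changed quantity for an existing product also shows up here.
added_rows <- fsetdiff(new_dt, old_dt)

# Rows present in old_dt but not in new_dt.
removed_rows <- fsetdiff(old_dt, new_dt)

print(added_rows)
print(removed_rows)
```

Note that a row-wise difference only says that a product/quantity pair changed, not whether the quantity went up or down.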

A list of all the products that are in the new list but not in the old one can be obtained via setdiff:

> setdiff(new$product, old$product)
[1] "E" "F"

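To see which products are new and which are lost, take the difference in both directions and then use %in%:

sd <- union(setdiff(new$product, old$product), setdiff(old$product, new$product))
sd[sd %in% new$product]  # new: only in the new list
sd[sd %in% old$product]  # lost: only in the old list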
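
The set operations above only look at product names. For the quantity part of the question, one possible base R approach is a full outer join with `merge` followed by a subtraction. This is only a sketch: the `old` data below is a hypothetical variation of the question's data (B drops in quantity, G disappears), and `both`, `change`, `gained` and `lost` are names chosen for the example.

```r
# Hypothetical data: the question's `new`, plus an `old` list altered so that
# one quantity drops (B) and one product disappears entirely (G).
new <- data.frame(product = LETTERS[1:6], quantity = 1:6)
old <- data.frame(product = c("A", "B", "C", "D", "G"),
                  quantity = c(1, 5, 3, 4, 2))

# Full outer join on product; the suffixes mark which list a quantity came from.
both <- merge(new, old, by = "product", all = TRUE,
              suffixes = c("_new", "_old"))

# Treat a product that is missing from one list as quantity 0 in that list.
both$quantity_new[is.na(both$quantity_new)] <- 0
both$quantity_old[is.na(both$quantity_old)] <- 0

# Positive change = gained stock, negative change = lost stock.
both$change <- both$quantity_new - both$quantity_old

gained <- both[both$change > 0, c("product", "change")]
lost   <- both[both$change < 0, c("product", "change")]

print(gained)
print(lost)
```

With this data, `gained` should contain E and F (brand-new products) and `lost` should contain B (quantity went down) and G (removed completely). Products whose quantity is unchanged appear in neither.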
Bernhard