I have two large CSV dataset files:

  1. 1st file -> shop.csv has the fields item_number, vendor
  2. 2nd file -> item.csv has the fields item_number, price

Each file is 8 GB in size.

Now I need to find the relationship between the vendor and the price for a given item_number. Is there any tool that helps with dataset files of this size?

DSi
  • Maybe you can import the CSVs into a database (e.g. SQLite) and do the operations there? – Andrej Kesely Oct 02 '22 at 18:57
  • A pretty simple example on how to load a CSV in SQLite is given in this answer: https://stackoverflow.com/a/61364106/724039 – Luuk Oct 02 '22 at 19:25
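
The database route suggested in these comments could look roughly like the sketch below, which uses Python's built-in sqlite3 and csv modules. It assumes both files start with a header row naming the columns from the question; the function names, table names, and index names are made up for illustration. Rows are streamed into SQLite, so the 8 GB files are never held in memory, and all values are stored as text for simplicity.

```python
import csv
import sqlite3


def load_csv(conn, table, path, columns):
    """Stream a headered CSV file into a table without loading it into memory."""
    col_defs = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Generator: executemany pulls rows one at a time from the file.
        rows = ([row[c] for c in columns] for row in reader)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


def build_db(db_path, shop_csv, item_csv):
    """Import both files and index the join column (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    load_csv(conn, "shop", shop_csv, ["item_number", "vendor"])
    load_csv(conn, "item", item_csv, ["item_number", "price"])
    conn.execute("CREATE INDEX IF NOT EXISTS shop_item ON shop(item_number)")
    conn.execute("CREATE INDEX IF NOT EXISTS item_item ON item(item_number)")
    conn.commit()
    return conn


def vendor_prices(conn, item_number):
    """Return (vendor, price) pairs for one item_number via a join."""
    query = (
        "SELECT s.vendor, i.price FROM shop AS s "
        "JOIN item AS i ON i.item_number = s.item_number "
        "WHERE s.item_number = ?"
    )
    return conn.execute(query, (item_number,)).fetchall()
```

Something like `conn = build_db("items.db", "shop.csv", "item.csv")` followed by `vendor_prices(conn, "12345")` would then answer repeated lookups from the index; the import itself is a one-time cost.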

1 Answer


Using an editor or a database that requires you to load the files before they can be operated on will be a very slow process, if it works at all.

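Your best approach is to use a line-oriented tool such as grep, which reads the file one line at a time. First find the item_number in one file and then search for it in the other file.

E.g. grep -F "Olly's Grocer" shop.csv prints the rows for that vendor, including their item_number values; you can then grep for one of those item_number values in item.csv to get the price. (Inside double quotes the apostrophe and the space need no backslash, and -F makes grep treat the pattern as a fixed string rather than a regular expression.)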
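
For comparison, the same scan-both-files idea can be sketched in Python. This is only an illustration: it assumes both files have a header row with the column names from the question, and the function names are hypothetical. Unlike grep, it compares the item_number column exactly rather than matching the text anywhere on the line.

```python
import csv


def matching_rows(path, item_number, key="item_number"):
    """Yield rows whose key column equals item_number, reading line by line."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row[key] == item_number:
                yield row


def lookup(shop_csv, item_csv, item_number):
    """Scan each file once and collect the vendors and prices for one item."""
    vendors = [r["vendor"] for r in matching_rows(shop_csv, item_number)]
    prices = [r["price"] for r in matching_rows(item_csv, item_number)]
    return vendors, prices
```

Each call reads both files from start to end, so this suits a one-off lookup better than repeated queries.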

Phillip Ngan
  • It is very unclear from this answer why loading two CSV files into a database would be "a very slow process, if it works at all". Clearly there's not enough knowledge here about, e.g., SQLite or any other DBMS that can import CSV files. If this is a one-time search, then maybe `grep` or `awk` can be an option. – Luuk Oct 02 '22 at 19:20