Now I have two large excel file with ~100000 rows with the same column name. I would like to compare them row by row to see if they are the same. Is there any tools that I can use? e.g. R or
-
Does this answer your question? [Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2](https://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in) – Vinícius Félix Sep 07 '21 at 23:47
-
It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Sep 08 '21 at 01:56
1 Answers
Copy the column in each spreadsheet to a new tab in that spreadsheet. Then with that single column tab active, export that data to a .txt or .csv file so you have 2 .txt or .csv files on disk and then use a comparison tool to compare the two files. On a Linux system you can use diff. On windows, fc. https://www.windowscentral.com/how-compare-two-files-command-prompt-windows-10
For example, I took 2 copies of spelling dictionary from Linux /usr/share/dict/words and made words1.txt and words2.txt. I made changes to words2.txt. Each file is over 100000 rows long. On Linux, the diff tool comparison looks like this:
user1@debian10 /media/sf_vshare/tmp10 > diff words1.txt words2.txt
50000a50001
> TESTING HERE
69998a70000
> TESTING HERE2
On Windows10, the comparison looks like this:
Another way is to upload the excel files to google sheets, and use tools there: https://www.ablebits.com/office-addins-blog/2019/04/30/google-sheets-compare-two-sheets-columns/

- 306
- 2
- 3