0

Now I have two large excel file with ~100000 rows with the same column name. I would like to compare them row by row to see if they are the same. Is there any tools that I can use? e.g. R or

Eric
  • 11
  • 4
  • Does this answer your question? [Compare two data.frames to find the rows in data.frame 1 that are not present in data.frame 2](https://stackoverflow.com/questions/3171426/compare-two-data-frames-to-find-the-rows-in-data-frame-1-that-are-not-present-in) – Vinícius Félix Sep 07 '21 at 23:47
  • It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Sep 08 '21 at 01:56

1 Answers1

1

Copy the column in each spreadsheet to a new tab in that spreadsheet. Then with that single column tab active, export that data to a .txt or .csv file so you have 2 .txt or .csv files on disk and then use a comparison tool to compare the two files. On a Linux system you can use diff. On windows, fc. https://www.windowscentral.com/how-compare-two-files-command-prompt-windows-10

For example, I took 2 copies of spelling dictionary from Linux /usr/share/dict/words and made words1.txt and words2.txt. I made changes to words2.txt. Each file is over 100000 rows long. On Linux, the diff tool comparison looks like this:

user1@debian10 /media/sf_vshare/tmp10 > diff words1.txt words2.txt 
50000a50001
> TESTING HERE
69998a70000
> TESTING HERE2

On Windows10, the comparison looks like this:

enter image description here

Another way is to upload the excel files to google sheets, and use tools there: https://www.ablebits.com/office-addins-blog/2019/04/30/google-sheets-compare-two-sheets-columns/

Philip Wright
  • 306
  • 2
  • 3