Comparing files off first x number of characters

Question

I have two text files that both have data that look like this:

Mon-000101,100.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000171,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,100.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,,

I want to compare the two files based on the ID at the start of each line (Mon-000101 is one example of such an ID) to see where they differ. I tried some diff commands that I found in another question, but they didn't work. I'm out of ideas, so I'm turning to anybody with more experience than I have.

Thanks.

HazMatt:Desktop m$ diff NGC2264_classI_h7_notes.csv /Users/m/Downloads/allfitclassesI.txt 
1c1
Mon-000399,100.25794,9.877631,12.732,12.579,0.94,I,-1.13,I,9.8,9.8,,"10,000dn vs 600dn brighter source at 6 to 12"" Mon-000402,100.27347,9.59Mon-146053,100.23425,9.571719,12.765,11.39,1.11,I,1.04,I,16.8,16.8,,"double 3"" confused with 411, appears brighter",,,,,,,,
\ No newline at end of file
---
Mon-146599                    Mon-146599   4.54      I   4.54      III
\ No newline at end of file

This was my attempt and its output. The thing is, I know the files differ by eleven lines, corresponding to eleven mismatched values. I don't want to do this by hand (who would?). Maybe I'm misreading the diff output, but I'd expect more than this.

score 1 · Answer 1 · answered Oct 30 '13 at 06:16

Have you tried:

diff <(grep Mon-00010 file_1) <(grep Mon-00010 file_2)

answered Oct 30 '13 at 06:16

Giuseppe Pes

`Mon-00010` was just a placeholder. I want to find the differences across all of the names (they are IDs). – Matt Oct 30 '13 at 06:17
Sorry, but then I don't understand what you are trying to achieve. The grep narrows each file down to only those lines that contain the key you're searching for, and diff then compares them. – Giuseppe Pes Oct 30 '13 at 06:21
I'm not trying to find a single key, I'm trying to find the differences in two files that contain a list of ID numbers. `Mon-000101` was just an example of one such ID. – Matt Oct 30 '13 at 06:22

score 0 · Answer 2 · answered Oct 30 '13 at 06:43

First sort both files, then run diff on the sorted copies:

sort file1 > file1_sorted
sort file2 > file2_sorted
diff file1_sorted file2_sorted

Sorting arranges both files by the leading ID field, so you don't get spurious mismatches caused only by a different line order.
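A slightly fuller sketch of the sort-then-diff idea follows. It also normalizes line endings first, which is only a guess on my part: the diff output in the question reports a single changed line (1c1) that appears to hold several records, and that pattern can show up when a file uses carriage returns alone as line endings. The sample files, their shortened records, and the `normalize` helper are hypothetical, and temporary files are used so the script does not depend on bash process substitution.

```shell
# Work in a scratch directory so the current directory is not touched.
tmp=$(mktemp -d)

# Hypothetical sample data: file1 uses CR-only line endings and a different
# record order, file2 uses ordinary LF line endings.
printf 'Mon-000174,30.8\rMon-000101,31.1\rMon-000171,33.7\r' > "$tmp/file1"
printf 'Mon-000101,31.1\nMon-000171,33.9\nMon-000174,30.8\n' > "$tmp/file2"

# Turn every carriage return into a newline, drop the empty lines that CRLF
# pairs would leave behind, then sort on the first comma-separated field.
normalize() {
    tr '\r' '\n' < "$1" | grep -v '^$' | sort -t',' -k1,1
}
normalize "$tmp/file1" > "$tmp/file1_sorted"
normalize "$tmp/file2" > "$tmp/file2_sorted"

# diff exits with status 1 when the files differ, so guard it with || true.
result=$(diff "$tmp/file1_sorted" "$tmp/file2_sorted" || true)
echo "$result"

rm -r "$tmp"
```

With this sample data, diff reports only the Mon-000171 record, whose value differs between the two files. If your files already use plain newlines, the `tr` step does nothing and can be dropped.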

answered Oct 30 '13 at 06:43

jkshah

score 0 · Answer 3 · answered Oct 30 '13 at 21:45

I am not sure what you are searching for, but I'll try to help. If this misses the point, you could give some examples of input files and the desired output.

My input files are:

prompt> cat in1.txt 
Mon-000101,100.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000171,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,100.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,

and

prompt> cat in2.txt 
Mon-000101,111.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000172,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,122.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,,

If you are just interested in the "ID" (whatever that means), you have to separate it from the rest of the line. I assume the ID is the tag before the first comma, so you can cut away everything except the ID and compare:

prompt> diff <(cut -d',' -f1 in1.txt) <(cut -d',' -f1 in2.txt)
2c2
< Mon-000171
---
> Mon-000172

If the ID is more complicated, you can extract it with grep and a regular expression instead.
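As a rough sketch of the grep variant: the pattern `^Mon-[0-9]+` is my assumption about what an ID looks like (the literal prefix `Mon-` followed by digits at the start of the line), and the sample records are shortened versions of the input files above. Adjust the pattern to whatever your IDs really are.

```shell
# Work in a scratch directory so the current directory is not touched.
tmp=$(mktemp -d)

# Shortened, hypothetical copies of in1.txt and in2.txt.
printf '%s\n' 'Mon-000101,100.27242,9.608597' \
              'Mon-000171,100.2923,9.52286' \
              'Mon-000174,100.27621,9.563802' > "$tmp/in1.txt"
printf '%s\n' 'Mon-000101,111.27242,9.608597' \
              'Mon-000172,100.2923,9.52286' \
              'Mon-000174,122.27621,9.563802' > "$tmp/in2.txt"

# -o prints only the matching part of each line, -E enables extended regex.
grep -oE '^Mon-[0-9]+' "$tmp/in1.txt" > "$tmp/ids1"
grep -oE '^Mon-[0-9]+' "$tmp/in2.txt" > "$tmp/ids2"

# diff exits with status 1 when the files differ, so guard it with || true.
result=$(diff "$tmp/ids1" "$tmp/ids2" || true)
echo "$result"

rm -r "$tmp"
```

For this data the result is the same 2c2 hunk that the cut version prints, since both approaches end up comparing the same ID lists.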

Additionally, diff -y gives you a side-by-side view that marks which lines differ. You can use it to compare the complete files, or combine it with the cut approach from before:

prompt> diff -y <(cut -d',' -f1 in1.txt) <(cut -d',' -f1 in2.txt)
Mon-000101                            Mon-000101
Mon-000171                          | Mon-000172
Mon-000174                            Mon-000174
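If what you actually want is to match lines by ID and then see which records have different values, something along these lines might be closer. This is only a sketch under my own assumptions: the ID is the first comma-separated field, each ID appears at most once per file, and the shortened sample records and the message wording are made up for illustration.

```shell
# Work in a scratch directory so the current directory is not touched.
tmp=$(mktemp -d)

# Shortened, hypothetical records keyed by the ID in the first field.
printf '%s\n' 'Mon-000101,100.27242,9.608597' \
              'Mon-000171,100.2923,9.52286' \
              'Mon-000174,100.27621,9.563802' > "$tmp/in1.txt"
printf '%s\n' 'Mon-000101,111.27242,9.608597' \
              'Mon-000172,100.2923,9.52286' \
              'Mon-000174,100.27621,9.563802' > "$tmp/in2.txt"

result=$(awk -F',' '
    # First file: remember every full line under its ID.
    NR == FNR { row[$1] = $0; next }
    # Second file: the ID was never seen in the first file.
    !($1 in row) { print "only in second: " $1; next }
    # Same ID in both files, but the rest of the line is not identical.
    row[$1] != $0 { print "differs: " $1 }
' "$tmp/in1.txt" "$tmp/in2.txt")
echo "$result"

rm -r "$tmp"
```

Because the lookup is by ID, the two files do not need to be sorted or in the same order. Note that this sketch does not report IDs that exist only in the first file; running it a second time with the file arguments swapped would cover that case.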
