2

I have plain-text college transcripts. They are meant to be viewed in a fixed-width font and have ASCII columns and so forth. It's difficult for me to notice visually whether a change to the program that generates them has introduced defects; my eyes glaze over. However, each time the program generates even the same document, a timestamp within it changes. Is there a way to force diff (or a similar tool) to ignore a particular line, or even a particular range of characters within that line?

If I use the -I "regexp" switch, this will ignore differences in the entire line, even though I only wish to ignore changes to the date.

Is there a better tool for this? Can diff be made to do this with some bash fu?

John O

2 Answers

1

Check out this webpage; maybe it will do what you want: http://ifdeflinux.blogspot.com/2011/08/diff-two-files-ignoring-certain-fields.html

Paul

pcantalupo
1

You can use process substitution for this, passing the output of some other commands in place of the files themselves. Example:

$ diff file1 file2
2c2
< example: 123
---
> example: 456

$ diff <(sed -r 's/(.*: )[0-9]+/\1/' file1) <(sed -r 's/(.*: )[0-9]+/\1/' file2)
[files are the same]

In the example, the sed command just removes the digits that follow a colon and a space, so both files look identical to diff. You will need to come up with a command appropriate for your input data that strips out your timestamps.
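For the transcript case, a variation on the same idea might look like the sketch below. It assumes the timestamp has the form YYYY-MM-DD HH:MM:SS, which is only a guess at your format; the mask_timestamps function and the two sample files are made up for illustration. The timestamp is replaced with a placeholder of the same width rather than deleted, so the column positions of everything else on the line stay intact.

```shell
#!/usr/bin/env bash
# Hypothetical helper: overwrite anything shaped like "YYYY-MM-DD HH:MM:SS"
# with a placeholder of the same width (the timestamp format is an assumption).
mask_timestamps() {
    sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}/XXXX-XX-XX XX:XX:XX/g' "$1"
}

# Made-up sample files standing in for two runs of the transcript generator.
tmp=$(mktemp -d)
printf 'Generated: 2014-03-01 10:15:00\nMATH 101    A\n' > "$tmp/old.txt"
printf 'Generated: 2014-03-02 11:30:45\nMATH 101    B\n' > "$tmp/new.txt"

# diff exits with status 1 when the inputs differ; "|| true" keeps that
# from aborting a script that runs under "set -e".
changes=$(diff <(mask_timestamps "$tmp/old.txt") <(mask_timestamps "$tmp/new.txt") || true)
printf '%s\n' "$changes"
```

With these sample files, only the grade line is reported; the line carrying the timestamp no longer shows up as a difference. Process substitution needs bash (or another shell that supports it), not plain sh.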

Josh Jolly