An easy way to diff log files, ignoring the time stamps?

Question

I need to diff two log files but ignore the time stamp part of each line (the first 12 characters to be exact). Is there a good tool, or a clever awk command, that could help me out?

score 56 · Accepted Answer · edited May 23 '17 at 12:02

Depending on the shell you are using, you can turn the approach @Blair suggested into a one-liner with process substitution (available in bash, zsh, and ksh93, but not in plain POSIX sh):

diff <(cut -b13- file1) <(cut -b13- file2)

(+1 to @Blair for the original suggestion :-)

edited May 23 '17 at 12:02

Community

answered Sep 04 '08 at 15:44

toolkit

Any suggestions if one wants to keep the timestamps in the output but ignore differences in the timestamps themselves? http://stackoverflow.com/questions/14326476/how-to-diff-parts-of-lines#comment19909398_14326476 – Noel Yap Jan 14 '13 at 21:12
When I use this to diff the files in two directories, the names of the files are omitted in the output (if I just do `diff dir1 dir2` the names are included). Why? Can this be fixed? – d-b Jun 12 '23 at 13:22
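On the directory question above: with process substitution, diff is handed two anonymous pipes rather than two directories, so it has no per-file names to print. One possible workaround, sketched here under the assumption that both directories contain files with matching names, is to loop over the files and print the header yourself. The function name `difflogdirs` is made up for this example, and the sketch needs a shell with process substitution such as bash.

```shell
#!/usr/bin/env bash
# difflogdirs is a hypothetical helper name, not part of diff itself.
# Usage: difflogdirs DIR1 DIR2 [START_BYTE]   (START_BYTE defaults to 13)
difflogdirs() {
    local dir1=$1 dir2=$2 start=${3:-13} f name out
    for f in "$dir1"/*; do
        name=${f##*/}
        # Only compare regular files that exist on both sides.
        [[ -f $f && -f $dir2/$name ]] || continue
        if ! out=$(diff <(cut -b"$start"- "$f") <(cut -b"$start"- "$dir2/$name")); then
            # Print a header similar to the one plain `diff dir1 dir2` shows.
            printf 'diff %s %s\n%s\n' "$f" "$dir2/$name" "$out"
        fi
    done
}
```

Calling `difflogdirs dir1 dir2` prints a `diff dir1/name dir2/name` line before each file pair that differs once the first 12 bytes are ignored. Files present in only one directory are silently skipped in this sketch.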

Blair Conrad · Answer 2 · 2008-09-06T11:38:28.070

26

@EbGreen said:

I would just take the log files and strip the timestamps off the start of each line then save the file out to different files. Then diff those files.

That's probably the best bet, unless your diffing tool has special powers. For example, you could run:

cut -b13- file1 > trimmed_file1
cut -b13- file2 > trimmed_file2
diff trimmed_file1 trimmed_file2

See @toolkit's response for an optimization that makes this a one-liner and obviates the need for extra files, provided your shell supports process substitution. Bash 3.2.39, at least, seems to.
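If your shell does not support process substitution, the temporary-file approach can be wrapped in a small function that cleans up after itself. This is only a sketch: the name `difflog` and its argument order are invented for the example, and it assumes `mktemp -d` is available on your system.

```shell
#!/bin/sh
# difflog is a hypothetical helper name.
# Usage: difflog START_BYTE FILE1 FILE2
# START_BYTE=13 skips a 12-byte timestamp, like `cut -b13-` above.
difflog() {
    difflog_tmp=$(mktemp -d) || return 2
    # Write the trimmed copies into a private temporary directory.
    cut -b"$1"- "$2" > "$difflog_tmp/left"
    cut -b"$1"- "$3" > "$difflog_tmp/right"
    diff "$difflog_tmp/left" "$difflog_tmp/right"
    difflog_rc=$?
    # Remove the trimmed copies, then report diff's own exit status.
    rm -rf "$difflog_tmp"
    return "$difflog_rc"
}
```

For the original question this would be called as `difflog 13 file1 file2`. As with diff itself, the exit status is 0 when the trimmed files match and 1 when they differ.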

edited Sep 06 '08 at 11:38

answered Sep 04 '08 at 15:38

Blair Conrad

If someone's timestamps also have a date in ISO format, then use -b25- – Brian B Oct 18 '13 at 15:36
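When the timestamp width varies, or you would rather not count bytes, stripping by pattern may be easier than `cut`. The sketch below is an alternative to the answers above, not something they propose. The function name `strip_ts` is made up, and the pattern assumes each line starts with `YYYY-MM-DD`, a space or `T`, `HH:MM:SS`, optional fractional seconds, and one space. Adjust it to your own log format.

```shell
# strip_ts is a hypothetical helper: it reads log lines on stdin and removes
# a leading ISO-style date and time (an assumed format, see above).
strip_ts() {
    sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}[ T][0-9]{2}:[0-9]{2}:[0-9]{2}([.,][0-9]+)? //'
}
```

In a shell with process substitution, this could be used as `diff <(strip_ts < file1) <(strip_ts < file2)`. Lines that do not start with a matching timestamp pass through unchanged.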

oHo · Answer 3 · 2013-11-25T09:45:20.060

Answers using cut are fine, but sometimes it is desirable to keep the timestamps in the diff output. As the OP's question is about ignoring the time stamps (not removing them), I share my tricky command line here:

diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)

sed puts each timestamp on a line of its own (# before and \n after) within a process substitution
diff -I '^#' ignores changes in which every changed line matches the pattern, i.e. changes made only of timestamp lines (lines beginning with #)

example

Two log files having same content but different timestamps:

$> for ((i=1;i<11;i++)) do echo "09:0${i::1}:00.000 data $i"; done > 1.log
$> for ((i=1;i<11;i++)) do echo "11:00:0${i::1}.000 data $i"; done > 2.log

Basic diff command line says all lines are different:

$> diff 1.log 2.log
1,10c1,10
< 09:01:00.000 data 1
< 09:02:00.000 data 2
< 09:03:00.000 data 3
< 09:04:00.000 data 4
< 09:05:00.000 data 5
< 09:06:00.000 data 6
< 09:07:00.000 data 7
< 09:08:00.000 data 8
< 09:09:00.000 data 9
< 09:01:00.000 data 10
---
> 11:00:01.000 data 1
> 11:00:02.000 data 2
> 11:00:03.000 data 3
> 11:00:04.000 data 4
> 11:00:05.000 data 5
> 11:00:06.000 data 6
> 11:00:07.000 data 7
> 11:00:08.000 data 8
> 11:00:09.000 data 9
> 11:00:01.000 data 10

Our tricky diff -I '^#' does not display any difference (timestamps ignored):

$> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
$>

Change 2.log (replace data with foo on the 6th line) and check again:

$> sed '6s/data/foo/' -i 2.log
$> diff -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
11,13c11,13
< #09:06:00.000
<  data 6
< #09:07:00.000
---
> #11:00:06.000
>  foo 6
> #11:00:07.000

=> timestamps are kept in the diff output!

You can also get a side-by-side view with the -y (or --side-by-side) option:

$> diff -y -I '^#' <(sed -r 's/^((.){12})/#\1\n/' 1.log) <(sed -r 's/^((.){12})/#\1\n/' 2.log)
#09:01:00.000                   #11:00:01.000
 data 1                          data 1
#09:02:00.000                   #11:00:02.000
 data 2                          data 2
#09:03:00.000                   #11:00:03.000
 data 3                          data 3
#09:04:00.000                   #11:00:04.000
 data 4                          data 4
#09:05:00.000                   #11:00:05.000
 data 5                          data 5
#09:06:00.000                 | #11:00:06.000
 data 6                       |  foo 6
#09:07:00.000                 | #11:00:07.000
 data 7                          data 7
#09:08:00.000                   #11:00:08.000
 data 8                          data 8
#09:09:00.000                   #11:00:09.000
 data 9                          data 9
#09:01:00.000                   #11:00:01.000
 data 10                         data 10

old `sed`

If your sed implementation does not support the -r option (some accept -E instead), you may have to write out the twelve dots, as in <(sed 's/^\(............\)/#\1\n/' 1.log), or use another pattern of your choice ;)
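Some sed implementations (BSD/macOS sed, as far as I know) also do not treat \n in the replacement as a newline, in which case the twelve-dots variant still will not split the line. A possible workaround, swapped in here and not part of the original answer, is to do the split with awk. The name `split_ts` is made up for this sketch.

```shell
# split_ts is a hypothetical helper: for each input line it prints "#" plus
# the first WIDTH characters, then the rest of the line on a line of its own.
# Usage: split_ts [WIDTH] < logfile   (WIDTH defaults to 12)
split_ts() {
    awk -v width="${1:-12}" '{
        print "#" substr($0, 1, width)   # timestamp line, matched by diff -I
        print substr($0, width + 1)      # remaining payload
    }'
}
```

It would then replace the sed calls, for example `diff -I '^#' <(split_ts < 1.log) <(split_ts < 2.log)` in a shell with process substitution.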

Thanks for your ingenious solution, this saved me hours of trying to do this in awk (and I would probably need to switch to perl to keep my sanity). — ack, May 05 '15 at 11:18
Yes! This is a great solution! This question comes up with some regularity, meld is one of the few tools that can do it, and it's *great* to have a command-line solution using only standard tools! — Matt Hellige, Sep 17 '19 at 23:33

score 16 · Answer 4 · edited Oct 13 '16 at 06:43

For a graphical option, Meld can do this using its text filters feature.

It allows for ignoring lines based on one or more Python regexes. The differences still appear, but lines that don't have any other differences won't be highlighted.

edited Oct 13 '16 at 06:43

bluenote10

answered Jun 02 '15 at 19:35

Dave Andersen

score 3 · Answer 5 · edited May 23 '17 at 11:46

Use Kdiff3 and, under Configure > Diff, set the "Line-Matching Preprocessor command" to something like:

sed "s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9]//"

This filters the timestamps out before the comparison's line-alignment algorithm runs.

Kdiff3 also lets you manually align specific lines.
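Before pasting a preprocessor command into the settings, it may help to run it on a few sample lines in a terminal. The sketch below wraps the sed command from this answer in a function (the names `ts_filter` and `ts_filter_ms` are made up) and adds a variant that also drops a trailing fraction. The extended pattern is my own assumption about what such logs look like, not something from the answer.

```shell
# ts_filter is a hypothetical wrapper around the exact sed command above.
ts_filter() {
    sed "s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9]//"
}

# ts_filter_ms is an assumed variant that also removes digits and dots
# directly following the seconds, such as ".000".
ts_filter_ms() {
    sed "s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9][.0-9]*//"
}

# Try both on a line in the format used in the question.
printf '%s\n' '09:06:00.000 data 6' | ts_filter
printf '%s\n' '09:06:00.000 data 6' | ts_filter_ms
```

With the original pattern, the milliseconds part of a line like the one above is left in place, so logs whose milliseconds differ would still need the extended pattern or something similar. Both commands remove only the first match on each line.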

edited May 23 '17 at 11:46

Community

answered Oct 19 '15 at 09:03

Pedro Reis

The files are still marked as different on every line but it allows me to search for real differences using the command **Go to Next / Previous Delta**. – Melebius Apr 24 '17 at 08:35
And to do it straight from the command line: `kdiff3 --cs LineMatchingPreProcessorCmd="sed \"s/[ 012][0-9]:[0-5][0-9]:[0-5][0-9]//\"" "/path/to/file 1.txt" "/path/to/file 2.txt"` – Colin Jun 04 '19 at 02:00

score 1 · Answer 6 · answered Aug 11 '22 at 11:19

I want to propose a solution for Visual Studio Code:

1. Install this extension - https://marketplace.visualstudio.com/items?itemName=ryu1kn.partial-diff
2. Configure it like this - https://github.com/ryu1kn/vscode-partial-diff/issues/49#issuecomment-608299085
3. Run the extension command "Toggle Pre-Comparison Text Normalization Rules" and enable the rule added in step 2
4. Use the extension (here is an explanation of its UI quirk - https://github.com/ryu1kn/vscode-partial-diff/issues/11)

answered Aug 11 '22 at 11:19

vlad2135

Nice extension! Unfortunately, the displayed diff shows the normalized text (the result of the replacements performed by the normalization rules), rather than showing the original text (and merely ignoring parts of it for purposes of generating the difference highlighting). In contrast, Meld's text filters only apply for the sake of generating the diff. – sls Aug 24 '23 at 12:14
