Bash delete everything before the first blank on every line

Question

I have a file of logs in which every line starts with a timestamp, followed by the log level and then the message. I want a script that gets rid of the timestamp.

That is, I want a script that for every line of a file would turn:

21:22:34.571 DEBUG - some message

into

DEBUG - some message

I haven't used bash much so any advice would be appreciated.

When do dates appear in the log file — never, or only before the first message posted after midnight, or every hour, or ...? It matters because if the date lines are not preceded by a time stamp (and they probably shouldn't be), then you need to leave those alone. Mechanisms using `cut` or things like `sed 's/^[^ ]* //'` become less appropriate — you need to be more stringent in what you're matching. I'll observe that leaving the date out is probably not a good idea, but that's your problem, not mine. — Jonathan Leffler, Sep 28 '16 at 04:30
I'm trying to compare two sets of logs to try to figure out what's different between them, pretty hard to do that when it says every line is different due to the timestamp. — annedroiid, Sep 28 '16 at 04:33
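For the comparison use case described in the comment above, one possible approach is to strip the first field from both files on the fly and hand the results to `diff`. This is only a sketch: the function name `diff_without_first_field` and the file names in the usage comment are made up, it relies on bash process substitution, and it assumes every line starts with a timestamp (see the caveat about date lines in the first comment).

```shell
#!/bin/bash
# Hypothetical helper: compare two log files while ignoring everything
# up to and including the first blank on each line.
diff_without_first_field() {
    diff <(sed 's/^[^ ]* //' "$1") <(sed 's/^[^ ]* //' "$2")
}

# Usage (file names are placeholders):
#   diff_without_first_field old.log new.log
```

The original files are left untouched; only the streams fed to `diff` have the first field removed.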

score 2 · Answer 1 · answered Sep 28 '16 at 04:19


You can try either sed or cut depending on the input data:

sed -e 's/^[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}.[0-9]\{3\}//' <data_file_name>


cut -c 13- <data_file_name>

GMichael


Comment (wry amusement rather than anything more serious): Using `<data_file_name>` in a shell script is not a good idea; it looks like bungled I/O redirection. Use a simple name: `data_file_name` or `log.file` or something appropriate, commenting "Assuming the data is in `log.file`" or something similar. Using `"$@"` can also be good; it indicates 'process all the file name arguments to the script, or standard input if there are none'. – Jonathan Leffler Sep 28 '16 at 04:24
Also, it might be simpler to use `sed 's/^[^ ]* //'` — delete all the non-blanks up to and including the first blank. Your script doesn't actually remove the blank, either. – Jonathan Leffler Sep 28 '16 at 04:25
@JonathanLeffler You're right in both cases. The question was about an idea. I gave two ideas. Nothing more – GMichael Sep 28 '16 at 04:33
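Putting the points from these comments together, the two commands could be tightened roughly as follows. This is a sketch, assuming every line starts with a timestamp of exactly the form `HH:MM:SS.mmm` followed by a single blank. The function names are invented for illustration, and each function processes the files named as arguments, or standard input if there are none.

```shell
#!/bin/bash
# Hypothetical helper names; "$@" passes through any file name arguments.

strip_ts_sed() {
    # \. matches a literal period (a bare . matches any character), and the
    # trailing blank is part of the match so it is removed as well.
    sed -e 's/^[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{3\} //' "$@"
}

strip_ts_cut_chars() {
    # The timestamp is 12 characters and the blank is the 13th,
    # so the remaining text starts at character 14.
    cut -c 14- "$@"
}

strip_ts_cut_fields() {
    # Delimiter-based variant: keep the second blank-separated field onward.
    cut -d ' ' -f 2- "$@"
}

printf '%s\n' '21:22:34.571 DEBUG - some message' | strip_ts_sed
```

The `sed` version leaves lines without a leading timestamp unchanged, while the character-based `cut` version removes the first 13 characters of every line regardless of content.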

score 2 · Accepted Answer · answered Sep 28 '16 at 04:30

If you could use awk:

awk '$1="";1' data_file_name

Else, use the shell (very very slow):

#!/bin/bash
while IFS= read -r line; do
    printf '%s\n' "${line#* }"
done <"data_file_name"

Note that the `awk` script leaves leading blanks on the output lines. The shell code is clean in that respect. – Jonathan Leffler Sep 28 '16 at 04:40
@JonathanLeffler Worse, the `awk` code, by assigning to `$1`, causes `$0` to be reconstituted from all the parameters by interposing `OFS` between them. It clobbers the white-space separation between the fields. – Kaz Sep 28 '16 at 04:59
@Kaz And there is [this answer with 236 up votes that does exactly the same](http://stackoverflow.com/a/2961994/6843677). If you want to actually [keep the spaces use this (probably option 4)](http://stackoverflow.com/a/32774373/6843677), but that seems like too much for this simple need. – Sep 28 '16 at 05:25
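If the spacing inside the message matters, one way to sidestep both issues raised in these comments is to edit `$0` with `sub()` instead of assigning to `$1`. This is a sketch rather than the answer author's method, and the function name `strip_first_field` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: delete everything up to and including the first blank.
# sub() edits $0 in place, so awk does not rebuild the record with OFS and
# the original spacing between the remaining fields is preserved.
strip_first_field() {
    awk '{ sub(/^[^ ]* /, "") } 1' "$@"
}

printf '%s\n' '21:22:34.571 DEBUG -   some    message' | strip_first_field
```

With the same input, the `$1=""` form would produce a leading blank and collapse each run of spaces to a single one.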

score 1 · Answer 3 · answered Sep 28 '16 at 04:50

grep can be used as well, by extracting everything from the first space onward:

$ cat ip.txt 
21:22:34.571 DEBUG - some message
21:23:34.571 DEBUG - some other message

This will leave a leading blank

$ grep -o ' .*' ip.txt 
 DEBUG - some message
 DEBUG - some other message

This won't (`-P` and `\K` need a grep built with PCRE support, such as GNU grep)

$ grep -oP ' \K.*' ip.txt 
DEBUG - some message
DEBUG - some other message

Steven the Easily Amused · Answer 4 · 2023-07-29T21:06:27.063

Since you're already using bash, you can use BASH's built-in string manipulation, as in this example:

for txt in "21:22:34.571 DEBUG - some message" \
    'another .555 message' \
    '33:44:55.666 two timestamps 00:12:34.567 !' \
    'A shorter timestamp 11:22'
do
   echo "'$txt' > '${txt##*\.[0-9][0-9][0-9] }'"
done

'21:22:34.571 DEBUG - some message' > 'DEBUG - some message'
'another .555 message' > 'message'
'33:44:55.666 two timestamps 00:12:34.567 !' > '!'
'A shorter timestamp 11:22' > 'A shorter timestamp 11:22'

Note how the third example, which has a second timestamp near the end, was truncated to just a "!", while another .555 was stripped from the second example. See the explanation for why.

Explanation and Alternatives

BASH has many built in string handling capabilities. That means, among other things, it's possible to do quite a lot with BASH without needing to use any external utilities or subshells.

Replace Leading Characters (# or ##) (also % and %%)

${txt##*\.[0-9][0-9][0-9] } The '#' and '##' operators tell BASH to remove the leading part of the string that matches the pattern that follows. The pattern is a shell glob pattern, not a regular expression, and the match always begins at the left end of the string. The difference is that a single "#" removes the shortest match while ## is greedy and removes the longest. Here the *\.[0-9][0-9][0-9] followed by a space matches ANYTHING that is followed by a period (.), three digits, and a space. That is true of another .555 message, so the leading portion another .555 was stripped.

If you know the timestamps are only at the beginning and only of the given format, you can do this instead

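${txt#*.[0-9][0-9][0-9] } Using # instead of ## tells bash to remove only the shortest leading match, so the removal stops at the first period, three digits, and space rather than the last. Both operators are anchored at the beginning of the string. Note that the leading * still matches any text, so another .555 message would still be reduced to message.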

% and %% work the same way, however they match the END of the string rather than the beginning.
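A small illustration of all four operators, using the simplest possible pattern (anything up to a blank, or a blank followed by anything). The variable name `txt` is reused from the example above; the sample string is the one from the question.

```shell
#!/bin/bash
txt='21:22:34.571 DEBUG - some message'

printf '%s\n' "${txt#* }"    # shortest match from the left:  DEBUG - some message
printf '%s\n' "${txt##* }"   # longest match from the left:   message
printf '%s\n' "${txt% *}"    # shortest match from the right: 21:22:34.571 DEBUG - some
printf '%s\n' "${txt%% *}"   # longest match from the right:  21:22:34.571
```

The first form, `${txt#* }`, is exactly "delete everything before the first blank", which is what the question title asks for.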

Substitute Matching Pattern (/ or //)

This is the MOST accurate for the examples given.

${txt//[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] /}

It's a little more tedious, but more precise. The // means substitute all matches; a single / would substitute only the first match. By specifying the entire pattern (two digits, colon, two digits, colon, two digits, period, three digits, and a space), the // form removes everything that matches that timestamp format, and it will NOT match the .555. This is the result:

'21:22:34.571 DEBUG - some message' > 'DEBUG - some message'
'another .555 message' > 'another .555 message'
'33:44:55.666 two timestamps 00:12:34.567 !' > 'two timestamps !'
'A shorter timestamp 11:22' > 'A shorter timestamp 11:22'
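If only a timestamp at the very start of a line should go, bash also lets the pattern in a substitution be anchored to the beginning of the string by prefixing it with #. The sketch below applies that to every line of standard input; the function name `strip_leading_ts` and the variable `ts` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: remove a leading HH:MM:SS.mmm timestamp (plus the
# blank after it) from each line of standard input, using only bash built-ins.
strip_leading_ts() {
    local ts='[0-9][0-9]:[0-9][0-9]:[0-9][0-9].[0-9][0-9][0-9] '
    local line
    # "|| [[ -n $line ]]" also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # The # after the first / anchors the pattern to the start of the line.
        printf '%s\n' "${line/#$ts/}"
    done
}

printf '%s\n' '33:44:55.666 two timestamps 00:12:34.567 !' | strip_leading_ts
```

To process a file, redirect it into the function, for example `strip_leading_ts < data_file_name`. As noted in the accepted answer, a shell loop like this is slow on large files compared with sed or awk.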

References

BASH string manipulations do not provide full "RegEx" (Regular Expression) syntax. But they are often quick and easy to use in lieu of sed, awk, tr and other tools.

There are many more string operations possible than those described above. Here are some more references. I haven't found a clearly readable authoritative reference.
