
I have a log file, and I need to filter it with the "grep" command to get the longest word ending in a given sequence of characters.

For example, I need the longest word ending in "abc".

Given the following file:

XXXXXabc
YYabc
ZZZdef
XXabc

The correct output should be:

XXXXXabc

So far I have tried the following:

grep -E '\abc' log.txt | wc -L

But this prints only the maximum line length, not the word itself. How can I print the word?

Thank you!

FJ Garrido
See: [... | awk '{print length, $0}' | sort -nr | head -1 | cut -d " " -f 2-](https://stackoverflow.com/a/1655488/3776858) – Cyrus Jan 19 '19 at 10:25

4 Answers

grep -E 'abc$' log.txt | awk '{print length($1) " " $1}' | sort -n | tail -1 | awk '{print $2}'

The idea: print each matching word's length in front of it, sort numerically, and then print only the second field of the last (longest) line.
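A commented version of the same pipeline might look like the sketch below. It is wrapped in a hypothetical function `longest_by_sort` that takes the suffix and file name as arguments, assumes the suffix contains no regex metacharacters, and measures the whole line (`$0`) rather than the first field, which gives the same result when each line holds one word.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the longest line of a file that ends with a suffix.
# Assumption: the suffix contains no ERE metacharacters (., *, [, etc.).
longest_by_sort() {
  local suffix=$1 file=$2
  # 1. grep keeps only lines ending in the suffix ("\$" anchors at end of line).
  # 2. awk prefixes each line with its length, e.g. "8 XXXXXabc".
  # 3. sort -n orders by that length, so the longest line ends up last.
  # 4. tail keeps only the last line, so cut processes a single line.
  # 5. cut drops the length prefix and prints the rest of the line.
  grep -E -e "${suffix}\$" "$file" \
    | awk '{ print length($0), $0 }' \
    | sort -n \
    | tail -n 1 \
    | cut -d ' ' -f 2-
}
```

Usage: `longest_by_sort abc log.txt`.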

Marcel Preda
It would be better if your code matched your description: put the `tail` before the `awk`, so `awk` only has to process one line instead of the whole file. – Poshi Jan 19 '19 at 12:28

You can do it with a single awk command:

awk 'BEGIN {global_max = 0} /abc$/ {cur_max=length($0); if (cur_max > global_max) {global_max=cur_max; word=$0}} END {print word}' log.txt

The variable global_max keeps track of the length of the longest word seen so far (it is initialized to zero in the BEGIN block).

Then, for every line that ends in "abc", get its length and compare it to global_max. If it is greater, replace the stored length and word with the new ones.

Finally, print the found word.
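The same single-pass logic can be parameterized. The sketch below wraps it in a hypothetical function `longest_with_suffix` and, as a swapped-in variation, compares the last characters literally with `substr` instead of matching a regex, so the suffix may contain characters such as `.` or `*`. Unlike the one-liner, it prints nothing when no line matches.

```bash
#!/usr/bin/env bash
# Hypothetical helper: longest line of a file ending with a literal suffix.
# Usage: longest_with_suffix SUFFIX FILE
longest_with_suffix() {
  # Note: awk -v interprets backslash escapes in the value it assigns.
  awk -v suf="$1" '
    BEGIN { n = length(suf); max = -1 }   # max = -1 means "nothing found yet"
    # Literal suffix test: the last n characters of the line must equal suf.
    length($0) >= n && substr($0, length($0) - n + 1) == suf {
      if (length($0) > max) { max = length($0); word = $0 }
    }
    END { if (max >= 0) print word }      # print only if something matched
  ' "$2"
}
```

Usage: `longest_with_suffix abc log.txt`.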

Poshi

Using sort is slower, since sorting costs O(n log n). You only need to visit each line once, as in the following (O(n)):

maxSize=0; maxWord=""; while IFS= read -r LINE; do if [[ $LINE == *abc && ${#LINE} -gt $maxSize ]]; then maxSize=${#LINE}; maxWord="$LINE"; fi; done < log.txt; echo "$maxWord"
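Spread over several lines with comments, the same loop might look like this sketch. The function name `longest_loop` is hypothetical; `IFS=` keeps leading and trailing whitespace, and the `|| [[ -n $line ]]` guard also processes a last line that lacks a trailing newline.

```bash
#!/usr/bin/env bash
# Hypothetical helper: single pass over the file, no sorting.
# Usage: longest_loop SUFFIX FILE
longest_loop() {
  local suffix=$1 file=$2 line best="" best_len=0
  # IFS= keeps surrounding whitespace; -r keeps backslashes literal.
  # The "|| [[ -n $line ]]" part also handles a final line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Quoting "$suffix" makes it a literal match rather than a glob pattern.
    if [[ $line == *"$suffix" ]] && (( ${#line} > best_len )); then
      best_len=${#line}
      best=$line
    fi
  done < "$file"
  printf '%s\n' "$best"
}
```

Usage: `longest_loop abc log.txt`.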
Robert Seaman

grep can't do that on its own, but awk can:

awk '/abc$/{m=length($0)>length(m)?$0:m}END{print m}' infile
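Written out with an explicit `if`, the one-liner reads as follows (the wrapper name `longest_abc` is hypothetical). It relies on an unset awk variable being the empty string, so `length(m)` starts at 0; because the comparison is a strict `>`, the first of several equally long lines is the one kept.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the awk one-liner; takes the file name as $1.
longest_abc() {
  awk '
    /abc$/ {
      # m is unset at first, so length(m) is 0 and the first match is stored.
      # A strict ">" keeps the earlier line when two lines have equal length.
      if (length($0) > length(m)) m = $0
    }
    END { print m }
  ' "$1"
}
```

Usage: `longest_abc log.txt`.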
ctac_