I have a file that begins with this kind of format

INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|

What I need is to read the file and get this output:

INFO|NOT-CLONED|last-folder-name

I have this so far:

cat clone_them.log | grep 'INFO|NOT-CLONED' | sed -E 's/INFO\|NOT-CLONED\|(.*)/g'

But it is not working as intended.

NOTE: the last "another-folder" and "last-folder-name" are the same.

  • In general, there's no need to do `grep pattern | sed ...`. You can use `sed` to do the filtering. In this case: `sed -n '/INFO|NOT-CLONED/s/...//p'`. Note that I've replaced your substitution with `...` because `sed` is the wrong tool for this. I'm just pointing out that `grep | sed` is an anti-pattern. – William Pursell Jun 25 '19 at 17:28
  • It's possible to use `cut` for this as well: `grep 'INFO|NOT-CLONED' clone_them.log | cut -d'|' -f1,2,4` (field 3 is the full path, field 4 is the last folder name) –  Jun 25 '19 at 17:47
  • Possible duplicate of [Non greedy (reluctant) regex matching in sed?](https://stackoverflow.com/questions/1103149/non-greedy-reluctant-regex-matching-in-sed) –  Jun 25 '19 at 17:48

2 Answers

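It's simpler in awk, since the input file is cleanly delimited by the | symbol. You need to tell awk that the input fields are separated by | and that the output should also be separated by |, using FS and OFS respectively. Because each line ends with a trailing |, the last field is empty, so the folder name is the second-to-last field, $(NF-1).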

awk 'BEGIN{FS=OFS="|"}/INFO\|NOT-CLONED/{print $1,$2,$(NF-1)}' clone_them.log
INFO|NOT-CLONED|last-folder-name
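
As a quick sanity check of the field numbering, here is a minimal sketch that runs the same awk program against a throwaway file. The sample lines and the temporary file are assumptions for illustration (the second line is invented to show the filter rejecting it); only the awk program itself comes from this answer.

```shell
# Hypothetical sample data written to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  'INFO|NOT-CLONED|/folder/another-folder/last-folder-name|last-folder-name|' \
  'INFO|CLONED|/folder/other/skipped|skipped|' > "$tmp"

# Same program as above: keep fields 1, 2 and the second-to-last field.
awk_out=$(awk 'BEGIN{FS=OFS="|"}/INFO\|NOT-CLONED/{print $1,$2,$(NF-1)}' "$tmp")

# Count the fields on the first line; the trailing "|" adds an empty one.
nf_count=$(awk -F'|' 'NR==1{print NF}' "$tmp")

printf '%s\n' "$awk_out"
printf 'fields on line 1: %s\n' "$nf_count"
rm -f "$tmp"
```

With this sample the first line has 5 fields, the last of them empty, which is why $(NF-1) rather than $NF holds the folder name.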
P....

If you want a sed solution:

$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' file
INFO|NOT-CLONED|last-folder-name

How it works:

  • -E

    Use extended regex

  • -n

    Don't print unless we explicitly tell it to.

  • s/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p

    Look for lines that include INFO|NOT-CLONED| (save this in group 1) followed by anything, .*, followed by | followed by any characters not |, [^|]* (saved in group 2), followed by | at the end of the line. The replacement text is group 1 followed by group 2.

    The p option tells sed to print the line if the match succeeds. Since the substitution only succeeds for lines that contain INFO|NOT-CLONED|, this eliminates the need for an extra grep process.
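
To illustrate the filtering described above, here is a small sketch that feeds the same sed command a mix of matching and non-matching lines. The sample lines and the temporary file are made up for this example; the sed expression is the one from this answer.

```shell
# Hypothetical sample data: only the first line contains INFO|NOT-CLONED|.
tmp=$(mktemp)
printf '%s\n' \
  'INFO|NOT-CLONED|/folder/another-folder/last-folder-name|last-folder-name|' \
  'INFO|CLONED|/folder/other/skipped|skipped|' \
  'WARN|something unrelated' > "$tmp"

# -n suppresses default output; p prints only lines where the substitution succeeded.
sed_out=$(sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' "$tmp")

printf '%s\n' "$sed_out"
rm -f "$tmp"
```

Only the first sample line is printed, rewritten as INFO|NOT-CLONED|last-folder-name. The other two lines never match the pattern, so nothing is printed for them.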

Variation: Returning just the last-folder-name

To just get the last-folder-name without the INFO|NOT-CLONED, we need only remove \1 from the output:

$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\2/p' file
last-folder-name

Since we no longer need the first capture group, we could simplify and remove the now unneeded parens so that the only capture group is the last folder name:

$ sed -En 's/INFO\|NOT-CLONED\|.*\|([^|]*)\|$/\1/p' file
last-folder-name
John1024
  • Thank you, this is the answer I was looking for. I have one question: if I need just the last-folder-name without the INFO|NOT-CLONED, do I just delete that part? – Mateo Gutierrez Jun 25 '19 at 18:28
  • @MateoGutierrez I'm glad this worked for you. I just updated the answer with code to get just the `last-folder-name` without the `INFO|NOT-CLONED`. – John1024 Jun 25 '19 at 20:43