16

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or plain Unix commands like cut?

My input would be

After-u-math-how-however

Output should be

After-u

Everything from the 2nd - onward should be stripped out. The solution should also handle input with fewer occurrences of the pattern: with zero or one occurrence the input should be left unchanged, and from the 2nd occurrence on everything should be removed.

So if the input is as follows

After

Output should be

After
Sildoreth
Jose

8 Answers

21

Something like this would do it.

echo "After-u-math-how-however" | cut -f1,2 -d'-'

This will split up (cut) the string into fields, using a dash (-) as the delimiter. Once the string has been split into fields, cut will print the 1st and 2nd fields.
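
For the zero- and one-delimiter cases from the question: by default, cut passes lines that contain no delimiter through unchanged (unless -s is given), so no special handling should be needed. A minimal sketch; first_two_fields is just a hypothetical wrapper name used for illustration:

```shell
# Hypothetical helper: keep only the first two dash-separated fields.
first_two_fields() {
  printf '%s\n' "$1" | cut -d'-' -f1,2
}

first_two_fields 'After-u-math-how-however'   # prints: After-u
first_two_fields 'After-u'                    # prints: After-u
first_two_fields 'After'                      # prints: After (no delimiter, line passed through)
```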

slm
Evan Purkhiser
  • Looks like the best! Any idea about how to get the same in sed or awk? – Jose May 16 '14 at 00:22
  • 2
    How to achieve the same in reverse? Just cut "however" in the string and print it. No matter how big is the string – Hussain K Jan 14 '21 at 14:41
  • @HussainK - using https://stackoverflow.com/questions/22727107/how-to-find-the-last-field-using-cut, you could do ... `| rev | cut -f1 -d'-' | rev` – Amanda Jul 02 '21 at 15:59
8

This might work for you (GNU sed):

sed 's/-[^-]*//2g' file
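
A note on the 2g flag: in GNU sed, combining a number with g means "leave the matches before the 2nd alone, then replace the 2nd match and every later one". Each match of -[^-]* is a dash plus the field that follows it, so everything from the 2nd dash onward is deleted. Mixing a number with g is not specified by POSIX, so the sketch below assumes GNU sed; strip_from_second_dash is a hypothetical wrapper name:

```shell
# Assumes GNU sed: the combined "2g" flag is a GNU extension.
# Hypothetical helper: delete the 2nd "-field" match and all later ones.
strip_from_second_dash() {
  printf '%s\n' "$1" | sed 's/-[^-]*//2g'
}

strip_from_second_dash 'After-u-math-how-however'   # prints: After-u
strip_from_second_dash 'After'                      # prints: After
```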
potong
2

You could use the following regex to select what you want:

^[^-]*-\?[^-]*

For example:

echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"

Results:

After-u
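
The \? in the pattern above is a GNU extension to basic regular expressions. A sketch of the same idea with extended regular expressions (-E), where ? is a standard operator; -o is, as far as I know, not required by POSIX but is available in GNU and BSD grep. keep_two_grep is a hypothetical wrapper name:

```shell
# Hypothetical helper: print only the part of the line matched by
# "first field, optional dash, second field", anchored at line start.
keep_two_grep() {
  printf '%s\n' "$1" | grep -oE '^[^-]*-?[^-]*'
}

keep_two_grep 'After-u-math-how-however'   # prints: After-u
keep_two_grep 'After'                      # prints: After
```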
Steve
  • +1; note, however, that there appears to be a _bug_ in FreeBSD grep 2.5.1 (as of OS X 10.9.3, for instance), causing the `^` anchor to be ignored, resulting in potentially _multiple_ matches (and thus multiple output lines). Works fine with GNU `grep`. – mklement0 May 16 '14 at 05:11
2

@EvanPurkhiser's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:

With GNU sed for -r:

$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u

With GNU awk for gensub():

$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u

This can also be done with non-GNU sed using \( \) groups and * in place of +, and with non-GNU awk using match() and substr(), if necessary.
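
A sketch of what those portable variants might look like. The sed version uses a basic regex with \( \) groups and \{1,\} standing in for + (the form suggested in the comment below); the awk version uses match(), which sets RSTART and RLENGTH, plus substr(). The function names are hypothetical and only there for illustration:

```shell
# POSIX sed: basic regex, \( \) for grouping, \{1,\} instead of +.
keep_two_sed() {
  printf '%s\n' "$1" | sed 's/\([^-]\{1,\}-[^-]*\).*/\1/'
}

# POSIX awk: if the pattern matches, replace the line with the matched part.
# Lines without a dash do not match and are printed unchanged.
keep_two_awk() {
  printf '%s\n' "$1" |
    awk '{ if (match($0, /[^-]+-[^-]*/)) $0 = substr($0, RSTART, RLENGTH) } 1'
}

keep_two_sed 'After-u-math-how-however'   # prints: After-u
keep_two_awk 'After-u-math-how-however'   # prints: After-u
keep_two_sed 'After'                      # prints: After
keep_two_awk 'After'                      # prints: After
```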

Ed Morton
  • 1
    +1 for the `sed` solution; using `-E` instead of `-r` would make the command work with both GNU (Linux) and BSD (OSX) `sed`. POSIX `sed`, which uses _basic_ regexes, _can_ emulate `+`, namely as `\{1,\}`: `sed 's/\([^-]\{1,\}-[^-]*\).*/\1/'` – mklement0 May 16 '14 at 16:23
  • @IsinAltinkaya the way to express a preference is to upvote the answer you prefer. I upvoted potong's answer, for example. – Ed Morton Sep 23 '21 at 11:49
1

This can be done in pure bash (which means no fork, no external process). Read into an array split on '-', then slice the array:

$ IFS=-
$ read -ra val <<< After-u-math-how-however
$ echo "${val[*]}"
After-u-math-how-however
$ echo "${val[*]:0:2}"
After-u
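
Note that assigning IFS=- at the prompt, as above, changes it for the rest of the session. A minimal sketch of the same technique wrapped in a function with a local IFS, so the change stays scoped to the function (requires bash; keep_two_bash is a hypothetical name):

```shell
# bash only: uses local, arrays and a here-string.
keep_two_bash() {
  local IFS=-                      # IFS change is limited to this function
  local -a val
  read -ra val <<< "$1"            # split the input on '-'
  printf '%s\n' "${val[*]:0:2}"    # first two elements, re-joined with '-'
}

keep_two_bash 'After-u-math-how-however'   # prints: After-u
keep_two_bash 'After'                      # prints: After
```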
kojiro
  • 1
    Good solution. You should reset the IFS after though, no? – Evan Purkhiser May 16 '14 at 01:55
  • @EvanPurkhiser no, you should use scope to manage the value. Put the above code in a function with `local IFS` instead of trying to manually save and restore the original IFS. – kojiro May 16 '14 at 12:40
  • 2
    So the positive about this is that there's `no fork, no external process` (why do we care?) but the negatives are that you still need to write more code to manage the scope of the IFS change, plus if you want to do this on more than 1 line you need to manually write a loop to process every line (unlike sed and awk solutions), plus as written it will handle any backslashes in the input incorrectly, plus you need to think about whether there's a globbing impact, plus you need to think about whether the echo is going to behave as desired. Shell is an environment from which to call tools. – Ed Morton May 16 '14 at 13:16
  • 1
    @EdMorton All of these "negatives" start with "if". "If" you haven't clarified your requirements, then you will get a generalized answer that may be optimal in some cases and suboptimal in others. Shell is an environment from which to call tools, and often it's valuable to understand which of those tools are built into the shell, instead of always falling back on `awk` and `sed`. – kojiro May 19 '14 at 15:12
  • @EdMorton also, what globbing impact? 1. Bash doesn't expand globs in a herestring. 2. The shell doesn't expand globs within double-quoted parameter expansions, including array expansions. The only way to have a problem with globs in this answer would be to remove the quotes, which would substantially change the answer. – kojiro May 19 '14 at 15:21
1
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
  • Split the line into fields based on field separator - (option spec. -F -) - accessible as special variable FS inside the awk program.
  • Always print the 1st field (print $1), followed by:
    • If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
    • Otherwise: append "", i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).
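
To make the two branches concrete, here is the same command run over several lines at once, including a line that ends in a dash (NF is 2 there, with an empty 2nd field). keep_two_fields is a hypothetical wrapper name:

```shell
# Hypothetical helper: reads lines on stdin, keeps the first two fields.
keep_two_fields() {
  awk -F - '{ print $1 (NF > 1 ? FS $2 : "") }'
}

printf '%s\n' 'After-u-math-how-however' 'After-' 'After' | keep_two_fields
# prints:
# After-u
# After-
# After
```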
mklement0
0
awk '{ $0 = (NF > 1) ? $1 FS $2 : $1 } 1' FS=-

Result

After-u
After
Zombo
0

This will do it in awk:

echo "After" | awk -F "-" '{printf "%s",$1; for (i=2; i<=2 && i<=NF; i++) printf "-%s",$i; print ""}'
John C
  • Ok, I had another crack. Despite my better judgement because the OP has done no research or made any attempt to solve. – John C May 16 '14 at 00:53