134

Given a file, for example:

potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789

I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:

1234
5432

How can I do that?

Dario Seidl
Lexicon

8 Answers

179
grep 'potato:' file.txt | sed 's/^.*: //'

grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).

or

grep 'potato:' file.txt | cut -d\   -f2

For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d\ - d = delimiter, \ = escaped space character, something like -d" " would have also worked) and print the second field of each such line (-f2).

or

grep 'potato:' file.txt | awk '{print $2}'

For each line that contains potato:, awk will print the second field (print $2). By default, awk splits fields on runs of whitespace.

or

grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'

All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.

or

awk '{if(/potato:/) print $2}' < file.txt

The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.

or

perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt

The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
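Note that the commands above match potato: anywhere in a line, while the question asks for lines that start with it. The following is a minimal sketch of an anchored variant, not part of the original answer. The scratch file and the extra "sweet potato: 9999" line are made up purely to show the difference.

```sh
# Build a scratch copy of part of the question's sample data, plus one
# hypothetical line that contains "potato:" without starting with it.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
cat > "$sample" <<'EOF'
potato: 1234
apple: 5678
potato: 5432
sweet potato: 9999
EOF

# Unanchored: every line containing "potato:" is kept.
unanchored=$(grep 'potato:' "$sample" | sed 's/^.*: //')

# Anchored with ^: only lines that begin with "potato:" are kept.
anchored=$(grep '^potato:' "$sample" | sed 's/^.*: //')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"

rm -rf "$tmpdir"
```

With this input, the unanchored pipeline also emits 9999, while the anchored one emits only 1234 and 5432.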

rid
88

Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt

mohit6up
  • I tried some one-liners from the accepted answer above, but I feel that this answer more accurately solves the question. – Jake88 Oct 23 '14 at 15:40
  • Some explanation: Option `-o` means print only the matching part of the line. `-P` enables Perl-compatible regular expressions, which support the [positive lookbehind](http://www.regular-expressions.info/lookaround.html) `(?<=string)` used here. – Serge Stroobandt Sep 22 '16 at 11:08
  • **Note**: because of the `-P` option, this solution is only compatible with [GNU `grep`](https://www.gnu.org/software/grep/), it won't work with the kind of [POSIX `grep`](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html) you can find in environments such as macOS. – rid Jan 08 '22 at 19:19
  • @rid that is a good point. One can download `GNU grep` on macOS with `brew install grep` and use it as `ggrep`, i.e., with an extra `g` prefix. – mohit6up Apr 01 '23 at 23:03
24
grep -Po 'potato:\s\K.*' file

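-P to use Perl-compatible regular expressions

-o to output only the matched part of the line

\s to match the space after potato:

\K to discard everything matched so far, so that potato: and the space are left out of the output

.* to match the rest of the line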
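To see what \K changes in practice, here is a small sketch. It assumes GNU grep built with PCRE support (-P); the input lines are hypothetical. The second pattern uses ^ and \s* instead of \s, which is a variation on the command above rather than the original.

```sh
# Requires GNU grep with -P (PCRE). Input lines are made up for illustration.
input='potato: 1234
potato:    5432
apple: 5678'

# Without \K, the whole match is printed, "potato:" prefix included.
with_prefix=$(printf '%s\n' "$input" | grep -Po 'potato:\s.*')

# With \K, everything matched before it is dropped from the output.
# ^ anchors to the line start; \s* absorbs a run of several spaces.
values=$(printf '%s\n' "$input" | grep -Po '^potato:\s*\K.*')

printf '%s\n' "$with_prefix"
printf '%s\n' "$values"
```

Unlike a lookbehind, which must have a fixed length in PCRE, the part before \K can be of variable length, which is why \s* is usable here.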

tuxutku
  • Thanks for regex explanation. – elulcao May 05 '21 at 18:42
  • **Note**: because of the `-P` option, this solution is only compatible with [GNU `grep`](https://www.gnu.org/software/grep/), it won't work with the kind of [POSIX `grep`](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html) you can find in environments such as macOS. – rid Jan 08 '22 at 19:22
14
sed -n 's/^potato:[[:space:]]*//p' file.txt

One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
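The command relies on two pieces working together: -n suppresses sed's default printing of every line, and the p flag on the substitution prints only the lines where a replacement actually happened. A short sketch with hypothetical input lines shows the difference:

```sh
# Hypothetical input; the third line has no space after the colon.
input='potato: 1234
apple: 5678
potato:5432'

# Without -n and p, non-matching lines pass through unchanged.
all_lines=$(printf '%s\n' "$input" | sed 's/^potato:[[:space:]]*//')

# With -n and p, only lines where the substitution happened are printed.
only_values=$(printf '%s\n' "$input" | sed -n 's/^potato:[[:space:]]*//p')

printf 'without -n/p:\n%s\n' "$all_lines"
printf 'with -n/p:\n%s\n' "$only_values"
```

Because the pattern uses [[:space:]]*, the line without a space after the colon is handled as well.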

thb
2

This will print everything after each match, on that same line only:

perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt

This will do the same, except it will also print all subsequent lines:

perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt

These command-line options are used:

  • -n loop around each line of the input file
  • -l removes newlines before processing, and adds them back in afterwards
  • -e execute the perl code
Chris Koknat
2

You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.

Try this (semicolons are there to allow you to put it all on one line):

$ while read line;
  do
    if [[ "${line%%:\ *}" == "potato" ]];
    then
      echo ${line##*:\ };
    fi;
  done< file.txt

## tells bash to delete the longest match of the pattern "*: " (anything up to and including a colon followed by a space) from the front of $line.

$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789

or, if you wanted the key rather than the value, %% tells bash to delete the longest match of the pattern ": *" (a colon, a space, and everything after them) from the end of $line.

$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi

The substring to split on is written as ":\ " here, with the space escaped by a backslash. If the whole expansion is enclosed in double quotes, as in "${line##*: }", the backslash is not needed.
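The difference between the shortest-match operators (# and %) and the longest-match ones (## and %%) only shows up when the separator occurs more than once. A brief sketch, using a made-up line with two ": " separators:

```sh
# Hypothetical line containing the ": " separator twice.
line='potato: a: b'

shortest_front="${line#*: }"    # strip the shortest prefix matching "*: "
longest_front="${line##*: }"    # strip the longest prefix matching "*: "
shortest_back="${line%: *}"     # strip the shortest suffix matching ": *"
longest_back="${line%%: *}"     # strip the longest suffix matching ": *"

printf '%s\n' "$shortest_front" "$longest_front" "$shortest_back" "$longest_back"
```

For the question's data, where each line has a single ": ", both variants of each operator give the same result.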

You can find more examples like these at the Linux Documentation Project.

mightypile
  • `while read` is extremely slow; using an external utility will actually be much faster as long as you choose one with buffered I/O (i.e. practically any of the ones mentioned in this answer, and many others). – tripleee Jun 19 '19 at 16:23
  • Also, you should use `read -r` unless you are very specifically requiring some rather pesky legacy behavior from before POSIX. – tripleee Jun 19 '19 at 16:23
1

Modern Bash supports regular expressions:

while read -r line; do
  if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
done
ceving
  • You want to avoid this, though. [Bash `while read` loop extremely slow compared to `cat`, why?](https://stackoverflow.com/questions/13762625/bash-while-read-loop-extremely-slow-compared-to-cat-why) – tripleee Sep 27 '22 at 04:24
  • @tripleee How to replace "while read" with "cat"? It does not make much sense to compare those two. This means in the end: do not use Bash at all, because using CPU registers directly is much faster. – ceving Sep 27 '22 at 06:26
  • The gist of the linked question is to use Awk or `sed` or `grep` etc to loop over all the lines in the file when you can. `cat` alone is not an improvement at all (its purpose is to concatenate multiple files; if you use it on a single file, you are [doing it wrong](https://stackoverflow.com/questions/11710552/useless-use-of-cat)); it's just a benchmark for comparison. If you are looping over one line at a time anyway for other reasons, this is still a viable technique (like, say you are extracting the name of a file which you then process inside the loop). – tripleee Sep 27 '22 at 06:34
-3
grep potato file | grep -o "[0-9].*"
Suraj Rao