
Newbie UNIX user question ...

The input file (location.txt) is this:

WGS_LAT deg 12
WGS_LAT min 30
WGS_LAT sec 05
WGS_LAT hsec    29
WGS_LAT northSouth  North
WGS_DLAT    decimalDegreesLatitude  12.501469
WGS_LONG    deg 07
WGS_LONG    min 00
WGS_LONG    sec 05
WGS_LONG    hsec    61
WGS_LONG    eastWest    West
WGS_DLONG   decimalDegreesLongitude -70.015606

I want to get all lines that start with WGS_LAT or WGS_DLAT.

First, is grep the tool you recommend for this job?

Second, if it is, then how to express the pattern? All of these failed:

grep ^WGS_LAT|^WGS_DLAT location.txt
grep ^(WGS_LAT|WGS_DLAT) location.txt
grep ^WGS_D?LAT location.txt

What is the correct pattern, please?

Roger Costello

2 Answers

Grep can handle two types of regular expressions:

  • Basic regular expressions (BRE) which you call using grep PATTERN file
  • Extended regular expressions (ERE) which you call using grep -E PATTERN file

So by default grep makes use of BRE.

The grep man page explains the difference under "Basic vs Extended Regular Expressions":

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

So, in your case the answer is:

$ grep "^\(WGS_LAT\|WGS_DLAT\)" location.txt
$ grep -E "^(WGS_LAT|WGS_DLAT)" location.txt
$ grep "^WGS_D\?LAT" location.txt
$ grep -E "^WGS_D?LAT" location.txt
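
To see the BRE/ERE difference in isolation, here is a minimal sketch. It assumes a cut-down copy of location.txt with tab-separated fields, written to a temporary directory; the variable names are only illustrative.

```shell
# Build a small stand-in for location.txt (tab-separated fields are an assumption).
demo_dir=$(mktemp -d)
sample="$demo_dir/location.txt"
printf 'WGS_LAT\tdeg\t12\nWGS_DLAT\tdecimalDegreesLatitude\t12.501469\nWGS_LONG\tdeg\t07\nWGS_DLONG\tdecimalDegreesLongitude\t-70.015606\n' > "$sample"

# ERE (-E): ( | ) are metacharacters, so this is a real alternation.
ere_count=$(grep -c -E '^(WGS_LAT|WGS_DLAT)' "$sample")

# BRE (default): the same unescaped characters are literals, so grep looks
# for the text "(WGS_LAT|WGS_DLAT)" at the start of a line and finds nothing.
# grep exits non-zero when nothing matches, hence the "|| true".
bre_literal_count=$(grep -c '^(WGS_LAT|WGS_DLAT)' "$sample" || true)

echo "ERE matches: $ere_count"
echo "BRE matches with unescaped metacharacters: $bre_literal_count"
```

The second count is why the unescaped patterns in the question appear to "fail" silently even when they are quoted.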
kvantour
  • Some grep implementations, including GNU `grep`, the default on Linux systems, can also handle PCREs with `-P`. – terdon Oct 09 '22 at 11:11

First, you should always quote your regular expressions to protect them from the shell. For example, | has special meaning in the shell: it is the pipe operator, which passes the output of one program as input to another. So the unquoted grep ^WGS_LAT|^WGS_DLAT location.txt is interpreted as "run grep ^WGS_LAT and pass its output as input to the command ^WGS_DLAT location.txt".
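
A rough way to watch the shell do this, assuming a POSIX sh is available. The unquoted command runs in a child shell with grep's standard input redirected from /dev/null (an addition for this sketch, so grep does not sit waiting for terminal input); the sample lines are made up to resemble location.txt.

```shell
# Unquoted: the shell splits the line at |, so the second half is treated
# as a command named ^WGS_DLAT, which does not exist.
unquoted_status=0
sh -c 'grep ^WGS_LAT </dev/null | ^WGS_DLAT location.txt' 2>/dev/null || unquoted_status=$?
echo "unquoted pipeline exit status: $unquoted_status"

# Quoted: the whole pattern reaches grep as a single argument.
quoted_count=$(printf 'WGS_LAT\tdeg\t12\nWGS_DLAT\tdecimalDegreesLatitude\t12.501469\nWGS_LONG\tdeg\t07\n' |
  grep -c -E '^WGS_LAT|^WGS_DLAT')
echo "quoted pattern matches: $quoted_count"
```

An exit status of 127 is the shell's way of reporting "command not found", which is what happens to the second half of the unquoted pipeline.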

Next, grep uses Basic Regular Expressions by default. To get | to mean OR, you need to either escape it as \| (a GNU extension to BRE) or enable extended regular expressions with the -E flag (or, if you are using GNU grep, PCRE with the -P flag). So all of these should work for you:

grep -E '^WGS_LAT|^WGS_DLAT' location.txt
grep -E '^(WGS_LAT|WGS_DLAT)' location.txt
grep '^WGS_LAT\|^WGS_DLAT' location.txt

Or, more simply, grep for lines starting with WGS_ and an optional D followed by LAT:

grep -E '^WGS_D?LAT' location.txt
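
If you want to convince yourself that the shorter pattern selects the same lines as the alternation, a quick check along these lines should do. It is only a sketch: the sample file is a shortened, tab-separated stand-in for location.txt, and the file and variable names are made up.

```shell
# Shortened stand-in for location.txt in a temporary directory.
check_dir=$(mktemp -d)
printf 'WGS_LAT\tdeg\t12\nWGS_LAT\tnorthSouth\tNorth\nWGS_DLAT\tdecimalDegreesLatitude\t12.501469\nWGS_LONG\tdeg\t07\nWGS_DLONG\tdecimalDegreesLongitude\t-70.015606\n' > "$check_dir/location.txt"

# Run both ERE patterns and keep their output for comparison.
grep -E '^WGS_LAT|^WGS_DLAT' "$check_dir/location.txt" > "$check_dir/alternation.out"
grep -E '^WGS_D?LAT' "$check_dir/location.txt" > "$check_dir/optional.out"

# cmp -s is silent and succeeds only if the two files are byte-identical.
if cmp -s "$check_dir/alternation.out" "$check_dir/optional.out"; then
  same_selection=yes
else
  same_selection=no
fi
matched_lines=$(wc -l < "$check_dir/optional.out")
echo "same selection: $same_selection"
```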
terdon