1

I've tried various solutions to find a good way to print the part of a file that begins at a line containing one specific word and ends at a line containing another specific word.

Let's say I have a file named states.txt containing:

Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
Florida
Georgia
Hawaii
Idaho
Illinois 
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana 
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania 
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming

I want to read states.txt and print the range of states that begins with Idaho and ends with South Dakota.

The solution should not rely on the states being in alphabetical order (the actual file I am working with is not sorted).

The result should look like:

Idaho
Illinois 
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana 
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania 
Rhode Island
South Carolina
South Dakota

Thank you for your time and patience on this one. I appreciate any help offered.

mklement0
Andy D'Arata

3 Answers

8
awk '/Idaho/{f=1} f; /South Dakota/{f=0}' file
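
As a rough sketch of what the one-liner does, here it is spread over several lines with comments and run on a short, made-up sample instead of the real file. The sample lines and the `out` variable are assumptions for illustration only.

```shell
# Hypothetical five-line sample standing in for states.txt.
out=$(printf '%s\n' Hawaii Idaho Illinois 'South Dakota' Tennessee | awk '
    /Idaho/        { f = 1 }   # start line seen: raise the flag
    f                          # flag is set: print the current line (default action)
    /South Dakota/ { f = 0 }   # end line seen: lower the flag, after it was printed
')

printf '%s\n' "$out"
```

Because the print rule sits between the two flag assignments, both bounding lines are included; moving it changes which of them are printed.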

See "Explain awk command" for many more awk range idioms.

Don't get into the habit of using /start/,/end/: it makes trivial things very slightly briefer but requires a complete rewrite or duplicate conditions for even the slightest change in requirements (e.g. not printing the bounding lines).

For example, given this input file:

$ cat file
a
b
c
d
e

to print the lines between b and d inclusive and then excluding either or both bounding lines:

$ awk '/b/{f=1} f; /d/{f=0}' file
b
c
d

$ awk 'f; /b/{f=1} /d/{f=0}' file
c
d

$ awk '/b/{f=1} /d/{f=0} f;' file
b
c

$ awk '/d/{f=0} f; /b/{f=1}' file
c

Try that if your starting point was awk '/b/,/d/' file and notice the additional language constructs and duplicate conditions required:

$ awk '/b/,/d/' file
b
c
d

$ awk '/b/,/d/{if (!/b/) print}' file
c
d

$ awk '/b/,/d/{if (!/d/) print}' file
b
c

$ awk '/b/,/d/{if (!(/b/||/d/)) print}' file
c

Also, it's not obvious at all but an insidious bug crept into the above. Note the additional "b" that's now between "c" and "d" in this new input file:

$ cat file
a
b
c
b
d
e

and try again to exclude the first bounding line from the output:

$ awk 'f; /b/{f=1} /d/{f=0}' file
c
b
d
-> SUCCESS

$ awk '/b/,/d/{if (!/b/) print}' file
c
d
-> FAIL

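The range version fails because !/b/ suppresses every line that matches b, not just the first bounding line. To keep using a range and exclude only the first bounding line, you ACTUALLY need to write something like this: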

$ awk '/b/,/d/{if (c++) print; if (/d/) c=0}' file
c
b
d

but by then it's obviously getting kinda silly and you'd rewrite it to just use a flag like my original suggestion.

Ed Morton
    The other 2 examples didn't work for the actual output I was looking for. The script you gave me was perfect, and I was able to modify it to suit what I needed. Thank you very much! – Andy D'Arata Apr 27 '15 at 15:15
7

Use sed with a pattern range:

sed '/^Idaho$/,/^South Dakota$/!d' filename

Or awk with the same pattern range:

awk '/^Idaho$/,/^South Dakota$/' filename

In both cases, the ^ and $ match the beginning and end of the line, respectively, so ^Virginia$ matches only if the whole line is Virginia (i.e., West Virginia is not matched).
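
A small sketch of the difference the anchors make, using made-up two-line input rather than the real file. The variable names are assumptions for illustration. The last case assumes a line with a trailing space, which an anchored pattern does not match.

```shell
# Hypothetical sample: one exact match and one line that merely contains the word.
sample=$(printf '%s\n' 'Virginia' 'West Virginia')

# Without anchors the regex matches anywhere in the line.
unanchored=$(printf '%s\n' "$sample" | awk '/Virginia/')

# With ^ and $ the whole line has to be exactly "Virginia".
anchored=$(printf '%s\n' "$sample" | awk '/^Virginia$/')

# A trailing space is part of the line, so the anchored pattern does not match.
trailing=$(printf '%s\n' 'Illinois ' | awk '/^Illinois$/')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"
printf 'trailing-space match: [%s]\n' "$trailing"
```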

Or, if you prefer fixed-string matching over regex matching (it doesn't make a difference here but might in other circumstances):

awk '$0 == "Idaho", $0 == "South Dakota"' filename
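
If the bounding lines change often, one possible variation is to pass them in with `-v` and compare whole lines against them, here combined with a flag instead of a range. This is only a sketch: the variable names `start` and `stop` and the four sample lines are assumptions, not part of the original answer. Note that awk processes backslash escapes in `-v` values.

```shell
# Hypothetical sample in which the start word also appears inside another line.
sample=$(printf '%s\n' 'West Virginia' 'Virginia' 'Washington' 'Wyoming')

# Regex matching starts too early, at "West Virginia".
regex_out=$(printf '%s\n' "$sample" | awk '/Virginia/{f=1} f; /Washington/{f=0}')

# Whole-line string comparison starts only at the exact line "Virginia".
exact_out=$(printf '%s\n' "$sample" | awk -v start='Virginia' -v stop='Washington' '
    $0 == start { f = 1 }   # exact start line: raise the flag
    f                       # print while the flag is set
    $0 == stop  { f = 0 }   # exact end line: lower the flag
')

printf 'regex:\n%s\n' "$regex_out"
printf 'exact:\n%s\n' "$exact_out"
```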
Wintermute
0
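# all bash
list=$(< file.txt)         # read the whole file into a variable
start="Idaho"
stop="South Dakota"
fst=${list#*"$start"}      # strip everything through the first occurrence of $start
snd=${fst%"$stop"*}        # strip the last occurrence of $stop and everything after it
result="$start$snd$stop"   # put the two bounding words back
printf '%s\n' "$result"    # quoted, so newlines and spacing are preserved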

See http://tldp.org/LDP/abs/html/string-manipulation.html
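
For comparison, the flag idea from the awk answers can also be written as a plain shell loop that reads line by line instead of slicing one big string. This is a different technique from the parameter expansion above, shown only as a sketch. The function name `print_range`, the variable names, and the sample lines are assumptions.

```shell
start='Idaho'
stop='South Dakota'

# Read stdin line by line and print everything from $start through $stop.
print_range() {
    inside=0
    while IFS= read -r line; do
        if [ "$line" = "$start" ]; then inside=1; fi          # exact start line
        if [ "$inside" -eq 1 ]; then printf '%s\n' "$line"; fi
        if [ "$line" = "$stop" ]; then inside=0; fi           # exact end line
    done
}

# Hypothetical sample standing in for file.txt.
out=$(printf '%s\n' Hawaii Idaho Illinois 'South Dakota' Tennessee | print_range)
printf '%s\n' "$out"
```

On large files a loop like this is likely to be much slower than awk or sed, so it is mainly useful when no external tools are available.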

    If you _really_ wanted to do this in pure `bash` code - which is generally _not_ a good idea - here's a shorter alternative: `[[ $(< file.txt) =~ Idaho.+South\ Dakota ]] && echo "$BASH_REMATCH"`. – mklement0 Apr 21 '15 at 21:18