
Suppose that I have the following text file (it can have more states, cities, and colleges):

```
begin_state
New York
end_state

begin_cities
Albany
Buffalo
Syracuse
end_cities

begin_colleges
Cornell
Columbia
Stony Brook
end_colleges

begin_state
California
end_ state

begin_cities
San Francisco
Sacramento
Los Angeles
end cities

begin_colleges
Berkeley
Stanford
Caltech
end_colleges
```

I want to use awk to extract either all the cities or all the colleges and list them under their states. For example, if I want the cities, the output should be:

**New York**
Albany
Buffalo
Syracuse
**California**
San Francisco
Sacramento
Los Angeles

Any suggestions are welcome.

edited by shellter, asked by Cyrus
  • Hi, if this is [homework](http://stackoverflow.com/tags/homework/info), it will help to tag it as such. What have you tried so far? – neillb May 10 '11 at 02:02
  • Note also that your input list has at least two errors: `end_ state` and `end cities`. – neillb May 10 '11 at 02:02
  • Thanks for pointing out the errors. This is not a homework problem. I have not coded it in awk yet; I tried it in Excel without much success. I saw a one-line awk program on stackoverflow.com that handles single-field filtering problems nicely (see http://stackoverflow.com/questions/4705331/filter-text-which-appears-between-two-marks). My problem has two pairs of markers: one is always the state, and the other can be the cities or the colleges. So I always need to print the state names and then print either the cities or the colleges. Other sections can also be added under the states. – Cyrus May 10 '11 at 02:55

1 Answer

Here are two solutions in awk. The first is naive and repetitive but easier to follow and learn from. The latter is an attempt at reducing the repetition.

Both solutions are fragile with respect to errors in your data file. If you are free to choose the implementation language, I suggest doing this in something like Ruby, Perl, or Python.
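Because malformed markers make both scripts silently misbehave, it can help to validate the file first. Below is a minimal sketch, assuming every marker is exactly `begin_NAME` or `end_NAME` with a lowercase NAME, and that no data line starts with lowercase "begin" or "end" followed by "_" or a space. `check_markers` is a hypothetical helper name, not something from the answer.

```shell
# Hypothetical helper: check_markers < FILE
# Reports malformed markers (e.g. "end_ state", "end cities") and begin_/end_
# lines that do not pair up; exits non-zero if any problem is found.
check_markers() {
  awk '
    { sub(/[ \t\r]+$/, "") }                          # tolerate trailing spaces / CRLF
    /^(begin|end)[_ ]/ && !/^(begin|end)_[a-z]+$/ {  # looks like a marker but is not one
      printf "line %d: malformed marker: %s\n", NR, $0; bad = 1; next
    }
    /^begin_/ {
      if (open != "") { printf "line %d: %s before end_%s\n", NR, $0, open; bad = 1 }
      open = substr($0, 7); next                      # section name after "begin_"
    }
    /^end_/ {
      if (substr($0, 5) != open) { printf "line %d: unexpected %s\n", NR, $0; bad = 1 }
      open = ""; next
    }
    END {
      if (open != "") { printf "end of input: begin_%s never closed\n", open; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
}
```

Usage: `check_markers < states.txt` prints the offending lines and exits non-zero; on the file as originally posted in the question it flags the `end_ state` and `end cities` lines.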

Save either script to a file (e.g. showinfo.sh) and invoke it with a single argument, "cities" or "colleges", to select the mode. The data file must be redirected into stdin.

Example invocation (for either solution):

```
./showinfo.sh cities < states.txt
./showinfo.sh colleges < states.txt
```

The naive solution:

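```bash
#!/bin/bash
set -e
set -u
# Mode comes from the first argument: "cities" or "colleges".
mode=${1:?"usage: $0 cities|colleges"}

awk -v mode="$mode" '
NF == 0           {next}                # skip blank separator lines
/begin_state/     {st="states"; next}
/end_state/       {next}
/begin_cities/    {st="cities"; next}
/end_cities/      {next}
/begin_colleges/  {st="coll"; next}
/end_colleges/    {next}

{
  if (st=="states") {
    sn=$0
    order[++n]=sn                       # remember states in input order
  }
  else if (st=="cities") cities[sn]=cities[sn]"\n"$0
  else if (st=="coll")   colleges[sn]=colleges[sn]"\n"$0
}

END {
  if (mode!="cities" && mode!="colleges") {
    print "set mode to either cities or colleges"
    exit 1
  }
  # "for (sn in cities)" would visit states in an unspecified order,
  # so walk order[] to print them as they appeared in the input.
  for (i=1; i<=n; i++) {
    sn=order[i]
    if (mode=="cities") print "=="sn"=="cities[sn]
    else                print "=="sn"=="colleges[sn]
  }
}'
```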

Second solution, with repetition removed:

```bash
#!/bin/bash
set -e
set -u
mode=${1:?"usage: $0 cities|colleges"}
awk -v mode="$mode" '
NF == 0     {next}                      # skip blank separator lines
/^begin_/   {st=$1; next}               # e.g. st="begin_cities"
/^end_/     {st=""; next}

{
  if (st=="begin_state") { sn=$0; order[++n]=sn }
  else { data[st, sn]=data[st, sn]"\n"$0 }
}

END {
  # Walk states in input order; "for (combo in data)" order is unspecified.
  for (i=1; i<=n; i++) {
    if (("begin_" mode, order[i]) in data)
      print "==" order[i] "==" data["begin_" mode, order[i]]
  }
}'
```
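If you would rather not buffer everything until END, the two-marker idea from the question's comments can also be done in a single streaming pass. This is a third variant, not one of the two solutions above: `show_section` is a hypothetical function name, and the sketch assumes the cleaned-up input format (markers spelled correctly) with each state block appearing before the sections that belong to it.

```shell
# Hypothetical helper: show_section MODE < FILE
# Single pass: prints each state as "==State==" followed by the lines of the
# MODE block ("cities", "colleges", ...) that belong to it, in input order.
show_section() {
  awk -v mode="$1" '
    { sub(/[ \t\r]+$/, "") }                  # tolerate trailing spaces / CRLF
    NF == 0                 { next }          # skip blank separator lines
    $0 == "begin_state"     { in_state = 1; next }
    $0 == "end_state"       { in_state = 0; next }
    $0 == ("begin_" mode)   { in_sect = 1; next }
    $0 == ("end_" mode)     { in_sect = 0; next }
    /^(begin|end)_/         { next }          # markers of other sections
    in_state                { print "==" $0 "=="; next }
    in_sect                 { print }
  '
}
```

Source it or paste it into a script, then run `show_section cities < states.txt`. Because it prints as it reads, states come out in input order without any bookkeeping, and any section name works as MODE. One difference from the two solutions above: a state with no matching section still gets its `==State==` heading.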

Input file used (note that the file in the question has changed recently, so it differs slightly from this one):

```
begin_state
New York
end_state
begin_cities
Albany
Buffalo
Syracuse
end_cities
begin_colleges
Cornell
Columbia
Stony Brook
end_colleges
begin_state
California
end_state
begin_cities
San Francisco
Sacramento
Los Angeles
end_cities
begin_colleges
Berkeley
Stanford
Caltech
end_colleges
```
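The file as posted in the question (blank separator lines, plus the `end_ state` and `end cities` typos) can be converted into this format before running either script. A hedged sketch: `normalize_states` is a hypothetical helper name, and it assumes no real data line starts with lowercase "begin" or "end" followed by "_" or a space, since such a line would be rewritten as a marker.

```shell
# Hypothetical helper: normalize_states < RAW_FILE
# Drops blank separator lines and repairs marker typos such as "end_ state"
# and "end cities" into begin_NAME / end_NAME form.
# Assumption: no data line starts with lowercase "begin" or "end" followed by
# "_" or a space.
normalize_states() {
  awk '
    { sub(/[ \t\r]+$/, "") }          # strip trailing spaces / CRLF
    NF == 0 { next }                  # drop blank separator lines
    {
      sub(/^begin[_ ]+/, "begin_")    # "begin cities" becomes "begin_cities"
      sub(/^end[_ ]+/, "end_")        # "end_ state" becomes "end_state"
      print
    }
  '
}
```

For example, `normalize_states < question.txt > states.txt` (file names illustrative) produces a file the scripts above can read.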

Session when running the first solution:

```
$ bash showinfo.sh cities < states.txt
==New York==
Albany
Buffalo
Syracuse
==California==
San Francisco
Sacramento
Los Angeles
```
answered by neillb
  • If you tried your code with the example I described, could you also please post a copy of your unix session? Thanks so much! -C. – Cyrus May 11 '11 at 04:31
  • Thanks, it has indeed helped to clarify things for me. I very much appreciate you taking the time to put out this nice code. I can now try it out on my text file, and hopefully it can help others with similar filtering problems as well. Thanks again. -C – Cyrus May 12 '11 at 02:18
  • @Cyrus when the solution is exactly what you need, don't forget to mark @neillb's answer as the 'right' answer. – pepoluan May 12 '11 at 05:22