This is my second time posting here, so I apologize for any formatting mistakes. I have a file that lists each US state and its capital city, separated by a comma.

Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
Arkansas,Little Rock
California,Sacramento
Colorado,Denver

I am trying to write the states and the capitals to two separate files, and have come up with this:

for line in $(cat file);do
    capital=$(echo $line | cut -d , -f2)
    state=$(echo $line | cut -d , -f1)
    echo $capital >> capitals
    echo $state >> states
done

The problem with this code is that even though I've set the cut delimiter to a comma, the program still seems to treat a space as a delimiter for cities that contain a space (e.g. Little Rock).

With the program above, my capitals file contains:

Montgomery
Juneau
Phoenix
Little
Rock
Sacramento
Denver

Notice how Little Rock is split across two lines instead of being on one line. How can I modify my program to keep it on one line? I've tried setting IFS to a comma, but when I do, my capitals file also contains the states:

Alabama
Montgomery
Alaska
Juneau
Arizona
Phoenix
Arkansas
Little Rock
California
Sacramento
Colorado
Denver
RavinderSingh13
  • The problem is that `for line in $(cat file)` splits on whitespace, which includes spaces as well as newlines. See the BashFAQ entry ["Why you don't read lines with `for`"](https://mywiki.wooledge.org/DontReadLinesWithFor). Use `while IFS=, read -r state capital; do` with input redirected from the file. – Gordon Davisson Dec 05 '19 at 06:59
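
As a rough illustration of the splitting behaviour described in the comment above, the sketch below runs three loops over the same two-line sample. The temporary file and the variable names (`sample`, `for_words`, `comma_words`, `read_fields`) are assumptions made for this example only. It also suggests why setting IFS to a comma did not help: the `for` loop then splits on commas only, so newlines stay inside the words, and `cut` passes through any line that has no delimiter.

```shell
#!/bin/bash
# Hypothetical two-line sample written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'Arkansas,Little Rock' 'Colorado,Denver' > "$sample"

# Default IFS: the unquoted $(cat ...) is split on spaces, tabs and
# newlines, so the loop sees words rather than lines.
for_words=$(for word in $(cat "$sample"); do printf '[%s]\n' "$word"; done)
# [Arkansas,Little]  [Rock]  [Colorado,Denver]

# IFS=, (set inside the command substitution so it does not leak):
# splitting now happens on commas only, and the newline stays in a word.
comma_words=$(IFS=,; for word in $(cat "$sample"); do printf '[%s]\n' "$word"; done)
# [Arkansas]  [Little Rock<newline>Colorado]  [Denver]

# read consumes one line per call; IFS=, applies to that read only.
read_fields=$(while IFS=, read -r state capital; do
    printf '[%s][%s]\n' "$state" "$capital"
done < "$sample")
# [Arkansas][Little Rock]  [Colorado][Denver]

printf 'for, default IFS:\n%s\n\n' "$for_words"
printf 'for, IFS=,:\n%s\n\n' "$comma_words"
printf 'while read:\n%s\n' "$read_fields"

rm -f "$sample"
```

Only the `while read` form keeps each line together and splits it at the comma.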

4 Answers

Could you please try the following, if you are OK with awk?

awk '
BEGIN{
  FS=","
  out_city="city_output_file"
  out_state="state_output_file"
}
{
  print $1 > (out_state)
  print $2 > (out_city)
}
'  Input_file

With bash:

while IFS=, read -r  state city;
do
   echo "$state" >> "state_output_file"
   echo "$city" >> "city_output_file"
done < "Input_file"
RavinderSingh13
  • Thank you for your reply, I tried running the program but it says "awk: cannot open Input_file (No such file or directory)" – KillahBeans Dec 05 '19 at 06:54
  • @KillahBeans, it will give that error because you need to put your actual file name in place of Input_file. Once you do, it should work. – RavinderSingh13 Dec 05 '19 at 06:56
  • Just tried out the bash version and it works. However, I noticed that Colorado,Denver was skipped during the while loop as it is missing in the two output files. – KillahBeans Dec 05 '19 at 07:10
  • @KillahBeans, both solutions worked fine for me, how about `awk` command? – RavinderSingh13 Dec 05 '19 at 07:14
  • Ah, never mind. I managed to fix it using the solution from https://stackoverflow.com/questions/12916352/shell-script-read-missing-last-line. I guess it's because the last line did not end with a newline. Thank you so much :D – KillahBeans Dec 05 '19 at 07:17
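
For readers who hit the same skipped-last-line problem, here is a minimal sketch of what seems to be happening, assuming an input whose final line has no trailing newline. The temporary file and the names `input`, `plain` and `guarded` are hypothetical. `read` still fills its variables for the unterminated line but returns a non-zero status, so a plain loop exits before processing it; an extra `[ -n "$state" ]` test lets the body run once more.

```shell
#!/bin/bash
# Hypothetical input: the last line is NOT terminated by a newline.
input=$(mktemp)
printf 'California,Sacramento\nColorado,Denver' > "$input"

# Plain loop: read hits end of file on the last line and returns non-zero,
# so the loop body never runs for "Colorado,Denver".
plain=$(while IFS=, read -r state city; do
    printf '%s\n' "$state"
done < "$input")

# Guarded loop: if read failed but still left text in "state", run the body.
guarded=$(while IFS=, read -r state city || [ -n "$state" ]; do
    printf '%s\n' "$state"
done < "$input")

printf 'plain loop:\n%s\n\n' "$plain"      # California
printf 'guarded loop:\n%s\n' "$guarded"    # California, then Colorado

rm -f "$input"
```

Testing only `state` is a simplification for this data; a line with an empty first field and no trailing newline would still be missed.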

While awk is fine for this problem, you should also understand how to read the file in a shell script and use parameter expansions to split each line into the state and the capital, writing each to its own file.

This is a bread-and-butter part of shell scripting, and quite easy here. For example:

#!/bin/bash

states=${2:-states}         ## states as 2nd argument (default "states")
capitals=${3:-capitals}     ## capitals as 3rd argument (default "capitals")

: > "$states"       ## truncate both files (quoted, so names with spaces work)
: > "$capitals"

while read -r line || [ -n "$line" ]; do
    echo "${line%,*}" >> "$states"     ## trim line from right to 1st comma
    echo "${line#*,}" >> "$capitals"   ## trim line from left to 1st comma
done < "$1"

(note: the script reads from the filename provided as the first argument to the program and writes to the state and capital files optionally provided as the 2nd and 3rd arguments)
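
To make the two expansions in the script above a little more concrete, here is a small sketch of how the `%`, `%%`, `#` and `##` forms differ. The sample line with a second comma and the variable names are made up for illustration; with exactly one comma per line, as in the question's data, the short and long forms give the same result.

```shell
#!/bin/bash
# Hypothetical line with two commas, to show where each form cuts.
line='Arkansas,Little Rock,extra'

short_suffix=${line%,*}    # drop shortest ",*" match from the end
long_suffix=${line%%,*}    # drop longest  ",*" match from the end
short_prefix=${line#*,}    # drop shortest "*," match from the start
long_prefix=${line##*,}    # drop longest  "*," match from the start

printf '%s\n' "$short_suffix"   # Arkansas,Little Rock
printf '%s\n' "$long_suffix"    # Arkansas
printf '%s\n' "$short_prefix"   # Little Rock,extra
printf '%s\n' "$long_prefix"    # extra
```

These expansions are done by the shell itself, so no extra process is started per line.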

Example Input File

$ cat file
Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
Arkansas,Little Rock
California,Sacramento
Colorado,Denver

Example Use

$ bash separate.sh file

Resulting Output Files

States:

$ cat states
Alabama
Alaska
Arizona
Arkansas
California
Colorado

Capitals:

$ cat capitals
Montgomery
Juneau
Phoenix
Little Rock
Sacramento
Denver

awk will be faster, but the script above will be orders of magnitude more efficient than your original attempt, which spawns multiple subshells per iteration and pipes their output to cut. Look things over and let me know if you have further questions.

Adding The Combined File

I guess you would also want a combined file for both state and capital on separate lines. Simply add another file for the output, e.g.

#!/bin/bash

states=${2:-states}         ## states as 2nd argument (default "states")
capitals=${3:-capitals}     ## capitals as 3rd argument (default "capitals")
combined=${4:-combined}     ## combined as 4th argument (default "combined")

: > "$states"       ## truncate all files (quoted, so names with spaces work)
: > "$capitals"
: > "$combined"

while read -r line || [ -n "$line" ]; do
    echo "${line%,*}" >> "$states"     ## trim line from right to 1st comma
    echo "${line#*,}" >> "$capitals"   ## trim line from left to 1st comma
    printf "%s\n%s\n" "${line%,*}" "${line#*,}" >> "$combined"
done < "$1"

(note: adding || [ -n "$line" ] to your while loop condition handles a last line that lacks the POSIX line terminator, i.e. a '\n' at the end of the last line)

Resulting Output Files

Combined:

$ cat combined
Alabama
Montgomery
Alaska
Juneau
Arizona
Phoenix
Arkansas
Little Rock
California
Sacramento
Colorado
Denver
David C. Rankin

There is no need to create six child processes for every line of the input. If the input file gets really large, this will cost a lot of wall-clock time. I would do:

cut -d , -f2 file > capitals
cut -d , -f1 file > states
user1934428
  • Whoa, I never even thought of that, hahaha. I guess I was just overcomplicating things. – KillahBeans Dec 05 '19 at 07:33
  • Of course for really huge input files (or files being read over a slow network), the answer given by RavinderSingh13 is better than mine, since he needs only one process, while mine needs two. – user1934428 Dec 05 '19 at 07:36
  • @user1934428, This will also create 2 processes, but good solution, cheers :) – RavinderSingh13 Dec 05 '19 at 07:36

There is no need to create six child processes for every line of the input. This is useful if the input file is large:

awk -F ',' '{print $NF}' file > capitals
awk -F ',' '{print $(NF-1)}' file > states
Shubh Patel