0

I have an output similar to this

No Type  Pid    Status  Cause Start Rstr  Err Sem Time Program          Cl  User         Action                    Table
-------------------------------------------------------------------------------------------------------------------------------
 0 DIA    10897 Wait          yes   no     0   0    0                                    NO_ACTION                           
 1 DIA    10903 Wait          yes   no     0   0    0                                    NO_ACTION                           
 2 DIA    10909 Wait          yes   no     0   0    0                                    NO_ACTION                           
 3 DIA    10916 Wait          yes   no     0   0    0                                    NO_ACTION                           
 4 DIA    10917 Wait          yes   no     0   0    0                                    NO_ACTION                           
 5 DIA     9061 Wait          yes   no     1   0    0                                    NO_ACTION                     

But I want this table to be comma separated and fields with no values should print null instead of taking the output of next column! Currently I am receiving the below output.

NO=0,Type=DIA,Pid=10897,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=
NO=1,Type=DIA,Pid=10903,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=
NO=2,Type=DIA,Pid=10909,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=
NO=3,Type=DIA,Pid=10916,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=
NO=4,Type=DIA,Pid=10917,Status=Wait,Cause=yes,Start=no,Rstr=0,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=
NO=5,Type=DIA,Pid=9061,Status=Wait,Cause=yes,Start=no,Rstr=1,Err=0,Sem=0,Time=NO_ACTION,Program=,Cl=,User=,Action=,Table=

I have written a script to do the same but It's not including the columns with null values.

#!/bin/bash
sed 1,5d test.txt > temp.txt
input="temp.txt"
while IFS= read -r line
do
echo $line | awk 'BEGIN{FS=" ";OFS=","}{print "NO="$1,"Type="$2,"Pid="$3,"Status="$4,"Cause="$5,"Start="$6,"Rstr="$7,"Err="$8,"Sem="$9,"Time="$10,"Program="$11,"Cl="$12,"User="$13,"Action="$14,"Table="$15;}'
#echo "$line"
done < "$input"
Nitish
  • 968
  • 6
  • 9

2 Answers2

1

I am not experienced with awk which can obviously make the task much faster and shorter.
Although this can be done with a bash script as follows:

if [ "$#" -ne "2" ]
then
    echo "usage: <$0> input_file output_file"
    exit 1
fi

#input table file
input_file=$1
output_file=$2


#Get name for a temporary file by mktemp
temp_file=`mktemp headings_XXXXXX` 

#Store all headings separated by '\n' in a temporary file
sed -n '1p' $input_file | tr -s ' ' '\n' > $temp_file


headings=$(sed -n '1p' $input_file)

counter=0


#This loop would extract width of each column so that they can be given to cut as parameters
# like `cat filename | cut -b 3-8` would extract the entries in that column
while [ 1 ]
do
    upper_limit=${#headings}
    headings=${headings% [! ]*}
    lower_limit=${#headings}

    if [ "$upper_limit" = "$lower_limit" ]
    then
        limits_for_cut[$counter]=$(echo "1-${upper_limit}")
        counter=$( expr $counter + 1 )
        break
    fi

    lower_limit=$( expr $lower_limit + 1 )

    limits_for_cut[$counter]=$(echo "${lower_limit}-${upper_limit}")

    counter=$( expr $counter + 1 )

done


end_index=$( expr $counter - 1 )

no_of_lines=$( cat $input_file | wc -l )
no_of_lines=$( expr $no_of_lines - 2 ) #first 2 lines in file are for headings and dashes

on_line=$no_of_lines

#This loop will output all data to the specified file as comma separated
while [ $on_line -ne 0 ]
do
    counter=$end_index

    cat $temp_file |
        while read heading
        do
            tmp=$( expr $no_of_lines - $on_line + 1 + 2 )
            echo  "${heading}=`sed -n "${tmp}p" $input_file | cut -b ${limits_for_cut[$counter]} | sed 's/ //g'`," >> $output_file
            if [ $counter -eq 0 ]
            then
                break
            fi
            counter=$( expr $counter - 1 )
        done

    on_line=$( expr $on_line - 1 )
done    

echo `cat $output_file | tr -d '\n'` > $output_file

rm $temp_file

Basically, we are doing this with cut command.

Like for header “type" which lies between 3-8, we can simply do like this cut -b 3-8 filename.

I ran this on OSX. You may need to change cut and sed syntax to suit your machine.

If this solution suits you, you should try the same with awk because that would make it much faster and shorter.

Mihir Luthra
  • 6,059
  • 3
  • 14
  • 39
1

Based on awk you can easily achieve this, by counting length of fields (computed with the 2 firsts lines) and then retrieving substrings from the current line. Here is a proposal that does what you want, and that parses the header from the input (this works for one file at a time)

# FIELD array to store start/len for each field
# --- Functions from https://stackoverflow.com/a/27158086/5868851
function ltrim(s) { sub(/^[ \t\r\n]+/, "", s); return s }
function rtrim(s) { sub(/[ \t\r\n]+$/, "", s); return s }
function trim(s) { return rtrim(ltrim(s)); }
# --- Header parsing BEGIN
NR == 1 {
    for (i = 1; i < NF; ++i) {
        field_len = index($0,$(i+1)) - 1 - total
        FIELD[i, "start"] = total
        FIELD[i, "len"] = field_len
        FIELD[i, "name"] = $i
        total += field_len
    }
    last_field = $NF
}
NR == 2 {
    # Last field is of len length($0) - total
    FIELD[i, "start"] = total
    FIELD[i, "len"] = length($0) - total
    FIELD[i, "name"] = last_field
    FIELD_N = i
}
# --- Header parsing END
# --- Data parsing BEGIN
NR > 2 {
    sep=","
    for(i = 1; i <= FIELD_N; ++i) {
        value = trim(substr($0, FIELD[i, "start"], FIELD[i, "len"]))
        if (!value)
            value="null"
        if (i == FIELD_N)
            sep="\n"
        printf("%s=%s%s", FIELD[i, "name"], value, sep);
    }
}
# --- Data parsing END
Zelnes
  • 403
  • 3
  • 10