-1

I have an output from a program that I need to convert to columns. I know you can do this with awk or sed but I can't seem to figure out how.

This is how the output looks:

insert_job: aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa   job_type: c
box_name: sss-eee-ess-saturday
command: $${qqqq-eee-eat-cmd} $${qqqq-eee-nas-cntrl-dir}\eee\CMS\CMS_C3.xml $${qqqq-eee-nas-log}\eee\AFG\AFG_Build_Qwer.log buildProcess
machine: qqqq-eee-cntl
owner: system_uu_gggg_p@ad
permission: gx,wx
condition: s(qqqq-rtl-etl-40-datamart-load-cms) & s(qqqq-eee-ess)
std_out_file: >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.out
std_err_file: >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.err
max_run_alarm: 420
alarm_if_fail: 1
application: qqqq-M9887

I need it to look like this I need it to look like this

Or like this:

insert_job:                 job_type:   box_name:       command:        machine: 
aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa   c       sss-eee-ess-saturday    $${qqqq-eee-eat-cmd}    qqqq-eee-cntl

insert_job:;job_type:;box_name:;command:;machine:; 
aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa;c;sss-eee-ess-saturday;$${qqqq-eee-eat-cmd};qqqq-eee-cntl;

Basically either already with TAB separated or in CSV format. Thanks for the help

shellter
  • 36,525
  • 7
  • 83
  • 90
PFD
  • 17
  • 4
  • could you please confirm that job_type is on the same line of insert_job? – Lino Aug 10 '18 at 14:26
  • yes, it is. So my guess it has to be put on a separate line first? – PFD Aug 10 '18 at 14:37
  • having insert_job on a separate line might be convenient when writing the script. Perhaps it can be easier – Lino Aug 10 '18 at 14:39
  • 1
    Replace the image of expected output with text so we can test a solution against your provided input and diff it against the expected output to see if it works or not. Clearly state if there's only 1 record in your file (i.e. you'll only ever get 1 line of output) and if that isn't the case and there's multiple records in your input then, of course, show at least 2 of them so we can see what separates the records too. – Ed Morton Aug 10 '18 at 14:40
  • Can't you just use `rs -T` for this? It looks like a straightforward transposition. – Toby Speight Aug 13 '18 at 10:22

2 Answers2

1

You haven't shown us what the actual expected output looks like so I've assumed you want it tab-separated and unquoted and I've made some other assumptions about how your input records are separated, etc.:

$ cat tst.awk
BEGIN { OFS="\t" }
{
    if ( numTags == 0 ) {
        tag = $1
        val = $2
        sub(/:$/,"",tag)
        tags[++numTags] = tag
        tag2val[tag] = val
        sub(/[^:]+: +[^ ]+ +/,"")
    }
    tag = val = $0
    sub(/: .*/,"",tag)
    sub(/[^:]+: /,"",val)
    tags[++numTags] = tag
    tag2val[tag] = val
}
tag == "application" {
    if ( !cnt++ ) {
        for (tagNr=1; tagNr<=numTags; tagNr++ ) {
            tag = tags[tagNr]
            printf "%s%s", tag, (tagNr<numTags ? OFS : ORS)
        }
    }
    for (tagNr=1; tagNr<=numTags; tagNr++ ) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "%s%s", val, (tagNr<numTags ? OFS : ORS)
    }
    numTags = 0
}

.

$ awk -f tst.awk file
insert_job      job_type        box_name        command machine owner   permission      condition       std_out_file    std_err_file    max_run_alarm   alarm_if_fail   application
aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa       c       sss-eee-ess-saturday    $${qqqq-eee-eat-cmd} $${qqqq-eee-nas-cntrl-dir}\eee\CMS\CMS_C3.xml $${qqqq-eee-nas-log}\eee\AFG\AFG_Build_Qwer.log buildProcess qqqq-eee-cntl   system_uu_gggg_p@ad    gx,wx    s(qqqq-rtl-etl-40-datamart-load-cms) & s(qqqq-eee-ess)  >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.out      >E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.err      420     1       qqqq-M9887

If you want something easier for Excel to handle, this will produce a CSV that Excel will be able to open just by double clicking on the output file name:

$ cat tst.awk
BEGIN { OFS="," }
{
    if ( numTags == 0 ) {
        tag = $1
        val = $2
        sub(/:$/,"",tag)
        tags[++numTags] = tag
        tag2val[tag] = val
        sub(/[^:]+: +[^ ]+ +/,"")
    }
    tag = val = $0
    sub(/: .*/,"",tag)
    sub(/[^:]+: /,"",val)
    tags[++numTags] = tag
    tag2val[tag] = val
}
tag == "application" {
    if ( !cnt++ ) {
        for (tagNr=1; tagNr<=numTags; tagNr++ ) {
            tag = tags[tagNr]
            printf "\"%s\"%s", tag, (tagNr<numTags ? OFS : ORS)
        }
    }
    for (tagNr=1; tagNr<=numTags; tagNr++ ) {
        tag = tags[tagNr]
        val = tag2val[tag]
        printf "\"%s\"%s", val, (tagNr<numTags ? OFS : ORS)
    }
    numTags = 0
}

.

$ awk -f tst.awk file
"insert_job","job_type","box_name","command","machine","owner","permission","condition","std_out_file","std_err_file","max_run_alarm","alarm_if_fail","application"
"aaa-bbb-ess-qqqqqqq-aaaaaa-aaaaaa","c","sss-eee-ess-saturday","$${qqqq-eee-eat-cmd} $${qqqq-eee-nas-cntrl-dir}\eee\CMS\CMS_C3.xml $${qqqq-eee-nas-log}\eee\AFG\AFG_Build_Qwer.log buildProcess","qqqq-eee-cntl","system_uu_gggg_p@ad","gx,wx","s(qqqq-rtl-etl-40-datamart-load-cms) & s(qqqq-eee-ess)",">E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.out",">E\:\gggg\logs\qqqq-eee-ess-saturday-cms-build.err","420","1","qqqq-M9887"
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • This looks like it's doing the job but because the "Command" line has spaces it's separating that line and puts it in other columns – PFD Aug 10 '18 at 15:14
  • No it isn't and no it doesn't. The script doesn't care about blanks within fields, a field is a field whether it contains blanks or not and all feilds are separated by tabs from the other fields. The only problem you'd have with the output would be if your fields already contain tabs and then youd just change OFS="\t" to whatever char you want to use as the output field separator. – Ed Morton Aug 10 '18 at 15:18
  • OK I see. Is it possible to separate it by TAB so that I can easily split it in Excel? Right now there are single spaces, that's why I thought there was an error when I tried to convert data to columns in Excel – PFD Aug 10 '18 at 15:22
  • As I've mentioned, the output **IS** separated by tabs. Sounds like you're struggling to use Excel with tab-separated data (which is tricky given the whole data import interface they inexplicably implemented). Check the Excel documentation for how to import tab-separated files. – Ed Morton Aug 10 '18 at 15:25
  • Even if I paste to a simple notepad I see there are 6 spaces between "insert_job" and "job_type" – PFD Aug 10 '18 at 15:27
  • I've no idea how copy/paste works on your computer/clipboard/notepad but apparently when you do that it's converting the tabs that the tool is outputing to blank chars. The output is simply tab-separated, that's all. You can see the `OFS="\t"` - no magic going on, that **IS** the Output Field Separator. I've added a version that'll produce a CSV like Excel will probably be happy with (unless you have data that Excel thinks is a date - then we'd have to add more code to deal with that). This is why it's so important when you post a question to include **exactly** the text output you want to get – Ed Morton Aug 10 '18 at 15:31
  • If you care to learn more about the issue of Excel converting fields that it thinks might be dates into actual dates (e.g. it could convert a number like "1.3" into a date like "Jan-3"), see https://stackoverflow.com/questions/165042/stop-excel-from-automatically-converting-certain-text-values-to-dates but I don't see any obvious cases where that'd happen with the sample data you've provided so far. – Ed Morton Aug 10 '18 at 15:35
  • Another issue I have is when I run the script I get this error: awk: syntax error near line 6 awk: illegal statement near line 6 – PFD Aug 10 '18 at 15:40
  • You are trying to run old, broken awk (/bin/awk on Solaris). Never use that awk. On Solaris use /usr/xpg6/bin/awk in stead (or .../xpg4/...). – Ed Morton Aug 10 '18 at 15:42
  • Hang on - if you're getting a syntax error then you haven't been able to run the script so what was all that preceding discussion about it producing incorrect output??? – Ed Morton Aug 10 '18 at 15:44
  • I copied the output you provided to test and it had 6 spaces. It seems to be working with xpg4, I will test the rest now and let you know – PFD Aug 10 '18 at 15:54
  • When formatted text is displayed on this web site, tabs in the raw text ARE replaced by blanks. Any time you're going to try to verify or discuss output from a tool - run the tool, don't assume some snapshot of it's output posted in some formatted way on some web site is exactly what the tool outputs. – Ed Morton Aug 10 '18 at 16:03
  • 2
    @EdMorton, You ***are*** a saint ;-) Good luck to all. – shellter Aug 10 '18 at 16:35
  • 1
    @EdMorton, thank you! It's working. One thing though, I noticed it says: tag == "application" {:, is it looking for that specific word? What if the last line is not "application"? – PFD Aug 10 '18 at 17:28
  • 1) Yes. 2) Then we'd have to use something else to identify the end or beginning of records, hence my comment under your question when I say `if ... there's multiple records in your input then, of course, show at least 2 of them so we can see what separates the records too.` – Ed Morton Aug 10 '18 at 20:16
  • From what I can see there are at least 2 different cases, one of them being "application" and another one called "alarm_if_fail". There might be others that I don't know about as well. Is there a way to make it stop without knowing the end word? – PFD Aug 10 '18 at 21:18
  • Yes there are several possibilities but I can't suggest any of them without seeing what your multi-record input actually looks like. Once again: `if ... there's multiple records in your input then, of course, show at least 2 of them so we can see what separates the records too.` – Ed Morton Aug 11 '18 at 05:45
0

If this is a one-off job, and you don't have to worry about double quotes in the source data, try something like this. I have assumed you want comma-separated values to put in a spreadsheet and the data is in a file called foo.txt.

echo $(sed 's/^\([^:]*\): \(.*\)$/"\1",/g' foo.txt)
echo $(sed 's/^\([^:]*\): \(.*\)$/"\2",/g' foo.txt)
Jon
  • 3,573
  • 2
  • 17
  • 24
  • That won't print the job_type column header and it'll treat the `job_type: c` text as if it were part of the insert_job data and it 'll add a trailing comma. It'll also misbehave given various input values like globbing chars. – Ed Morton Aug 10 '18 at 15:08