2

EDIT: The number of lines and tab-separated values is dynamic. It might be 1-5 or 1-10 hosts with the same layout, but the region will only be listed once.


I have a file in the following format (TSV):

host1   host2   host3
id1 id2 id3
ip1 ip2 ip3
name1   name2   name3
role1   role2   role3
region

I can also format the file like:

host1
host2
host3
id1
id2
id3
ip1
ip2
ip3
name1
name2
name3
role1
role2
role3
region

I would like to write a new file or modify this file inline so the file is in this format: (tsv)

host1   id1 ip1 name1 role1 region
host2   id2 ip2 name2 role2 region
host3   id3 ip3 name3 role3 region

I have tried without success to use awk, sed, for loops... I need some fresh ideas.

Larry Raab
  • Why is region only listed once in your input file? If it was there thrice, the final output you require would be easy to obtain. – Sundeep Oct 28 '16 at 15:28
  • It looks like you want to transpose. I think that would be quick and easy with something like Python/Pandas. Maybe try some of these ideas? http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash (I haven't tried them myself yet.) – Mark Miller Oct 28 '16 at 15:31
  • If an exact transpose is required, this would be a duplicate of https://stackoverflow.com/questions/40067992/reorder-columns-using-awk – Sundeep Oct 28 '16 at 15:34
  • I have updated the question with some important info. – Larry Raab Oct 28 '16 at 15:46

3 Answers

1

The idiomatic awk approach to transposing rows to columns:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
    numCols = NR
    numRows = (NF>numRows ? NF : numRows)
    for (rowNr=1; rowNr<=NF; rowNr++) {
        vals[rowNr,numCols] = $rowNr
    }
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            val = ((rowNr,colNr) in vals ? vals[rowNr,colNr] : vals[1,colNr])
            printf "%s%s", val, (colNr<numCols ? OFS : ORS)
        }
    }
}

$ awk -f tst.awk file
host1   id1     ip1     name1   role1   region
host2   id2     ip2     name2   role2   region
host3   id3     ip3     name3   role3   region

The above was run on your first input file:

$ cat file
host1   host2   host3
id1     id2     id3
ip1     ip2     ip3
name1   name2   name3
role1   role2   role3
region

Note that the script makes no reference to any values in your input or to how many rows or columns you have. Its only assumption about the content of your input file is that if a value is missing, you want the first value from that input line repeated.
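To illustrate that the script adapts to the size of the input, here is a minimal sketch that wraps the same logic in a shell function and feeds it a smaller file. The function name transpose_tsv and the sample values (hostA, idA, ...) are made up for illustration and are not part of the original answer.

```shell
# Sketch only: the transposing logic from tst.awk wrapped in a shell function
# ("transpose_tsv" is a hypothetical name) that reads TSV from stdin.
transpose_tsv() {
    awk '
        BEGIN { FS = OFS = "\t" }
        {
            numCols = NR                             # each input line becomes an output column
            numRows = (NF > numRows ? NF : numRows)  # the widest input line sets the row count
            for (rowNr = 1; rowNr <= NF; rowNr++)
                vals[rowNr, numCols] = $rowNr
        }
        END {
            for (rowNr = 1; rowNr <= numRows; rowNr++)
                for (colNr = 1; colNr <= numCols; colNr++) {
                    # fall back to the first value of the input line when a value is missing
                    val = ((rowNr, colNr) in vals ? vals[rowNr, colNr] : vals[1, colNr])
                    printf "%s%s", val, (colNr < numCols ? OFS : ORS)
                }
        }
    '
}

# Two hosts and only two attribute lines, with the region still listed once
printf 'hostA\thostB\nidA\tidB\nregion\n' | transpose_tsv
```

With that input, the function prints two tab-separated lines: "hostA idA region" and "hostB idB region".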

Ed Morton
0

You can use the following awk script:

# translate.awk

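NR==1 {
    # split() returns the number of elements, i.e. the number of hosts
    n = split($0,hosts)
}
NR==2 {
    split($0,ids)
}
NR==3{
    split($0,ips)
}
NR==4{
    split($0,names)
}
NR==5{
    split($0,roles)
}
NR==6{
    region=$1
}

END{
    OFS="\t"
    # Loop by index: "for (i in hosts)" does not guarantee any order
    for(i=1; i<=n; i++) {
        print hosts[i], ids[i], ips[i], names[i], roles[i], region
    }
}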

Call it like this:

awk -f translate.awk input.file
hek2mgl
  • I really like this but what if the file changes dynamically each time it is run? Sometimes it might have 10 hosts or 3 hosts with the corresponding information. – Larry Raab Oct 28 '16 at 15:48
  • Do you see anything that restricts the script to 3 hosts? :) The script can deal with various amounts of hosts. Check the final for loop – hek2mgl Oct 28 '16 at 15:49
  • Ah... One final question... How could I put this inside a bash script so that there is no separate .awk file? – Larry Raab Oct 28 '16 at 15:54
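One possible way to address the last comment is to pass the awk program as a single-quoted string inside the bash script. The following is only a sketch under that assumption: the function name transpose_hosts is hypothetical, and the loop runs by index so that the hosts come out in input order.

```shell
#!/usr/bin/env bash
# Sketch: keep the awk program inline so no separate .awk file is needed.
# "transpose_hosts" is a hypothetical name; $1 is the input file.
transpose_hosts() {
    awk '
        BEGIN   { OFS = "\t" }
        NR == 1 { n = split($0, hosts) }   # n = number of hosts
        NR == 2 { split($0, ids) }
        NR == 3 { split($0, ips) }
        NR == 4 { split($0, names) }
        NR == 5 { split($0, roles) }
        NR == 6 { region = $1 }
        END {
            for (i = 1; i <= n; i++)
                print hosts[i], ids[i], ips[i], names[i], roles[i], region
        }
    ' "$1"
}
```

It could then be called as transpose_hosts input.file > output.file from elsewhere in the script. Because the awk program sits inside single quotes, it must not contain a literal single quote itself.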
0

Starting with the list-formatted version: if you had no missing data, i.e. "region" listed 3 times, it would be much easier.

You can add the missing values on the fly and then simply use pr:

$ awk '1; END{print;print}' file | pr -6ts

host1   id1     ip1     name1   role1   region
host2   id2     ip2     name2   role2   region
host3   id3     ip3     name3   role3   region

If the number of columns is known and only the last values might be missing, you can parametrize by the number of columns:

$ cols=6; awk -v cols=$cols '1; END{for(i=1;i<=(NR-cols)/(cols-1);i++) print}' file |
  pr -${cols}ts
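The loop bound works because a list-formatted file with n hosts has NR = n*(cols-1) + 1 lines, so (NR-cols)/(cols-1) = n-1 copies of the last line are missing. As a rough alternative under the same constraint, the reshaping can also be done in awk alone, without pr. This is a different technique from the pipeline above, and the name reshape_list and its arguments are hypothetical.

```shell
# Sketch: reshape the one-value-per-line layout using awk alone.
# "reshape_list" is a hypothetical name; $1 = number of output columns, $2 = file.
reshape_list() {
    awk -v cols="$1" '
        { line[NR] = $0 }                      # remember every input line
        END {
            n = (NR - 1) / (cols - 1)          # number of hosts (output rows)
            for (r = 1; r <= n; r++) {
                row = ""
                for (c = 1; c < cols; c++)     # c-th attribute of host r
                    row = row line[(c - 1) * n + r] "\t"
                print row line[NR]             # shared last value (the region)
            }
        }
    ' "$2"
}
```

For the 16-line example file, reshape_list 6 file would compute n = 3 and emit one tab-separated line per host.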
karakfa
  • It will work for an unspecified number of rows (hosts) as long as the missing values are added and there are 6 columns. – karakfa Oct 28 '16 at 15:50
  • How should it? You print the region exactly 3 times. – hek2mgl Oct 28 '16 at 15:51
  • OK, I added a more generic version, but the constraint that only the last column can have missing values (which will be repeated) remains. – karakfa Oct 28 '16 at 15:56