2

Say my stream is x*N lines long, where x is the number of records and N is the number of columns per record, and the data is output column-wise. For example, x=2, N=3:

1
2
Alice
Bob
London
New York

How can I join every line, modulo the number of records, back into columns:

1   Alice   London
2   Bob     New York

If I use paste with N dashes (-) as arguments, I get the transposed output. I could use split, with the -l option equal to N, then recombine the pieces afterwards with paste, but I'd like to do it within the stream without leaving temporary files all over the place.

Is there an "easy" solution (i.e., rather than invoking something like awk)? I'm thinking there may be some magic join solution, but I can't see it...


EDIT: Another example, with x=5 and N=3:

1
2
3
4
5
a
b
c
d
e
alpha
beta
gamma
delta
epsilon

Expected output:

1   a   alpha
2   b   beta
3   c   gamma
4   d   delta
5   e   epsilon
Xophmeister
  • Could you please also add the expected output for values other than x=2 and N=3 to your post? – RavinderSingh13 Aug 15 '17 at 12:06
  • Done for x=5 and N=3 – Xophmeister Aug 15 '17 at 12:15
  • Do you know x and/or N beforehand? – glenn jackman Aug 15 '17 at 12:16
  • @Xophmeister Do you know X in advance? – hek2mgl Aug 15 '17 at 12:16
  • You know N, but not x (although I guess you can deduce it) – Xophmeister Aug 15 '17 at 12:17
  • Like I say, paste with N dashes produces the transposed output. That's not what I want – Xophmeister Aug 15 '17 at 12:42
  • You can transpose your `paste` output using [this SO solution](https://stackoverflow.com/a/1729980/1072112). No shell tools will natively rotate your table in one go; awk provides the framework to do what you need if you have the memory to hold the table data, but temporary files are part of the shell experience. If you have a solution that requires them, embrace it and use the pain to prompt you to learn other languages better suited to this problem. – ghoti Aug 15 '17 at 12:45
  • Might as well just use awk to produce the correct output, rather than producing the transposed output and then running it through awk – Xophmeister Aug 15 '17 at 12:46

3 Answers

2

You are looking for pr to "columnate" the stream:

pr -T -s$'\t' -3 <<'END_STREAM'
1
2
Alice
Bob
London
New York
END_STREAM
1       Alice   London
2       Bob     New York

pr is in coreutils.
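
One caveat, as far as I understand pr: columns are filled page by page, and the default page length is 66 lines, so a stream with more than 66 records would spill onto a second "page" and break the layout. Below is a minimal sketch of a workaround, assuming it is acceptable to buffer the stream in a shell variable so that x can be computed and passed to -l. The name columnate_pr is a hypothetical helper, not part of any tool, and trailing blank lines in the stream are lost by the buffering.

```shell
# Hypothetical helper: columnate stdin into N columns with pr.
# Assumes the stream has exactly x*N lines and fits in memory.
columnate_pr() {
    n=$1
    data=$(cat)                                   # buffer the whole stream
    total=$(printf '%s\n' "$data" | wc -l)        # x*N
    x=$(( total / n ))                            # number of records
    # -l "$x" makes one "page" hold exactly x lines per column
    printf '%s\n' "$data" | pr -t -s"$(printf '\t')" -l "$x" -"$n"
}

printf '%s\n' 1 2 Alice Bob London 'New York' | columnate_pr 3
```

For short streams this should behave the same as the command above; the -l option only matters once x exceeds the page length.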

glenn jackman
1

Most systems should include a tool called pr, intended to print files. It's part of POSIX.1 so it's almost certainly on any system you'll use.

$ pr -3 -t < inp1
1                       a                       alpha
2                       b                       beta
3                       c                       gamma
4                       d                       delta
5                       e                       epsilon

Or if you prefer,

$ pr -3 -t -s, < inp1
1,a,alpha
2,b,beta
3,c,gamma
4,d,delta
5,e,epsilon

or

$ pr -3 -t -w 20 < inp1
1      a      alpha
2      b      beta
3      c      gamma
4      d      delta
5      e      epsilo

Check the link above for standard usage information, or man pr for specific options in your operating system.

ghoti
0

In order to reliably process the input you need to either know the number of columns in the output file or the number of lines in the output file. If you just know the number of columns, you'd need to read the input file twice.

Hackish coreutils solution

# If you know the number of output columns but not the number of
# output lines, derive the latter from the line count (wc -l)
cols=3
olines=$(( $(wc -l < file) / cols ))

# Split the file by the number of output lines
split -l "${olines}" file FOO # FOO is a prefix. Choose a better one
paste FOO*
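
If the temporary files are the main objection, they can at least be confined to a private directory that is removed afterwards. This is only a sketch under that assumption: it still writes temporary files, and columnate_split is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: columnate FILE into NCOLS columns via split + paste,
# keeping the pieces in a throwaway directory.
columnate_split() {
    file=$1
    ncols=$2
    tmpdir=$(mktemp -d) || return 1
    # number of output lines = total lines / number of columns
    olines=$(( $(wc -l < "$file") / ncols ))
    split -l "$olines" "$file" "$tmpdir/part."
    paste "$tmpdir"/part.*      # pieces sort as part.aa, part.ab, ...
    rm -rf "$tmpdir"
}
```

Called as columnate_split file 3, it prints the tab-separated columns and leaves nothing behind in the current directory.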

AWK solutions

If you know the number of output columns in advance you can use this awk script:

convert.awk:

BEGIN {
    # Split the file into one big record where fields are separated
    # by newlines
    RS=""
    FS="\n"
}
FNR==NR {
    # We are reading the file twice (see invocation below)
    # When reading it the first time we store the number
    # of fields (lines) in the variable n because we need it
    # when processing the file.
    n=NF
    # Skip printing on the first pass, otherwise the output
    # would appear twice
    next
}
{
    # n / c is the number of output lines
    # For every output line ...
    for(i=0;i<n/c;i++) {
        # ... print the columns belonging to it
        for(ii=1+i;ii<=NF;ii+=n/c) {
            printf "%s ", $ii
        }
        print "" # Adds a newline
    }
}

and call it like this:

awk -vc=3 -f convert.awk file file # Twice the same file

If you know the number of output lines in advance you can use the following awk script:

convert.awk:

BEGIN {
    # Split the file into one big record where fields are separated
    # by newlines
    RS=""
    FS="\n"
}
{
    # x is the number of output lines and has been passed to the 
    # script. For each line in output
    for(i=0;i<x;i++){
        # ... print the columns belonging to it
        for(ii=i+1;ii<=NF;ii+=x){
            printf "%s ",$ii
        }   
        print "" # Adds a newline
    }   
}

And call it like this:

awk -vx=2 -f convert.awk file
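
As an alternative to reading the file twice, the lines can be buffered in an array and printed in the END block, which also works on a stream. This is a minimal sketch, assuming only the number of columns is known, the input has exactly x*N lines, and the whole stream fits in memory. The function name columnate_awk and the tab separator are my own choices for illustration.

```shell
# Hypothetical helper: columnate stdin into N columns in a single pass.
columnate_awk() {
    awk -v n="$1" '
        { line[NR] = $0 }                 # buffer every input line
        END {
            x = NR / n                    # number of output lines
            for (i = 1; i <= x; i++) {
                row = line[i]
                # column j of row i sits j*x lines further down
                for (j = 1; j < n; j++)
                    row = row "\t" line[i + j * x]
                print row
            }
        }'
}

printf '%s\n' 1 2 Alice Bob London 'New York' | columnate_awk 3
```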
hek2mgl
  • 2
    FYI, your "hackish" solution at the top is actually using POSIX.1 and POSIX.2 tools, and does not depend on GNU coreutils. – ghoti Aug 15 '17 at 14:22