2

I have a csv file like this(date values just 7 days, but not static):

DATE,PATH,%
20.05.2015,RAM,0
20.05.2015,SWAP,00
20.05.2015, /, 000
20.05.2015, /dev, 0000
21.05.2015,RAM,1
21.05.2015,SWAP,11
21.05.2015, /, 111
21.05.2015, /dev, 1111
22.05.2015,RAM,2
22.05.2015,SWAP,22
22.05.2015, /, 222
22.05.2015, /dev, 2222
22.05.2015, /root, 22222
22.05.2015, /root2, 222222
23.05.2015, /root2, 333333
24.05.2015, /root2, 444444
25.05.2015, /root2, 555555
26.05.2015, /root2, 666666

I need to print date values of rows to columns like this:

PATH,20.05.2015,21.05.2015,22.05.2015,23.05.2015,24.05.2015,25.05.2015,26.05.2015
RAM,0,1,2,,,,
SWAP,00,11,22,,,,
/,000,111,222,,,,
/dev,0000,1111,2222,,,,
/root,,,22222,,,,
/root2,,,222222,333333,444444,555555,666666

Any suggestion for that with awk or other way?

I tried solutions on this topic:

An efficient way to transpose a file in Bash

But it's a different question, I can't fix for my question.

Excel view for easy understanding:

enter image description here

Community
  • 1
  • 1
onur
  • 5,647
  • 3
  • 23
  • 46
  • first: any suggestion on how to handle the question? you should provide some indication on what is the logic of this. There is no point in showing two files and ask a script to go from one to another one. Instead, indicate what occurs with the data, what it means and what you tried so far – fedorqui May 26 '15 at 14:02
  • Edited my question. Date values have just 7 different dates, but not static. But lines not static too, some days just 2 lines, some days more. Thats why I use ... because its not static. – onur May 26 '15 at 14:06
  • possible duplicate of [Transpose a file in bash](http://stackoverflow.com/questions/1729824/transpose-a-file-in-bash) – Kristján May 26 '15 at 14:17

2 Answers2

1
$ cat tst.awk
BEGIN { FS="[[:space:]]*,[[:space:]]*"; OFS="," }
NR==1 { cell[++numDates,++numPaths] = $2; next }
{ date = $1; path = $2 }
!(date in date2nr) { date2nr[date] = ++numDates; cell[numDates,1] = date }
!(path in path2nr) { path2nr[path] = ++numPaths; cell[1,numPaths] = path }
{ cell[date2nr[date],path2nr[path]] = $3 }
END {
    for (pathNr=1; pathNr<=numPaths; pathNr++) {
        for (dateNr=1; dateNr<=numDates; dateNr++) {
            printf "%s%s", cell[dateNr,pathNr], (dateNr<numDates?OFS:ORS)
        }
    }
}

$ awk -f tst.awk file
PATH,20.05.2015,21.05.2015,22.05.2015,23.05.2015,24.05.2015,25.05.2015,26.05.2015
RAM,0,1,2,,,,
SWAP,00,11,22,,,,
/,000,111,222,,,,
/dev,0000,1111,2222,,,,
/root,,,22222,,,,
/root2,,,222222,333333,444444,555555,666666
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • 1
    Thank you, added excel example for easy understanding. I'm working on your example, I think I need to select distinct values of dates and print after PATH, – onur May 27 '15 at 06:34
  • 1
    The excel images helped clarify a lot, thanks for providing that. I've updated my answer. – Ed Morton May 27 '15 at 12:23
1

You desired output is not a pure transposition of the input, nor is it clear if there is a pure mapping of the number of fields to blank fields in the desired output. (I think that is true.)

If that is correct, this Python program does what you want:

import csv
from collections import OrderedDict

data=OrderedDict()

with open(fn) as f:
    reader=csv.reader(f, skipinitialspace=True)
    header=next(reader)
    data[header[1]]=[]
    for line in reader:
        if line[1] not in data:
            data[line[1]]=[]

with open(fn) as f:
    reader=csv.reader(f, skipinitialspace=True)
    header=next(reader)
    for line in reader:
        data[header[1]].append(line[0])
        data[line[1]].append(line[2])
        for e in set(data.keys())-set([header[1],line[1]]):
            data[e].append('')

for k, v in data.items():
    print k, ','.join(v)    

Prints:

PATH 20.05.2015,20.05.2015,20.05.2015,20.05.2015,21.05.2015,21.05.2015,21.05.2015,21.05.2015,22.05.2015,22.05.2015,22.05.2015,22.05.2015,22.05.2015,22.05.2015,23.05.2015,24.05.2015,25.05.2015,26.05.2015
RAM 0,,,,1,,,,2,,,,,,,,,
SWAP ,00,,,,11,,,,22,,,,,,,,
/ ,,000,,,,111,,,,222,,,,,,,
/dev ,,,0000,,,,1111,,,,2222,,,,,,
/root ,,,,,,,,,,,,22222,,,,,
/root2 ,,,,,,,,,,,,,222222,333333,444444,555555,666666
dawg
  • 98,345
  • 23
  • 131
  • 206
  • Edited my question, sorry for misunderstanding. I don't want to multiple column for same date values. Added excel example. Thank you. – onur May 27 '15 at 06:25