0

I have a csv file that contains data about clients on separate lines that I want to convert.

The current layout is:

Client_Name,Client_ID,Client_Group,Attribute_Name,Date,Attribute_Value

Each row contains info about a client e.g.:

Acme,0001,Marketing,Sales_Amt,2010-10-01,100
Acme,0001,Marketing,Queries,2010-10-01,3
Smiths,0002,Retail,Sales_Amt,2010-10-01,1200
Smiths,0002,Retail,Queries,2010-10-01,11

What I want to do is transform it to the following time series layout:

Date,Client_Name,Sales_Amt,Queries

So that each row reads:

2010-10-01,Acme,100,3
2010-10-01,Smiths,1200,11

So I can see per day, per client attributes. Is this a job for ETL or can I do this easily just with file manipulation tools like sed & awk?

  • 2
    Sounds trivial for awk. All you need to know is how to set field separators and how to print fields – oguz ismail Nov 06 '19 at 14:31
  • 1
    Are the related lines always adjacent? – choroba Nov 06 '19 at 14:35
  • AND will there be multiple days of data in the same file? Or do you have 1 file per day? Rather that reply in comments, please update your Q with this important informatin (and above). Good luck. – shellter Nov 06 '19 at 17:00
  • Possible duplicate of https://unix.stackexchange.com/questions/359697/print-columns-in-awk-by-header-name – Ravi Saroch Nov 08 '19 at 09:59

2 Answers2

1
$ cat tst.awk
BEGIN { FS=OFS="," }
{
    sub(/\r$/,"")
    curr = $5 OFS $1
}
curr != prev {
    if ( NR > 1 ) {
        print prev, vals[1], vals[2]
    }
    prev = curr
    cnt = 0
}
{ vals[++cnt] = $NF }
END {
    print prev, vals[1], vals[2]
}

$ awk -f tst.awk file
2010-10-01,Acme,100,3
2010-10-01,Smiths,1200,11

See What's the most robust way to efficiently parse CSV using awk? for more info on parsing CSVs with awk.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
0

Perl and its Text::CSV_XS to the rescue!

perl -MText::CSV_XS=csv -we '
    csv(in    => shift,
        on_in => sub { $h{ $_[1][4] }{ $_[1][0] }[ $_[1][3] eq "Queries" ] = $_[1][5] },
        out   => \"skip");
    csv(in => [ map { $x = $_;
                      map [$x, $_, @{ $h{$x}{$_} } ],
                      sort keys %{ $h{$_} }
                } sort keys %h ]);
' -- file.csv

It first fills a hash %h with the data you want to keep, and then outputs it as a new csv.

choroba
  • 231,213
  • 25
  • 204
  • 289