0

Suppose we have source tabular data in this format

Service,Time,Metric

where time column has duplicate timestamp entries for different service metrics collected at same time. As such over time, there's repeating values under service, time, host, and metric (if it doesn't change much).

How might one transform or transpose the table into this format, normalizing against time. I know time might not match up across all service metrics, but let's naively assume it does.

Time,ServiceAMetric,ServiceBMetric,ServiceCMetric,etc.

where the header field will list the service names, the rest of the rows contain the metric values

Is this something that can be done with SQL? Or one has to transform with scripting? If the latter, how might I do so? Most any scripting language will do for me.

In this particular case, the data is from CSV file and not tables in DB. For transformation, can keep output as a new CSV file.

David
  • 3,223
  • 3
  • 29
  • 41
  • See http://stackoverflow.com/questions/7674786/mysql-pivot-table – Serg Sep 08 '16 at 19:02
  • Which DB are you using ? You're looking to do a Dynamic Pivot. – kdvy Sep 08 '16 at 19:03
  • Serg's link explains how to pivot in any SQL flavour. Your DB of choice may have extension that make it easier to pivot (e.g. SQL Server has the `PIVOT`, others may have similar constructs). All this works if service names (the columns) are known at design time. If not you will need dynamic SQL, i.e. write a query that constructs the pivoting query as string and then executes it. As for the time grouping you could round time to a certain resolution (e.g 5 minute intervals) or use a derived table query to do binning by dynamicaly calculating a set of intervals and match time against those – P. Kouvarakis Sep 08 '16 at 19:45
  • Thanks, I'll take a look at the suggestions. I have access to postgresql for a DB. Although I prefer to deal with CSV files. I guess can import into DB, pivot, and then export back out into CSV. – David Sep 09 '16 at 03:32
  • Since data is in CSV and not in DB, you may want to take a look at R (the language). Depending on the size of data and whether this is a one-off or regular operation, a scripting language could also work just as well. – P. Kouvarakis Sep 09 '16 at 08:02

0 Answers0