I have a table that contains the columns Date, Identifier, and Price e.g.
| Identifier | Date | Price |
|------------|----------|-------|
| 693477AA|1990/10/31| 100|
| 353477ZB|1991/08/31| 101|
| 123457ZB|1992/08/31| 105|
I'm using pandas read_sql function to grab the data from a SQL Server database. Either using SQL or pandas DataFrame functionality I need to get the data into the following pandas DataFrame format.
693477AA 353477ZB 123457ZB
Date
1988-1-1 NaN NaN 99.41
1988-1-2 100.54 NaN 98.11
1988-1-3 99.45 NaN NaN
So that there is a (possibly Null) price entry for every DISTINCT Date in the table. For the set of Identifiers satisfying a condition.
Right now I have it working with a for loop,
data = []
identifiers = "SELECT DISTINCT Identifier FROM TABLE WHERE [Condition]"
for id in identifiers:
data.append("SELECT Date, Price FROM TABLE WHERE Identifier=[id] ORDER BY DATE")
pandas.concat(data, axis=1)
However this only works for very strict [Condition] since the table is quite large (>3M rows).
How can I implement SQL, DataFrame manipulation, or a combination of both to achieve the desired format?
Thanks.