I'm struggling to consolidate several dataframes into a flattened format to use in a Keras machine learning algorithm:
My challenge surrounds the differing format of 2 ancillary tables that are ordered more like lists (they are associated to the main table and to one another via a shared key).
Since my desired output is a single line containing the shared data from the three data frames (and not, for instance, a right-joined table with matches rows)... I'm at a loss for how to do this. It's probable some magical combination of unstack or pivot...
Any thoughts or advice on how I might massage these together (or if I'm asking the wrong question and should simply use them as multiple inputs into the ML algorithm, please set me straight!)
Thanks!
edit: at the (very reasonable) request in the comments, here is some sample data to manipulate:
import pandas as pd
import numpy as np
priceDf = pd.DataFrame(np.arange(30).reshape((6,5))
.transpose(), columns=['Key', 'volume', 'high', 'low', 'price', 'price_in_5_minutes'])
tradeBookDf = pd.DataFrame(np.append(np.repeat(np.arange(4), 4),(np.random.randint(5, size=48)))
.reshape((4,16)).transpose(), columns=['Key','size', 'price', 'bid_ask_bit'])
recentTradesDf = pd.DataFrame(np.append(np.repeat(np.arange(4), 4),(np.random.randint(5, size=48)))
.reshape((4,16)).transpose(), columns=['Key','price', 'quantity', 'timestamp'])
The goal is to output a single row of every piece of data associated with the key. eg:
| Key | volume | high | low | price | price_in_5_minutes | trade1_price | trade1_quantity | trade1_timestamp | trade2.... | book1_size | book1_price | book1_bid_ask_bit | book2.... |