Say I need to have data stored as follows:
[[[{}][{}]]]
or a list of lists of two lists of dictionaries, where:
{}: dictionaries containing data from individual frames observing an event. (There are two observers/stations, hence two dictionaries.)
[{}][{}]: two lists of all the individual frames related to a single event, one from each observer/station.
[[{}][{}]]: list of all events on a single night of observation.
[[[{}][{}]]]: list of all nights.
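To make the layout concrete, here is a tiny made-up sample. The keys frame and brightness are placeholders rather than my real columns, and I'm assuming each night holds exactly two lists (station_1 first, then station_2), which is what the data[night][0] and data[night][1] indexing in my loop relies on.

```python
# Hypothetical sample: 2 nights, each holding [station_1 frames, station_2 frames].
# The keys "frame" and "brightness" are placeholders for illustration only.
data = [
    [  # night 0
        [{"frame": 0, "brightness": 1.2}, {"frame": 1, "brightness": 1.5}],  # station_1
        [{"frame": 0, "brightness": 0.9}],                                   # station_2
    ],
    [  # night 1
        [{"frame": 0, "brightness": 2.0}],                                   # station_1
        [{"frame": 0, "brightness": 1.8}, {"frame": 1, "brightness": 1.7}],  # station_2
    ],
]

# Count the frame dictionaries per station across all nights
n_station_1 = sum(len(night[0]) for night in data)
n_station_2 = sum(len(night[1]) for night in data)
print(n_station_1, n_station_2)
```

The real data has many more nights and frames, but the nesting is the same.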
Hopefully that's clear. What I want to do is create two pandas dataframes where all dictionaries from station_1 are stored in one, and all dictionaries from station_2 are stored in the other.
My current method is as follows (where data is the above data structure):
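    import pandas as pd

    # Collect one DataFrame per night for each station
    all_station_1 = []
    all_station_2 = []
    for night in data:
        # night[0] holds station_1's frame dictionaries, night[1] station_2's
        all_station_1.append(pd.DataFrame(night[0]))
        all_station_2.append(pd.DataFrame(night[1]))

    # Concatenate the per-night DataFrames into one DataFrame per station
    all_station_1 = pd.concat(all_station_1)
    all_station_2 = pd.concat(all_station_2)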
My understanding, though, is that the for loop must be horribly inefficient, and since I will be scaling this script way up from my sample dataset, the cost could easily become unmanageable.
So, any advice for a smarter way of proceeding would be appreciated! I feel like pandas is so user-friendly that there's gotta be an efficient way of dealing with any kind of data structure, but I haven't been able to find it on my own yet. Thanks!