Combine time series data for boxplot from multiple csv files

Question

I have multiple time series data in csv files from Netlogo model runs. I would like to join those series into one dataframe so that I can do a boxplot to see variations from different simulation model runs. X values in each csv are the time iterations (integers). The y values are the values of a particular measure in the model, e.g., population count. So, I can join the csvs with concat. There are repeated column names for the y variables. My thought is to combine columns with the same name into one column as a list of numbers (y values). Then I can pass that x, y to boxplot to plot that variable across time with its variations - median, etc. Data is of the form:


        x   population   groups  color
     0  0   0.00          0.00   0.00
     1  1  74.47         42.48  40.96
     2  2  74.46         42.48  40.96
    would become
        x  population      groups      color
     0  0  [0.00, 1.2]    [0.00, 5]    [0.00, 4]
     1  1  [74.47, 3.2]   [42.48, 55]  [40.96, 55]
     2  2  [74.46, Nan]   [42.48, NaN] [40.96, NaN]

There are multiples of this dataframe from different csv files (thousands). The x axis value can have a different maximum time value for different runs / csvs.

How do I combine dataframes such that I get one dataframe with a list of y values for a given y (column) for each x value. There will be NaNs for some y values for runs that ended early. Note that are multiple y columns. Note that each column is a separate boxplot (overlayed on the same plot).

I have tried concat, join, merge, and not been able to convert multiple columns with the same or different names into one column with a list of values rather than a single value.

Or, is there even a better way to do what I want to do with the data?

user2253054 · Answer 1 · 2020-05-18T19:12:59.150

The answer ended up being simpler than I expected. Insight into how to do this came from this answer.

Make a list of the time series dataframes: dn = [d1,d2,d3,...]
Concatenate the dataframes: dn = pd.concat(dl, axis=1)
Create a new column with the list of values:
dn['new'] = dn['data column name'].values.tolist() This generates the new column with the list of values that I can now use to make a box plot.

Combine time series data for boxplot from multiple csv files

1 Answers1