Concatenate np arrays in pandas dataframe and plot

Question

I have the following dataframe:

df = 

sample                  measurements
  1       [0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]
  2       [0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]
  5       [0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]
  7       [0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2]

Each entry in df['measurements'] is a 1-D np.array. I'm trying to concatenate the arrays in the "measurements" column and plot the result as a time series. The issue is that the samples are discontinuous, so the interval between points is not consistent due to missing data. What is the best way to concatenate the arrays and plot them so that there is just a gap in the plot between samples 2 and 5 and between samples 5 and 7?

What do you mean concatenating each arrays? I have no idea of which things you want to concatenate. — jaemmin, Feb 15 '23 at 04:55

JohanC · Accepted Answer · 2023-02-15T07:19:47.920

Depending on how you want to use the data, you can either convert the individual elements to new rows ("long form"), or create new columns ("wide form").

Convert to new rows

This is the preferred format for seaborn. explode() creates new rows from the array elements. Optionally, groupby() together with cumcount() can add a position.

from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

df = pd.DataFrame({'sample': [1, 2, 5, 7],
                   'measurements': [np.array([0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]),
                                    np.array([0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]),
                                    np.array([0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]),
                                    np.array([0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2])]})
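# explode() turns every array element into its own row (the column dtype becomes object)
df1 = df.explode('measurements', ignore_index=True)
df1['measurements'] = df1['measurements'].astype(float)
# number the elements within each sample: 1, 2, ..., 7
df1['position'] = df1.groupby('sample').cumcount() + 1
sns.lineplot(data=df1, x='sample', y='measurements', hue='position', palette='bright')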
plt.show()
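To see what the long form looks like before it is plotted, here is a minimal sketch on a deliberately tiny frame. The frame `toy` and its values are made up for illustration and are not the asker's data; only `explode()`, `groupby()` and `cumcount()` come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical two-sample frame, small enough to check by eye
toy = pd.DataFrame({'sample': [1, 5],
                    'measurements': [np.array([0.2, 0.3]),
                                     np.array([0.4, 0.5])]})

# One row per array element; the sample number is repeated for each element
long_form = toy.explode('measurements', ignore_index=True)
long_form['measurements'] = long_form['measurements'].astype(float)

# Position of each element within its own sample, starting at 1
long_form['position'] = long_form.groupby('sample').cumcount() + 1

print(long_form)
# sample column:   1, 1, 5, 5
# position column: 1, 2, 1, 2
```

Each (sample, position) pair identifies exactly one measurement, which is why seaborn can map sample to the x-axis and position to hue.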

Convert to new columns

If all arrays have the same length, each element can be converted to a new column. This is how pandas usually prefers to organize its data. The new columns are created by calling to_list() on the original column and passing the result to the DataFrame constructor.

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame({'sample': [1, 2, 5, 7],
                   'measurements': [np.array([0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]),
                                    np.array([0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]),
                                    np.array([0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]),
                                    np.array([0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2])]})

df2 = pd.DataFrame(df['measurements'].to_list(),
                   columns=[f'measurement{i + 1}' for i in range(7)],
                   index=df['sample'])
df2.plot()
plt.show()
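Neither form above literally joins the arrays end to end on one time axis, which is what the question describes. The following is a minimal sketch of one possible way to do that, not the answer author's method. It assumes that all arrays have the same length, that sample numbers are integers, and that the measurements within a sample are evenly spaced. The function names `concatenate_with_gaps` and `plot_with_gaps` are hypothetical. Missing samples are filled with NaN, and matplotlib does not draw line segments through NaN values, so they show up as gaps.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'sample': [1, 2, 5, 7],
                   'measurements': [np.array([0.2, 0.22, 0.3, 0.7, 0.4, 0.35, 0.2]),
                                    np.array([0.2, 0.17, 0.6, 0.6, 0.54, 0.32, 0.2]),
                                    np.array([0.2, 0.39, 0.40, 0.53, 0.41, 0.3, 0.2]),
                                    np.array([0.2, 0.29, 0.46, 0.68, 0.44, 0.35, 0.2])]})


def concatenate_with_gaps(frame):
    """Return (x, y) with the arrays joined end to end and NaN for missing samples."""
    # Wide form: one row per sample, one column per measurement
    wide = pd.DataFrame(frame['measurements'].to_list(),
                        index=frame['sample'].to_numpy())
    # Add an all-NaN row for every sample number missing from the index (assumption: integer samples)
    wide = wide.reindex(np.arange(wide.index.min(), wide.index.max() + 1))
    n = wide.shape[1]
    # Row-major flattening joins the rows in sample order
    y = wide.to_numpy().ravel()
    # Assumption: the n measurements of sample s sit at s, s + 1/n, ..., s + (n - 1)/n
    x = np.repeat(wide.index.to_numpy(), n) + np.tile(np.arange(n) / n, len(wide))
    return x, y


def plot_with_gaps(frame):
    """Draw the concatenated series; the NaN stretches appear as gaps."""
    import matplotlib.pyplot as plt
    x, y = concatenate_with_gaps(frame)
    plt.plot(x, y, marker='.')
    plt.xlabel('sample')
    plt.ylabel('measurement')
    plt.show()


x, y = concatenate_with_gaps(df)
```

Calling plot_with_gaps(df) then draws samples 1 and 2 as one continuous stretch, followed by a gap until sample 5 and another gap until sample 7. If the real spacing inside a sample is known, the x computation is the only line that needs to change.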
