How to read a csv with a malformed column of lists and plot the values

Question

I have a file csv with the following columns

Customer List_Items
J.B.      [13,2,3,4,42,12]
F.C.      [1,44,51,24,12,53]
D.S.      NA 
B.V.      *
F.R.      [1,0,0,0,0,0]

I have max six items. The elements in the lists above are the frequency. The elements above are strings, not lists, since the file csv was created manually outside python coding. I would like to plot a multiline charts where, on the y-axis I have the frequency and on the x-axis the labels 1,2,3,4,5,6. To exclude values not in scope I could filter out NA and *. But I do not know how to plot lists in a multiline plot. Should I expand the lists in separate columns?

For example: J.B. would be represented by a line with

x-axis y-axis
1      13
2      2
3      3
4      4
5      42
6      12

Similarly for the other customers.

Trenton McKinney · Accepted Answer · 2021-10-10T00:10:34.373

A column can be converted from strings back to lists using:
- df = pd.read_csv('data.csv', sep='\\s+', index_col=[0], converters={'List_Items': literal_eval}) - This will not work for this OP
  - See this answer for Pandas DataFrame stored list as string: How to convert back to list
- But only if the column does not have malformed values (e.g. 'nan', '*')
In this case there needs to be a function, which can handle errors with try-except
df.T.apply(pd.Series.explode).reset_index(drop=True) works to separate the values in each list to individual rows.
- This only works if each list is the same length, as shown in the example.
- If the lists have different lengths, use the following line instead:
  - df = pd.concat([df.T[col].explode().reset_index(drop=True) for col in df.T.columns], axis=1)
Plot the dataframe with pandas.DataFrame.plot.

Imports and Load DataFrame

import pandas as pd
from ast import literal_eval  # convert string back to list
import numpy as np

# read the file
df = pd.read_csv('data.csv', sep='\\s+', index_col=[0])

# display(df)
                  List_Items
Customer                    
J.B.        [13,2,3,4,42,12]  ← this is a string
F.C.      [1,44,51,24,12,53]
D.S.                     NaN
B.V.                       *
F.R.           [1,0,0,0,0,0]

Function

# create a function with error handling to convert the column back to lists
def test(row):
    try:
        row = literal_eval(row)  # convert the string to a list
    except (ValueError, SyntaxError):
        row = np.nan  # malformed rows are returned as NaN
    return row

Fix Data and Plot

# apply the function to the row
df.List_Items = df.List_Items.apply(lambda row: test(row))

# drop nan
df = df.dropna()

# display(df)
                       List_Items
Customer                         
J.B.        [13, 2, 3, 4, 42, 12]  ← this is a list
F.C.      [1, 44, 51, 24, 12, 53]
F.R.           [1, 0, 0, 0, 0, 0]

# transpose the dataframe and explode the list values to separate rows
df = df.T.apply(pd.Series.explode).reset_index(drop=True)

# display(df)
Customer J.B. F.C. F.R.
0          13    1    1
1           2   44    0
2           3   51    0
3           4   24    0
4          42   12    0
5          12   53    0

# plot
ax = df.plot()

Thank you so much for your help and for your well-explained answer. It makes sense to me and it works fine. Thanks for your time and help Trenton — LdM, Oct 10 '21 at 18:47

How to read a csv with a malformed column of lists and plot the values

1 Answers1

Imports and Load DataFrame

Function

Fix Data and Plot