1

I have a file csv with the following columns

Customer List_Items
J.B.      [13,2,3,4,42,12]
F.C.      [1,44,51,24,12,53]
D.S.      NA 
B.V.      *
F.R.      [1,0,0,0,0,0]

I have max six items. The elements in the lists above are the frequency. The elements above are strings, not lists, since the file csv was created manually outside python coding. I would like to plot a multiline charts where, on the y-axis I have the frequency and on the x-axis the labels 1,2,3,4,5,6. To exclude values not in scope I could filter out NA and *. But I do not know how to plot lists in a multiline plot. Should I expand the lists in separate columns?

For example: J.B. would be represented by a line with

x-axis y-axis
1      13
2      2
3      3
4      4
5      42
6      12

Similarly for the other customers.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
LdM
  • 674
  • 7
  • 23

1 Answers1

2
  • A column can be converted from strings back to lists using:
  • In this case there needs to be a function, which can handle errors with try-except
  • df.T.apply(pd.Series.explode).reset_index(drop=True) works to separate the values in each list to individual rows.
    • This only works if each list is the same length, as shown in the example.
    • If the lists have different lengths, use the following line instead:
      • df = pd.concat([df.T[col].explode().reset_index(drop=True) for col in df.T.columns], axis=1)
  • Plot the dataframe with pandas.DataFrame.plot.

Imports and Load DataFrame

import pandas as pd
from ast import literal_eval  # convert string back to list
import numpy as np

# read the file
df = pd.read_csv('data.csv', sep='\\s+', index_col=[0])

# display(df)
                  List_Items
Customer                    
J.B.        [13,2,3,4,42,12]  ← this is a string
F.C.      [1,44,51,24,12,53]
D.S.                     NaN
B.V.                       *
F.R.           [1,0,0,0,0,0]

Function

# create a function with error handling to convert the column back to lists
def test(row):
    try:
        row = literal_eval(row)  # convert the string to a list
    except (ValueError, SyntaxError):
        row = np.nan  # malformed rows are returned as NaN
    return row

Fix Data and Plot

# apply the function to the row
df.List_Items = df.List_Items.apply(lambda row: test(row))

# drop nan
df = df.dropna()

# display(df)
                       List_Items
Customer                         
J.B.        [13, 2, 3, 4, 42, 12]  ← this is a list
F.C.      [1, 44, 51, 24, 12, 53]
F.R.           [1, 0, 0, 0, 0, 0]

# transpose the dataframe and explode the list values to separate rows
df = df.T.apply(pd.Series.explode).reset_index(drop=True)

# display(df)
Customer J.B. F.C. F.R.
0          13    1    1
1           2   44    0
2           3   51    0
3           4   24    0
4          42   12    0
5          12   53    0

# plot
ax = df.plot()

enter image description here

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158