I have a non-standard CSV file that looks something like this:
x,y
1,"(5, 27, 4)"
2,"(3, 1, 6, 2)"
3,"(4, 5)"
Using pd.read_csv() leads to something that's not all that useful, because the tuples are not parsed: the y column just holds strings. There are existing answers that address this (1, 2), but because these tuples have heterogeneous lengths, those answers don't fully solve the problem I'm having.
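One way to get real tuples out of the file, sketched here as an assumption and not taken from the linked answers, is to pass a parser for the y column through the converters argument of read_csv. The standard library's ast.literal_eval only evaluates Python literals, so it accepts tuples of any length without the risks of eval(). The CSV text is inlined with io.StringIO purely to keep the sketch self-contained.

```python
import ast
import io

import pandas as pd

# The sample file from above, inlined so the sketch runs on its own.
csv_text = '''x,y
1,"(5, 27, 4)"
2,"(3, 1, 6, 2)"
3,"(4, 5)"
'''

# converters applies a function to every raw string in column 'y'.
# ast.literal_eval only parses Python literals, so unlike eval() it
# cannot execute arbitrary code that happens to be in the file.
df = pd.read_csv(io.StringIO(csv_text), converters={'y': ast.literal_eval})

print(df['y'].tolist())  # [(5, 27, 4), (3, 1, 6, 2), (4, 5)]
```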
What I'd like to do is plot x vs y using the pandas plotting routines. The naive approach leads to an error because the tuples are stored as strings:
>>> import pandas as pd
>>> # df = pd.read_csv('data.csv')
>>> df = pd.DataFrame({'x': [1, 2, 3],
...                    'y': ["(5, 27, 4)", "(3, 1, 6, 2)", "(4, 5)"]})
>>> df.plot.scatter('x', 'y')
[...]
ValueError: scatter requires y column to be numeric
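A quick diagnostic sketch of why the scatter call refuses the column: the y column has the generic object dtype and each cell is a Python str, so pandas does not consider it numeric. This only illustrates the dtype; it is not meant to reproduce pandas' internal check exactly.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'x': [1, 2, 3],
                   'y': ["(5, 27, 4)", "(3, 1, 6, 2)", "(4, 5)"]})

print(df['y'].dtype)              # object
print(type(df['y'][0]))           # <class 'str'>
print(is_numeric_dtype(df['y']))  # False
```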
The result I'd hope for is something like the plot this matplotlib loop produces:
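import numpy as np
import matplotlib.pyplot as plt

# df is the DataFrame built above
for x, y in zip(df['x'], df['y']):
    y = eval(y)  # turn the string "(5, 27, 4)" into the tuple (5, 27, 4)
    # repeat x once per value in y so every value gets its own point
    plt.scatter(x * np.ones_like(y), y, color='blue')
plt.show()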
Is there a straightforward way to create this plot directly from pandas, by transforming the dataframe and using df.plot.scatter() (and preferably without using eval())?
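A sketch of one possible transformation, assuming plain string methods are an acceptable way to avoid eval(): strip the parentheses, split on commas, use DataFrame.explode to give each value its own row (repeating the matching x), then convert to numbers with pd.to_numeric so df.plot.scatter() accepts the column. It assumes the tuples hold plain numbers with no nested parentheses or quoted commas, and explode's ignore_index argument needs pandas 1.1 or later. The Agg backend line only lets the sketch run without a display; drop it in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3],
                   'y': ["(5, 27, 4)", "(3, 1, 6, 2)", "(4, 5)"]})

# "(5, 27, 4)" -> ['5', ' 27', ' 4']: drop the parentheses, split on commas.
parts = df['y'].str.strip('()').str.split(',')

# explode() puts each list element on its own row and repeats its x value.
long_df = df.assign(y=parts).explode('y', ignore_index=True)

# Remove the leftover spaces and convert to a numeric dtype.
long_df['y'] = pd.to_numeric(long_df['y'].str.strip())

# Now the column is numeric, so the pandas scatter routine works directly.
ax = long_df.plot.scatter('x', 'y', color='blue')
```

The long (one value per row) layout is what the pandas plotting routines expect, so once the data is reshaped, other plot kinds work on it as well.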