Plotting a multi-valued column in Pandas (converting strings to floats)

Question

I'd like to plot "MJD" vs "MJD_DUPLICATE" for the data given here: https://www.dropbox.com/s/cicgc1eiwrz93tg/DR14Q_pruned_several3cols.csv?dl=0

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import ast

# path was not defined in the original snippet; set it to the CSV's directory
path = './'
filename = 'DR14Q_pruned_several3cols.csv'
datafile = path + filename
df = pd.read_csv(datafile)

# First plot: N_SPEC against MJD
df.plot.scatter(x='MJD', y='N_SPEC')
plt.show()

# Parse each MJD_DUPLICATE string into a tuple, then keep only the element at index 1
ser = df['MJD_DUPLICATE'].apply(ast.literal_eval).str[1]
df['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce')
df['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce')

df.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()

This makes a plot, but only for one value of MJD_DUPLICATE:

print(df['MJD_DUPLICATE_NEW'])

0    55214
1    55209
...
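Why only one value appears: `.str[1]` takes exactly one element, the one at index 1 (the second entry), from each parsed tuple, so every other duplicate MJD is discarded before plotting. A minimal sketch on a hypothetical two-row frame whose column already holds tuples (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: each MJD_DUPLICATE cell holds a tuple of several MJDs
df = pd.DataFrame({
    "MJD": [55214, 55209],
    "MJD_DUPLICATE": [(55214, 55209, 56000), (55209, 55214)],
})

# .str[1] picks a single element (index 1) from each tuple; the rest are dropped
second = df["MJD_DUPLICATE"].str[1]
print(second.tolist())  # [55209, 55214]
```

The result has one value per row, which is why the scatter plot shows a single y value for each MJD.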

Thoughts??

It's unclear to me what you actually want to do. The MJD_DUPLICATE column contains tuples of values: do you want to turn these into multiple columns and plot each of them? Choose the first value as the value in the column? Something else? Some information as to what it is you hope to accomplish would help folks here in providing you a useful answer. — jakevdp, Oct 16 '17 at 03:17
This answer seems relevant: https://stackoverflow.com/questions/23661583/reading-back-tuples-from-a-csv-file-with-pandas as does this: https://stackoverflow.com/questions/29550414/how-to-split-column-of-tuples-in-pandas-dataframe — jakevdp, Oct 16 '17 at 03:22
Jake, I'm trying to plot all the values of "MJD_DUPLICATE" on the y-axis against the single value of MJD on the x-axis. MJD is a single entry; MJD_DUPLICATE can have two, or up to ~50, entries. — npross, Oct 16 '17 at 09:06
I'm not sure those answers above are directly relevant. It seems that with .apply I can turn these into tuples. It's plotting many values per row that trips everyone up. — npross, Oct 16 '17 at 09:09

Answer (answered Oct 16 '17 at 19:09)

There are two issues here:

1. Telling Pandas to parse the tuples within the CSV. This is covered here: Reading back tuples from a csv file with pandas
2. Transforming the tuples into multiple rows. This is covered here: Getting a tuple in a Dafaframe into multiple rows
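For the first issue, the `converters` argument of `pd.read_csv` can apply `ast.literal_eval` to every cell of a column as it is read. A minimal sketch, assuming the tuples are stored in the CSV as quoted strings such as `"(55214, 55209)"`; the inline CSV text below is hypothetical and only stands in for the real file:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV text; the tuple column is quoted because it contains commas
csv_text = '''MJD,MJD_DUPLICATE
55214,"(55214, 55209, 0)"
55209,"(55209, 55214, -1)"
'''

# converters runs ast.literal_eval on each MJD_DUPLICATE cell while reading,
# turning the string "(55214, 55209, 0)" into the tuple (55214, 55209, 0)
df = pd.read_csv(io.StringIO(csv_text),
                 converters={"MJD_DUPLICATE": ast.literal_eval})

print(type(df.loc[0, "MJD_DUPLICATE"]))  # <class 'tuple'>
```

Parsing at read time means the column never holds strings, so no separate `.apply(ast.literal_eval)` step is needed afterwards.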

Putting those together, here is one way to solve your problem:

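# Following https://stackoverflow.com/questions/23661583/reading-back-tuples-from-a-csv-file-with-pandas
import ast

import matplotlib.pyplot as plt
import pandas as pd

# converters parses each MJD_DUPLICATE string into a Python tuple while reading
df = pd.read_csv("DR14Q_pruned_several3cols.csv",
                 converters={"MJD_DUPLICATE": ast.literal_eval})

# Following https://stackoverflow.com/questions/39790830/getting-a-tuple-in-a-dafaframe-into-multiple-rows
# One column per tuple position, indexed by MJD (shorter tuples are padded with NaN)
df2 = pd.DataFrame(df.MJD_DUPLICATE.tolist(), index=df.MJD)
# Stack to one value per row, drop the position level so the index is just MJD,
# and drop any NaN padding explicitly (newer pandas stack() keeps NaN)
df3 = df2.stack().reset_index(level=1, drop=True).dropna()

# Now just plot! The index (MJD) goes on the x-axis, the values on the y-axis
df3.plot(marker='.', linestyle='none')
plt.show()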
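To see what the `tolist()` / `stack()` reshaping produces, here is the same sequence on a small hypothetical frame whose tuples have different lengths. The `DataFrame` constructor pads the shorter tuple with NaN, and the stacked Series ends up with one row per (MJD, duplicate) pair, indexed by MJD:

```python
import pandas as pd

# Hypothetical frame with already-parsed tuples of unequal length
df = pd.DataFrame({
    "MJD": [55214, 55209],
    "MJD_DUPLICATE": [(55214, 55209, 56000), (55209, 55214)],
})

# One column per tuple position, indexed by MJD; the short tuple gets a NaN
df2 = pd.DataFrame(df.MJD_DUPLICATE.tolist(), index=df.MJD)

# stack() moves tuple positions into the index; reset_index(level=1, drop=True)
# discards that position level; dropna() removes padding on any pandas version
df3 = df2.stack().reset_index(level=1, drop=True).dropna()

print(df3.index.tolist())  # [55214, 55214, 55214, 55209, 55209]
```

Because the MJD values now live in the index, `df3.plot(...)` uses them as x coordinates, giving one point per duplicate at each MJD.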

If you want to remove the 0 and -1 values, a mask will work:

df3[df3 > 0].plot(marker='.', linestyle='none')
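On pandas 0.25 or later (newer than this answer), `DataFrame.explode` is a shorter route to the same long format: it emits one row per tuple element and repeats the other columns. This is a swapped-in alternative, not the approach used above. A minimal sketch on hypothetical data containing the 0 and -1 values mentioned above, with the same positive-value mask:

```python
import pandas as pd

# Hypothetical frame with parsed tuples that include 0 and -1 entries
df = pd.DataFrame({
    "MJD": [55214, 55209],
    "MJD_DUPLICATE": [(55214, 55209, 0), (55209, 55214, -1)],
})

# explode() gives one row per tuple element; MJD is repeated for each of them
long_df = df.explode("MJD_DUPLICATE")
# explode leaves object dtype, so convert to numbers before comparing
long_df["MJD_DUPLICATE"] = pd.to_numeric(long_df["MJD_DUPLICATE"])

# Same idea as the mask above: keep only positive MJD values
long_df = long_df[long_df["MJD_DUPLICATE"] > 0]
```

Because MJD stays an ordinary column here, `long_df.plot.scatter(x='MJD', y='MJD_DUPLICATE')` then plots every duplicate against its MJD.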

This does answer the question you asked above, right? — jakevdp, Oct 17 '17 at 17:16