I'd like to plot "MJD" vs. "MJD_DUPLICATE" for the data given here: https://www.dropbox.com/s/cicgc1eiwrz93tg/DR14Q_pruned_several3cols.csv?dl=0

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import ast

filename = 'DR14Q_pruned_several3cols.csv'
datafile = path + filename  # path: directory holding the CSV, defined elsewhere
df = pd.read_csv(datafile)

df.plot.scatter(x='MJD', y='N_SPEC')
plt.show()

ser = df['MJD_DUPLICATE'].apply(ast.literal_eval).str[1]
df['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce')
df['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce')

df.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()

This makes a plot, but only for one value of MJD_DUPLICATE:

print(df['MJD_DUPLICATE_NEW'])

0    55214
1    55209
...

Thoughts??

npross
  • It's unclear to me what you actually want to do. The MJD_DUPLICATE column contains tuples of values: do you want to turn these into multiple columns and plot each of them? Choose the first value as the value in the column? Something else? Some information as to what it is you hope to accomplish would help folks here in providing you a useful answer. – jakevdp Oct 16 '17 at 03:17
  • This answer seems relevant: https://stackoverflow.com/questions/23661583/reading-back-tuples-from-a-csv-file-with-pandas as does this: https://stackoverflow.com/questions/29550414/how-to-split-column-of-tuples-in-pandas-dataframe – jakevdp Oct 16 '17 at 03:22
  • Jake, I'm trying to plot all the values of "MJD_DUPLICATE" on the y-axis for a (single) value of MJD on the x-axis. MJD is a single entry. MJD_DUPLICATE can have two, or up to ~50, entries. – npross Oct 16 '17 at 09:06
  • I'm not sure those answers above are directly relevant. It seems with the .apply I can change these to tuples. It's the plotting many values where everyone is falling down. – npross Oct 16 '17 at 09:09

1 Answer

There are two issues here:

  1. Telling Pandas to parse tuples within the CSV. This is covered here: Reading back tuples from a csv file with pandas
  2. Transforming the tuples into multiple rows. This is covered here: Getting a tuple in a Dafaframe into multiple rows

Putting those together, here is one way to solve your problem:

# Following https://stackoverflow.com/questions/23661583/reading-back-tuples-from-a-csv-file-with-pandas
import pandas as pd
import ast
df = pd.read_csv("DR14Q_pruned_several3cols.csv",
                 converters={"MJD_DUPLICATE": ast.literal_eval})

# Following https://stackoverflow.com/questions/39790830/getting-a-tuple-in-a-dafaframe-into-multiple-rows
df2 = pd.DataFrame(df.MJD_DUPLICATE.tolist(), index=df.MJD)
df3 = df2.stack().reset_index(level=1, drop=True)

# Now just plot!
df3.plot(marker='.', linestyle='none')
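
To see what the two steps above do without downloading the file, here is a minimal sketch on a made-up three-row stand-in for the CSV. The values and the names `csv_text`, `toy`, `wide`, and `long_form` are hypothetical; only the column names and the pandas calls come from the code above.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; these values are invented.
csv_text = '''MJD,MJD_DUPLICATE
55214,"(55209, 55211, -1)"
55300,"(55301, 0, 0)"
55400,"(55410, 55420, 55430)"
'''

# Step 1: turn the tuple strings into real Python tuples while reading.
toy = pd.read_csv(io.StringIO(csv_text),
                  converters={"MJD_DUPLICATE": ast.literal_eval})

# Step 2: one column per tuple position, indexed by MJD ...
wide = pd.DataFrame(toy.MJD_DUPLICATE.tolist(), index=toy.MJD)
# ... then stack into a long Series with one row per (MJD, value) pair.
long_form = wide.stack().reset_index(level=1, drop=True)

print(long_form)
```

Each MJD now appears in the index once per entry of its tuple, which is why a plain `.plot(marker='.', linestyle='none')` puts every MJD_DUPLICATE value on the y-axis above its MJD.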

If you want to remove the 0 and -1 values, a mask will work:

df3[df3 > 0].plot(marker='.', linestyle='none')
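
As a small illustration of that mask, here is a sketch on an invented Series (the name `s`, the values, and the index are hypothetical). Boolean indexing keeps only the rows where the condition holds, so the 0 and -1 entries drop out along with their index labels.

```python
import pandas as pd

# Invented values indexed by MJD; 0 and -1 are the entries to be dropped.
s = pd.Series([55209, -1, 55301, 0, 55410],
              index=[55214, 55214, 55300, 55300, 55400])

# s > 0 is a boolean Series; indexing with it keeps only the True rows.
kept = s[s > 0]

print(kept)
```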

jakevdp