
I would like to plot "MJD" vs. "MJD_DUPLICATE" using the (13 MB) dataset "DR14Q_pruned_repeats.csv", found here: https://www.dropbox.com/s/1dyong27bre3p9j/DR14Q_pruned_repeats.csv?dl=0

Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from astropy.table import Table
from astropy.io import ascii
from astropy.io import fits

filename = 'DR14Q_pruned_repeats.csv'
df = pd.read_csv(filename)

multiples = df[df["N_SPEC"] >2]

multiples.plot.scatter(x='MJD', y='N_SPEC')
plt.show()

multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()

The MJD vs. MJD_DUPLICATE plotting line raises an error:

ValueError: scatter requires y column to be numeric

and applying pd.to_numeric to that column returns only NaNs.

npross

1 Answer


You need:

import ast

doubles   = df[df["N_SPEC"] ==2].copy()
multiples = df[df["N_SPEC"] >2].copy()
repeats   = df[df["N_SPEC"] >1].copy()

multiples.plot.scatter(x='MJD', y='N_SPEC')
plt.show()

Convert the MJD_DUPLICATE column from strings to tuples with ast.literal_eval, then select a value by position, e.g. .str[1] for the second value of each tuple:

print(multiples['MJD_DUPLICATE'].head(10))
5      (0, 56279, 0, 56539, 0, 56957, -1, -1, -1, -1,...
85     (0, 56243, 0, 56543, 0, 57328, -1, -1, -1, -1,...
170    (0, 52262, 0, 55447, 0, 57011, -1, -1, -1, -1,...
200    (0, 52262, 0, 55443, 0, 57006, -1, -1, -1, -1,...
262    (0, 52525, 0, 55443, 0, 57011, -1, -1, -1, -1,...
277    (0, 51793, 0, 55531, 0, 57006, -1, -1, -1, -1,...
287    (0, 55182, 0, 55184, 0, 55443, -1, -1, -1, -1,...
313    (0, 56248, 0, 56245, 0, 56572, -1, -1, -1, -1,...
314    (0, 55182, 0, 55184, 0, 55444, -1, -1, -1, -1,...
324    (0, 52261, 0, 55184, 0, 55444, -1, -1, -1, -1,...
Name: MJD_DUPLICATE, dtype: object

ser = multiples['MJD_DUPLICATE'].apply(ast.literal_eval).str[1]
multiples['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce')

print(multiples['MJD_DUPLICATE'].head(10))
5      56279
85     56243
170    52262
200    52262
262    52525
277    51793
287    55182
313    56248
314    55182
324    52261
Name: MJD_DUPLICATE, dtype: int64

multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()
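
To see why pd.to_numeric alone gives NaNs and what the conversion above does, here is a minimal sketch on toy data. The two strings are shortened copies of the rows printed above, and the names raw, coerced and second are only illustrative:

```python
import ast
import pandas as pd

# Toy values: shortened copies of the printed MJD_DUPLICATE rows (assumption:
# the real column holds strings of this shape).
raw = pd.Series([
    "(0, 56279, 0, 56539, 0, 56957, -1, -1)",
    "(0, 56243, 0, 56543, 0, 57328, -1, -1)",
])

# Each entry is the text of a tuple, not a number, so coercion yields NaN.
coerced = pd.to_numeric(raw, errors="coerce")

# Parse each string into a real tuple, then pick position 1 from every tuple.
second = raw.apply(ast.literal_eval).str[1]
second = pd.to_numeric(second, errors="coerce")

print(coerced.isna().all())
print(second.tolist())
```

The .str accessor is used here only for positional indexing into the parsed tuples; it does not treat them as text.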
jezrael
  • This works, but doesn't do what I'm after. I need to keep all the numeric data in MJD_DUPLICATE, not just a second column. – npross Oct 15 '17 at 15:22
  • Yes, then create a new column with a new name, `multiples['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce')`, and plot it: `multiples.plot.scatter(x='MJD', y='MJD_DUPLICATE_NEW')` – jezrael Oct 15 '17 at 15:24
  • You simply cannot plot tuples; you need scalars. – jezrael Oct 15 '17 at 15:27
  • No, sorry, I still don't think this is working. When I look at the MJD_DUPLICATE_NEW variable, it only has the two columns as given above. I'm going to dig a bit more here, as the data itself is not conducive to iterations, but will be back! – npross Oct 15 '17 at 16:22
  • Hmm, maybe the main question is how you want to convert the `MJD_DUPLICATE` column to a normal non-tuple column, because that is necessary for plotting. – jezrael Oct 15 '17 at 16:27
  • Jezrael, have a look at: https://stackoverflow.com/questions/46758107/plotting-a-multiple-column-in-pandas-converting-strings-to-floats – npross Oct 15 '17 at 17:45
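
One possible way to keep every value from MJD_DUPLICATE, as asked in the comments, is to reshape the tuples into long format so that each element becomes its own row. This is only a sketch on toy data: the column name MJD_DUP is made up, DataFrame.explode needs pandas 0.25 or later, and treating the 0 and -1 entries as padding is an assumption about the file, not something confirmed here.

```python
import ast
import pandas as pd

# Toy stand-in for the real CSV; the strings are shortened copies of the
# rows printed in the answer.
df = pd.DataFrame({
    "MJD": [56279, 56243],
    "MJD_DUPLICATE": [
        "(0, 56279, 0, 56539, 0, 56957, -1, -1)",
        "(0, 56243, 0, 56543, 0, 57328, -1, -1)",
    ],
})

# Parse the strings into tuples, then give each tuple element its own row.
parsed = df["MJD_DUPLICATE"].apply(ast.literal_eval)
long = df.assign(MJD_DUP=parsed).explode("MJD_DUP")

# explode leaves an object column, so convert it to a numeric dtype.
long["MJD_DUP"] = pd.to_numeric(long["MJD_DUP"], errors="coerce")

# Assumption: values of 0 and -1 are padding, so keep positive entries only.
long = long[long["MJD_DUP"] > 0]

print(long[["MJD", "MJD_DUP"]])
```

Every row of long then holds one scalar pair, so `long.plot.scatter(x='MJD', y='MJD_DUP')` should plot all retained values rather than a single position of each tuple.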