How do you remove corresponding x values of missing y-data from lists?

Question

import matplotlib.pyplot as plt #for graphing data
import numpy as np

plt.figure()

x = col1 = [2011.005, 2012.6543, 2013.3456, 2014.7821, 2015.3421, 2016.7891, 2017.0173, 2018.1974]
col2 = [1.4356, "", 5.32245, 6.542, 7.567, .77558, "", ""]
col3 = [1.3345, 2.345, "", 5.356, 3.124, 6.12, "", ""]
col4 = [0.67, 4.235, "", 6.78, "", "", 9.56, ""]

plt.plot(col1, col2, label="Sample 1")
plt.plot(col1, col3, label="Sample 2")
plt.plot(col1, col4, label="Sample 3")

When I plot this graph the y-axis looks very off. Realising I need to remove the "" spaces in the list, I tried this method:

x1 = []
y1 = []
for index in range(len(col2)):
    if (col2[index] != ""):
        y1.append(col2[index])
        x1.append(col1[index])

x2 = []
y2 = []
for index in range(len(col3)):
    if (col3[index] != ""):
        y2.append(col3[index])
        x2.append(col1[index])

x3 = []
y3 = []
for index in range(len(col4)):
    if (col4[index] != ""):
        y3.append(col4[index])
        x2.append(col1[index])

print(x2) #showed that there were 9 values for x2 and 5 values for x1
print(y2)

plt.plot(x1, y1, "b.", linewidth = 1, label="Sample 1")
plt.plot(x2, y2, "g.", linewidth = 1, label="Sample 2")
plt.plot(x3, y3, "k.", linewidth = 1, label="Sample 3")

plt.title("Testing", fontsize=16)

plt.show()

This showed me a dimensional error. I don't know how to only extract the corresponding values of x to the y values.

say for col 2 and x, (2012.6543, " ") becomes (2012.6543, nan) and therefore not show up as a plot? I'm so sorry, I'm new to python so everything seems overwhelming — Amnesia Smith, Apr 09 '21 at 21:06
I tried importing pandas but it doesn't seem to work and the instructions on how to do this seem to clash too. Thank you for your help! — Amnesia Smith, Apr 09 '21 at 22:17

JohanC · Accepted Answer · 2021-04-09T22:55:45.590

You can use pandas' pd.to_numeric(..., errors='coerce') to convert each of the strings in the lists to 'nan'. (Numpy's np.genfromtxt(np.array(..., dtype=str)) does something similar, but also removes the empty strings).

nan values will be skipped while plotting. Matplotlib puts its list of x-values next to the corresponding y-values, e.g. 2011.005, 1.4356 for the first pair and 2012.6543, np.nan for the second. Each pair that has one or two nan values will not be plotted.

Here is some example code:

import matplotlib.pyplot as plt
import pandas as pd

col1 = [2011.005, 2012.6543, 2013.3456, 2014.7821, 2015.3421, 2016.7891, 2017.0173, 2018.1974]
col2 = [1.4356, "", 5.32245, 6.542, 7.567, .77558, "", ""]
col3 = [1.3345, 2.345, "", 5.356, 3.124, 6.12, "", ""]
col4 = [0.67, 4.235, "", 6.78, "", "", 9.56, ""]
col1 = pd.to_numeric(col1, errors='coerce')
col2 = pd.to_numeric(col2, errors='coerce')
col3 = pd.to_numeric(col3, errors='coerce')
col4 = pd.to_numeric(col4, errors='coerce')

plt.figure()
plt.plot(col1, col2, "b.", linewidth=1, label="Sample 1")
plt.plot(col1, col3, "g.", linewidth=1, label="Sample 2")
plt.plot(col1, col4, "r.", linewidth=1, label="Sample 3")
plt.legend()
plt.show()

It is unclear how your csv file looks like. The following example supposes the file looks like csv_as_str. (StringIO is a function that lets you mimic a file with a string, so it is easier to add to a post. Reading from a file would just be df = pd.read_csv('your_file.csv').)

import pandas as pd
from io import StringIO

csv_as_str ='''
col1,col2,col3,col4
2011.005,1.4356,1.3345,0.67
2012.6543,,2.345,4.235
2013.3456,5.32245,,
2014.7821,6.542,5.356,6.78
2015.3421,7.567,3.124,
2016.7891,0.77558,6.12,
2017.0173,,,9.56
2018.1974,,,
'''
df = pd.read_csv(StringIO(csv_as_str))

Then the dataframe already has nan for the empty spots:

        col1     col2    col3   col4
0  2011.0050  1.43560  1.3345  0.670
1  2012.6543      NaN  2.3450  4.235
2  2013.3456  5.32245     NaN    NaN
3  2014.7821  6.54200  5.3560  6.780
4  2015.3421  7.56700  3.1240    NaN
5  2016.7891  0.77558  6.1200    NaN
6  2017.0173      NaN     NaN  9.560
7  2018.1974      NaN     NaN    NaN

I have edited the first version to my CSV and it looks correct! Thank you so so very much! I spent 3 days trying to code this, thought I was done and couldn't format the ticks. That's when I clocked that my code was wrong. I can't express how grateful I am! — Amnesia Smith, Apr 09 '21 at 23:06
Hi, sorry for bothering you again, but do you know how to add multiple linear regression lines to this data? I've tried x0 = np.array(x) y0 = np.array(y1) m,b = np.polyfit(x0, y0, 1) print(b) print(m) plt.grid() plt.plot(x0, m*x0 + b) but m and b show up as nan — Amnesia Smith, Apr 10 '21 at 14:30
See https://stackoverflow.com/questions/28647172/numpy-polyfit-doesnt-handle-nan-values — JohanC, Apr 10 '21 at 14:55
I'm getting nan for m2, b2, m3, b3. I've edited this page to show what I've written. — Amnesia Smith, Apr 10 '21 at 15:28

JonSG · Answer 2 · 2021-04-09T22:48:38.343

Conceptually what you want to do is iterate over the two arrays in question and only plot values where both array have "data". There are more elegant ways of doing this (see @JohanC), but in you may get some value by reviewing something like:

import matplotlib.pyplot as plt #for graphing data
import numpy as np

plt.figure()

x = col1 = [2011.005, 2012.6543, 2013.3456, 2014.7821, 2015.3421, 2016.7891, 2017.0173, 2018.1974]
col2 = [1.4356, "", 5.32245, 6.542, 7.567, .77558, "", ""]
col3 = [1.3345, 2.345, "", 5.356, 3.124, 6.12, "", ""]
col4 = [0.67, 4.235, "", 6.78, "", "", 9.56, ""]

## -------------------------
## pair up the arrays we want to work with.
## This will make it a little easier to deal with them programatically
## -------------------------
work_items = [
    [col1, col2],
    [col1, col3],
    [col1, col4],
]
## -------------------------

for work_index, work_item in enumerate(work_items):
    input_col1 = work_item[0]
    input_coln = work_item[1]
    plotting_col1 = []
    plotting_coln = []
    plotting_name = "Sample %s" % work_index

    for index, col1_value in enumerate(input_col1):
        coln_value = input_coln[index]

        if col1_value == "" or coln_value == "":
            # we are not interested in plotting this pair
            continue

        plotting_col1.append(col1_value)
        plotting_coln.append(coln_value)

    plt.plot(plotting_col1, plotting_coln, "b.", linewidth = 1, label=plotting_name)

How do you remove corresponding x values of missing y-data from lists?

2 Answers2