I have three columns of data: two position values and one data value. I would like to pivot this data so that the values of one column become the new columns and the values of another column become the index. The result will be plotted using pcolormesh. pcolormesh expects the data to be structured so that it doesn't have to guess what to do. That is, if there is a column of NaNs, pcolormesh will not fill in this column correctly. So I have written some code to shape the data correctly before it is fed to pcolormesh.

The problem is that the code seems to remove data around x = 0.0. I think this is occurring on the line where the DataFrame is reindexed to include the "missing" rows.
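One way to test that suspicion is to check whether every original (x, y) pair has a matching key in the rounded grid that reindex receives. The sketch below is only an approximation of the code in this question: it rebuilds the grid with plain Python arithmetic instead of np.linspace, and rounded_grid is a hypothetical helper, not part of pandas or NumPy.

```python
# Data from the question
xs = [-2, -1.5, -0.9, -0.7, -0.5, 0.0, 0.5, 1.1]
ys = [1, 2, 4, 5, 6, 7, 5, 4]


def rounded_grid(lo, hi, step, ndigits):
    """Hypothetical helper: evenly spaced values from lo to hi, rounded."""
    n_points = int(round((hi - lo) / step)) + 1
    return [round(lo + k * step, ndigits) for k in range(n_points)]


columns = rounded_grid(-2.0, 2.0, 0.1, 1)  # floor(min x) .. ceil(max x)
index = rounded_grid(1.0, 7.0, 0.1, 1)     # floor(min y) .. ceil(max y)

# Same (column, index) key layout as new_index in the question
grid_keys = {(c, i) for c in columns for i in index}

# Original points with no matching key would be dropped by reindex
missing = [(x, y) for x, y in zip(xs, ys) if (x, y) not in grid_keys]
print(missing)
```

If the list comes back empty, every original point has a matching grid key, which would suggest the value is being lost after the reindex step rather than in it.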

I've added a plot (and hence some extra code) as a visual aid to the problem statement. The left plot shows the original data; the right plot shows the result after the data has been reshaped for pcolormesh.

The code example below should run as-is when copied and pasted into an IPython notebook.

Any suggestions are welcome. Perhaps this solution is super complicated? It sure feels that way.

[Image: left, scatter plot of the original data; right, pcolormesh plot of the reshaped data]

%matplotlib inline

import decimal
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

test_df = pd.DataFrame()
test_df['x'] = [-2, -1.5, -0.9, -0.7, -0.5, 0.0, 0.5, 1.1]
test_df['y'] = [1,2,4,5,6,7,5,4]
test_df['v'] = np.random.randn(8)

def get_precision(number):
    """
    gives the precision, or decimal place, of the number

    http://stackoverflow.com/questions/6189956/easy-way-of-finding-decimal-places
    """
    return int(abs(decimal.Decimal(str(number)).as_tuple().exponent))

def min_max(column):
    column_min = np.floor(column.min())
    column_max = np.ceil(column.max())
    return column_min, column_max

def construct_df_for_pcolormesh(df, col, ix, values, columns_increment, index_increment):
    # Decimal places implied by each step size, e.g. 0.1 -> 1
    columns_precision = get_precision(columns_increment)
    index_precision = get_precision(index_increment)

    columns_min, columns_max = min_max(df[col])
    index_min, index_max = min_max(df[ix])

    # np.linspace needs an integer number of points
    n_columns = int(round((columns_max - columns_min) / columns_increment)) + 1
    n_index = int(round((index_max - index_min) / index_increment)) + 1

    columns = np.linspace(columns_min, columns_max, n_columns)
    index = np.linspace(index_min, index_max, n_index)

    new_index = [(round(c, columns_precision), round(i, index_precision)) for c in columns for i in index]

    df_for_pcolormesh = df.set_index([col, ix]).reindex(new_index).reset_index()
    df_for_pcolormesh = df_for_pcolormesh.pivot(index=ix, columns=col, values=values)
    return df_for_pcolormesh

fig, (ax,ax1)= plt.subplots(1,2, sharey=True, sharex=True)

test_df.plot(kind='scatter', x='x', y='y', s=100, grid=True, ax=ax)
ax.set_ylim(0,8)
ax.set_xlim(-2.5, 1.5)
ax.set_title('Plot with all the data')

data_df = construct_df_for_pcolormesh(test_df, 'x', 'y', 'v', 0.1, 0.1)

depths = data_df.index
xx = data_df.columns

d, x = np.meshgrid(depths, xx)
data = np.ma.masked_invalid(data_df.values)

ax1.pcolormesh(x, d, data.transpose(), cmap='viridis')
ax1.grid(True)
ax1.set_ylim(0,8)
ax1.set_xlim(-2.5, 1.5)
ax1.set_title('Plot with missing\ndatapoint at x=0.0')
abcd
mnky9800n
  • Generally you want to have a minimal example in your question. You've included a lot of code. Can you remove some inessential things, leaving just the code that causes the value to go missing? – abcd Aug 13 '16 at 03:52

1 Answer

I am not sure about the real reason. However, I changed your min_max function to:

def min_max(column):
    column_min = np.floor(column.min())
    column_max = np.ceil(column.max()) + 1
    return column_min, column_max

And then it worked:

[Image: plot produced with the modified min_max]
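A possible explanation, offered as an assumption rather than something verified here: when the coordinate arrays passed to pcolormesh have the same shape as the data, flat shading treats them as cell corners and ignores the last row and column of the data. The point at y = 7 sits in the last row of the pivoted frame, so adding 1 to the maximum moves it away from that edge. Another approach would be to pass cell edges, one more than the number of cells along each axis. The sketch below uses a hypothetical helper, centers_to_edges, to illustrate the idea in plain Python.

```python
def centers_to_edges(centers):
    """Hypothetical helper: n sorted cell centers -> n + 1 cell edges."""
    if len(centers) < 2:
        raise ValueError("need at least two centers")
    # Interior edges sit halfway between neighbouring centers
    mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    # Outer edges mirror the nearest interior edge about the end centers
    first = 2 * centers[0] - mids[0]
    last = 2 * centers[-1] - mids[-1]
    return [first] + mids + [last]


y_centers = [5.0, 6.0, 7.0]
y_edges = centers_to_edges(y_centers)
print(y_edges)  # [4.5, 5.5, 6.5, 7.5]
```

With edges built this way for both axes, each data value gets its own cell, including the ones in the last row and column.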

Nehal J Wani
  • This solves the problem that I created in my example, but I apparently failed to recreate the problem that I had in my non-example code. – mnky9800n Aug 15 '16 at 15:52