I have three columns of data: two position values and one data value. I would like to pivot this data so that the elements of one column become the new columns and the elements of another of the original columns become the index. The data will be plotted using pcolormesh. pcolormesh expects the data to be structured such that it doesn't have to guess what to do. That is, if there is a column of NaNs, pcolormesh will not fill in this column correctly. So I have written some code to shape the data correctly so that it can be fed to pcolormesh.
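To make the intended reshaping concrete, here is a minimal sketch of the pivot step on its own. The toy frame and its values are made up for illustration and are not the data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two position columns (x, y) and one value column (v).
toy = pd.DataFrame({
    "x": [0.0, 0.0, 1.0],
    "y": [1, 2, 2],
    "v": [10.0, 20.0, 30.0],
})

# y becomes the index, x becomes the columns, v fills the cells.
# Any (x, y) pair that is absent from the input shows up as NaN.
grid = toy.pivot(index="y", columns="x", values="v")
print(grid)
```

Note that pivot only creates columns for x values that actually occur in the input, which is why the code further down reindexes onto a regular grid first.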
The problem I have is that the code seems to remove data around x = 0.0. I think this is occurring on the line where the dataframe is being reindexed to include the "missing" rows.
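One way to test that suspicion might be to check whether every original (x, y) pair appears in the list of tuples passed to reindex. The helper below is a hypothetical diagnostic, not part of the original code, and it only tests exact equality of the key tuples. If it returns an empty list, the loss presumably happens somewhere other than a plain mismatch between the data and the generated grid.

```python
import pandas as pd

def keys_missing_from_grid(df, keys, new_index):
    """Return the original key tuples that do not appear in new_index.

    Hypothetical helper: compares plain Python tuples by equality, so it
    only detects keys that the generated grid does not contain exactly.
    """
    grid_keys = set(new_index)
    original_keys = list(zip(*(df[k].tolist() for k in keys)))
    return [key for key in original_keys if key not in grid_keys]

# Made-up example: the grid deliberately omits the pair (0.5, 2.0).
toy = pd.DataFrame({"x": [0.0, 0.5], "y": [1.0, 2.0], "v": [1.0, 2.0]})
toy_grid = [(0.0, 1.0), (0.0, 2.0), (0.5, 1.0)]
missing = keys_missing_from_grid(toy, ["x", "y"], toy_grid)
print(missing)
```

In the code below, the same check could be run on the new_index list inside construct_df_for_pcolormesh just before the reindex call.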
I've added a plot (and hence some extra code) to give a visual aid to the problem statement. The left plot shows the original data; the right plot shows the result after the data has been reshaped for pcolormesh.
The code example I have provided should run in an IPython notebook just by copying and pasting.
Any suggestions are welcome. Perhaps this solution is super complicated? It sure feels that way.
%matplotlib inline
import decimal
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
test_df = pd.DataFrame()
test_df['x'] = [-2, -1.5, -0.9, -0.7, -0.5, 0.0, 0.5, 1.1]
test_df['y'] = [1,2,4,5,6,7,5,4]
test_df['v'] = np.random.randn(8)

def get_precision(number):
    """
    gives the precision, or decimal place, of the number
    http://stackoverflow.com/questions/6189956/easy-way-of-finding-decimal-places
    """
    return int(abs(decimal.Decimal(str(number)).as_tuple().exponent))

def min_max(column):
    # Round the column's range outward to whole numbers.
    column_min = np.floor(column.min())
    column_max = np.ceil(column.max())
    return column_min, column_max

def construct_df_for_pcolormesh(df, col, ix, values, columns_increment, index_increment):
    # Convert the step sizes into "steps per unit".
    columns_increment = 1.0/columns_increment
    index_increment = 1.0/index_increment
    columns_precision = get_precision(columns_increment)
    index_precision = get_precision(index_increment)
    columns_min, columns_max = min_max(df[col])
    index_min, index_max = min_max(df[ix])
    # np.linspace needs an integer number of points, hence the int() casts.
    columns = np.linspace(columns_min, columns_max, int((columns_max - columns_min)*columns_increment + 1))
    index = np.linspace(index_min, index_max, int((index_max - index_min)*index_increment + 1))
    # Full regular grid of (column, index) pairs, including the "missing" rows.
    new_index = [(round(c, columns_precision), round(i, index_precision)) for c in columns for i in index]
    df_for_pcolormesh = df.set_index([col, ix]).reindex(new_index).reset_index()
    df_for_pcolormesh = df_for_pcolormesh.pivot(index=ix, columns=col, values=values)
    return df_for_pcolormesh

fig, (ax,ax1)= plt.subplots(1,2, sharey=True, sharex=True)
test_df.plot(kind='scatter', x='x', y='y', s=100, grid=True, ax=ax)
ax.set_ylim(0,8)
ax.set_xlim(-2.5, 1.5)
ax.set_title('Plot with all the data')
data_df = construct_df_for_pcolormesh(test_df, 'x', 'y', 'v', 0.1, 0.1)
depths = data_df.index
xx = data_df.columns
d, x = np.meshgrid(depths, xx)
data = np.ma.masked_invalid(data_df.values)
ax1.pcolormesh(x, d, data.transpose(), cmap='viridis')
ax1.grid(True)
ax1.set_ylim(0,8)
ax1.set_xlim(-2.5, 1.5)
ax1.set_title('Plot with missing\ndatapoint at x=0.0')