
I have a large data set with the format x, y, value1, value2, ..., where value# is the value of that variable at the position (x, y). The data is read in from a CSV file, with the x, y values in semi-random order. The x, y values are not on a rectilinear grid. I have on the order of millions of data points.

What I would like to do is create an image of the value# variable.

Is there a built-in mechanism for doing this? If not, how do I build a two-dimensional array of value# with the correct ordering?

JMD
  • I'm not 100% sure what you want to do. To save a plot you create, use `savefig()`. Check out this answer - http://stackoverflow.com/questions/9622163/save-plot-to-image-file-instead-of-displaying-it-using-matplotlib-so-it-can-be. Also, check out the docs on scatter plots - http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter. And as I always suggest, brush up on how to formulate a good, coherent question - http://stackoverflow.com/help/how-to-ask. – Austin A Feb 19 '15 at 18:34
  • I think what you need is interpolation. Have a look at http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html – Imanol Luengo Feb 19 '15 at 18:37
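Following the interpolation suggestion in the comment above, here is a minimal sketch using `scipy.interpolate.griddata` to resample scattered (x, y, value) points onto a regular grid that `imshow` can display. The random sample points, the 200×200 grid size and the helper name `scattered_to_grid` are hypothetical stand-ins; in practice x, y and value1 would be columns read from the CSV (for example with `pandas.read_csv`).

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata


def scattered_to_grid(x, y, values, nx=200, ny=200, method="linear"):
    """Hypothetical helper: interpolate scattered (x, y, value) samples
    onto a regular ny-by-nx grid.

    Returns the 1-D grid axes and a 2-D array with one row per y and one
    column per x, which is the layout imshow expects. With 'linear' or
    'cubic', grid points outside the convex hull of the data are NaN.
    """
    xi = np.linspace(x.min(), x.max(), nx)
    yi = np.linspace(y.min(), y.max(), ny)
    X, Y = np.meshgrid(xi, yi)  # X and Y both have shape (ny, nx)
    grid = griddata((x, y), values, (X, Y), method=method)
    return xi, yi, grid


# Stand-in for the CSV columns: random, unordered sample points
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 10_000)
y = rng.uniform(0, 5, 10_000)
value1 = np.sin(x) * np.cos(y)

xi, yi, grid = scattered_to_grid(x, y, value1)

# origin='lower' puts the smallest y at the bottom; extent labels the axes in data units
plt.imshow(grid, origin="lower", extent=(xi[0], xi[-1], yi[0], yi[-1]), aspect="auto")
plt.colorbar(label="value1")
plt.show()
```

With millions of points, the Delaunay triangulation behind `method="linear"` may be slow; `method="nearest"` is usually a cheaper option if blocky output is acceptable.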

1 Answer


Do you have only a single instance of each (x, y) pair? Are all your value# columns of equal length? If so, this will be a lot easier for you. As far as I know, there is no simple way to tell imshow to do this directly, but hopefully someone else here knows more about this than I do. You might need to restructure the data. If you want to work with large datasets, I'd recommend learning as much as you can about Python's pandas package; like R, it lets you create data frames. imshow needs a 2-D array with one row per y value and one column per x value, with your value#'s as the cell values. Here is an example for you to follow that uses pandas. There's probably a much more graceful way to go about this, but you should get the point.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 1, 2],
                   'y': [1, 1, 2, 2],
                   'data_value': [1, 2, 3, 4]})

print(df)  # so you see what's going on

# empty frame with one column per distinct x and one row per distinct y;
# sorting puts the image in the correct order even if the input is shuffled
df2 = pd.DataFrame(columns=sorted(df['x'].unique()), index=sorted(df['y'].unique()))

print(df2)  # so you see what's going on

# making x columns and y rows
for i in df2.index:
    for j in df2.columns:
        # .loc replaces the old .ix indexer, which was removed in pandas 1.0
        df2.loc[i, j] = df.loc[(df['y'] == i) & (df['x'] == j), 'data_value'].values[0]

print(df2)

Oh, and to plot this, convert to float first (imshow didn't like the object-dtype values df2 holds here):

# origin='lower' puts the smallest y at the bottom, like an ordinary x-y plot
plt.imshow(np.array(df2.astype(float)), origin='lower')
plt.show()
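With millions of rows, the nested loop above scans the whole frame once per cell and will be very slow. A minimal sketch of a vectorized alternative, swapping in pandas' `DataFrame.pivot` for the loop, assuming each (x, y) pair occurs only once (`pivot` raises a `ValueError` on duplicates; `pivot_table` with an `aggfunc` handles those). Like the loop, this only makes sense when the x and y values repeat on a grid; for truly scattered points, interpolate first. The shuffled toy data is hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same toy data as above, in shuffled order; in practice this might come
# from pd.read_csv on your file (the column names here are assumptions).
df = pd.DataFrame({'x': [2, 1, 2, 1],
                   'y': [2, 1, 1, 2],
                   'data_value': [4, 1, 2, 3]})

# One row per distinct y and one column per distinct x, both sorted.
# Any (x, y) combination missing from the data becomes NaN, which imshow leaves blank.
grid = df.pivot(index='y', columns='x', values='data_value')

print(grid)

plt.imshow(grid.to_numpy(dtype=float), origin='lower')
plt.show()
```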
user14241