2

I have time series data in hdf format. I use the code below to read the data from the hdf files. Now I tried to join data on the basis of latitude and longitude for those data having same jdn (julian day number). Data with same julian day number represent the continuous spatial data

import glob
import numpy as np
import os
from pyhdf.SD import SD,SDC


files = glob.glob('MOD04*')
files.sort()
for f in files:
    product = f[0:5]+ '-Atmospheric Product'
    year = f[10:14]
    jdn = f[14:17] # julian day number

    # Read dataset.
    hdf = SD(f, SDC.READ)
    data3D = hdf.select('Deep_Blue_Aerosol_Optical_Depth_550_Land')
    data = data3D[:,:].astype(np.double)

    # Read geolocation dataset 
    lat = hdf.select('Latitude')
    latitude = lat[:,:]
    lon = hdf.select('Longitude')
    longitude = lon[:,:]

my data are attached in this link: https://drive.google.com/folderview?id=0B2rkXkOkG7ExX2lTTWEySU1fOWc&usp=sharing

bikuser
  • 2,013
  • 4
  • 33
  • 57
  • What is your end goal? (i.e. one big data array with sequential information from each file? Something else?) – Heather QC Jul 13 '16 at 15:05
  • @Heather QC my end goal is to get the daily time series data sets. so I tried to combine data from files having same julian days but not able to succeed. as you said one big data array with sequential information from each file :). – bikuser Jul 13 '16 at 15:16

2 Answers2

2

Numpy's hstack, vstack, or dstack (depending on the axis you'd like to join the arrays) will join multidimensional arrays.

Note that for MODIS aerosol data specifically, using hstack to join the arrays will occasionally throw an error because sometimes the arrays are 203 x 135 and sometimes 204 x 135 so the horizontal dimension won't always match

Building on your code (not pretty, but functional):

import glob
import numpy as np
import os
from pyhdf.SD import SD,SDC


files = glob.glob('MOD04*')
files.sort()
for n, f in enumerate(files):
    product = f[0:5]+ '-Atmospheric Product'
    year = f[10:14]
    jdn = f[14:17] # julian day number

    # Read dataset.
    hdf = SD(f, SDC.READ)
    data3D = hdf.select('Deep_Blue_Aerosol_Optical_Depth_550_Land')
    data = data3D[:,:].astype(np.double)

   # Read geolocation dataset 
    lat = hdf.select('Latitude')
    latitude = lat[:,:]
    lon = hdf.select('Longitude')
    longitude = lon[:,:]

    if n != 0 and jdn != old_jdn:
        #do analysis; write to file for later analysis; etc.
        pass

    if n == 0 or jdn != old_jdn:
        data_timeseries = data
        latitude_timeseries = latitude
        longitude_timeseries = longitude
    else:
        data_timeseries = np.vstack((data_timeseries, data))
        latitude_timeseries = np.vstack((latitude_timeseries, latitude))
        longitude_timeseries = np.vstack((longitude_timeseries, longitude))

    print data_timeseries.shape
    print latitude_timeseries.shape
    print longitude_timeseries.shape

    old_jdn = jdn
Heather QC
  • 680
  • 8
  • 11
  • I am looking for something like dstack but not able to join as Mosaic – bikuser Jul 13 '16 at 15:30
  • What do you mean by Mosaic? – Heather QC Jul 13 '16 at 15:39
  • since this is gridded data with x and y dimension as longitude and latitude. I want to join datasets with same julian day number. Simply data with same julian day number is like two tiles (images) of same day so I want to join these two tiles to get single one as like in gdal mosaic function – bikuser Jul 13 '16 at 15:45
  • I have worked quite a bit with the MODIS aerosol product but not gdal, so I'm not clear on what that function does. I have used vstack to stitch granules of data together for analysis, but end up with separate stitched arrays for each data field (deep blue AOD, lat, lon, etc). If you are trying to get data + lat + lon for each julian day in a single array, hopefully someone else has a better solution! Good luck! – Heather QC Jul 13 '16 at 15:54
  • Hi Heather QC, thank you for your comments and suggestion...actually I am also working with AOD datasets and try to stitch but not able to succeed..if you want you can share your code for stitch part only..that would be very helpful for me and for my research...and I will highly acknowledge :) – bikuser Jul 13 '16 at 16:03
2

just to follow up on Heather QC' answer, here is an illustration of the np.stack functions and which dimensions are concerned:

arr1 = np.array([[[1,2],[2,3]],
                 [[1,2],[2,3]],
                 [[1,2],[2,3]]])

arr2 = np.array([[[5,6],[8,7]],
                 [[7,6],[7,8]],
                 [[6,7],[7,8]]])

print("arr1 shape  ", arr1.shape)    
print("arr2 shape  ", arr2.shape)    
print("vstack shape", np.vstack((arr1, arr2)).shape)
print("hstack shape", np.hstack((arr1, arr2)).shape)
print("dstack shape", np.dstack((arr1, arr2)).shape)

>>> arr1 shape   (3, 2, 2)
>>> arr2 shape   (3, 2, 2)
>>> vstack shape (6, 2, 2)
>>> hstack shape (3, 4, 2)
>>> dstack shape (3, 2, 4)
Gabriel123
  • 426
  • 5
  • 11