27

I'm sure this is really simple if you know anything about binary files, but I'm a newbie on that score.

How would I extract the data from NASA .hgt files? Here is a description from www2.jpl.nasa.gov/srtm/faq.html:

The SRTM data files have names like "N34W119.hgt". What do the letters and numbers refer to, and what is ".hgt" format?

Each data file covers a one-degree-of-latitude by one-degree-of-longitude block of Earth's surface. The first seven characters indicate the southwest corner of the block, with N, S, E, and W referring to north, south, east, and west. Thus, the "N34W119.hgt" file covers latitudes 34 to 35 North and longitudes 118-119 West (this file includes downtown Los Angeles, California). The filename extension ".hgt" simply stands for the word "height", meaning elevation. It is NOT a format type. These files are in "raw" format (no headers and not compressed), 16-bit signed integers, elevation measured in meters above sea level, in a "geographic" (latitude and longitude array) projection, with data voids indicated by -32768. International 3-arc-second files have 1201 columns and 1201 rows of data, with a total filesize of 2,884,802 bytes ( = 1201 x 1201 x 2). United States 1-arc-second files have 3601 columns and 3601 rows of data, with a total filesize of 25,934,402 bytes ( = 3601 x 3601 x 2). For more information read the text file "SRTM_Topo.txt" at http://edcftp.cr.usgs.gov/pub/data/srtm/Readme.html

Thanks for any help! I am going to use this data in a python script, so if you could not use any language-specific tricks for any other languages, that would be awesome.

  • 3
    The link in the question is broken, but I think this is the same file: http://dds.cr.usgs.gov/srtm/version1/Documentation/SRTM_Topo.txt – Hubro May 14 '12 at 11:06

6 Answers6

13

A tested numpy example:

import os
import math
import numpy

fn = 'DMV/N51E000.hgt'

siz = os.path.getsize(fn)
dim = int(math.sqrt(siz/2))

assert dim*dim*2 == siz, 'Invalid file size'

data = numpy.fromfile(fn, numpy.dtype('>i2'), dim*dim).reshape((dim, dim))
hruske
  • 2,205
  • 19
  • 27
  • Note that this first reads the file into a flat, 1D Numpy array and then reshapes into a 2D array. It may be faster to first read the bytes and then load straight into a 2D array: `numpy.ndarray((dim, dim), numpy.dtype('>i2'), buffer)` where `buffer` is the raw bytes from the file. – Kyle Barron Mar 28 '20 at 19:14
8

Since the records are fixed length (16-bit signed integers) and you know the grid size (1201 x 1201 or 3601x3601), Python's struct module seems ideally suited (untested code):

from struct import unpack,calcsize

# 'row_length' being 1201 or 3601 and 'row' being the raw data for one row
def read_row( row, row_length ):
    format = 'h'  # h stands for signed short

    for i in range(0, row_length):
        offset = i * calcsize(format)
        (height,) = unpack(format, row[offset : offset+calcsize(format))
        # do something with the height

Describing it in more generic terms, basically you want to read the file in 2 bytes at a time, parse the bytes read as a 16-bit signed integer and process it. Since you know the grid size already you can read it in row by row or in any other manner that is convenient to your application. It also means you can randomly seek to specific coordinates inside the data file.

codelogic
  • 71,764
  • 9
  • 59
  • 54
  • 4
    You actually need to use '!h' format. big-endian as describe in SRTM specs – skrat Feb 06 '10 at 13:45
  • 1
    Either `'!h'` or `'>h'` should work to define big-endian: https://docs.python.org/3/library/struct.html#byte-order-size-and-alignment – Kyle Barron Mar 28 '20 at 18:43
5

If you want a little more speed than you get from millions of calls to struct.unpack, have a look at array.array. While the "struct-and-for-loop" implementation takes several seconds on my admittedly slow laptop, the following is near instantaneous:

from array import array

f = open(filename, 'rb')
format = 'h'
row_length = 1201
data = array(format)
data.fromfile(f, row_length*row_length)
data.byteswap()
f.close()
user532954
  • 446
  • 6
  • 7
1

https://gdal.org/drivers/raster/srtmhgt.html

    Input_HGT = 'N30E120.hgt'
    import gdal
    Raster = gdal.Open(Input_HGT) 

All functions available with GDAL on raster files could be applied on this 'Raster' like Functions available with the variable, 'Raster'

0

The NASA SRTM data files are in Big-Endian format, so depending on what platform you are reading the data on, you may have to do a conversion from Big-Endian to Little-Endian.

There are numerous sources on how to do this, I have no experience with Python so I can't help you there.

But if you forget this, your values are going to be all messed up.

-1

If you have photoshop you might be able to play around with the raw import to get it to read these files and save them out to something more useful. I have had some success doing this sort of thing in the past.

Dolphin
  • 4,655
  • 1
  • 30
  • 25