
I'm trying to download the following zip file from the Internet and extract a .shp file from it: https://www.wpc.ncep.noaa.gov/archives/ero/20181227/shp_94e_2018122701.zip

The .shp file I want inside the .zip is 94e2701.shp.

(I want to do this in R so I can automate this process to download .shp files for several dates).

Based on some reading, here is the code I've tried:

shp_url = "https://www.wpc.ncep.noaa.gov/archives/ero/20181227/shp_94e_2018122701.zip"
tmp = tempfile()

download.file(shp_url,tmp,mode="wb")
# I have also tried without the "mode" argument but have gotten the same result

f_name = "94e2701.shp"
data <- sf::st_read(unz(tmp,f_name))
# Error: Cannot open "3"; The file doesn't seem to exist.
unlink(tmp)

When I go to the location of the temp file, I see that it's named "file1b9026cd6821". It isn't a .zip, so I can't open it or extract anything from it.

What am I doing wrong here? Any help or guidance is much appreciated! Thanks!

Axeman
James S
  • Similar to this question, but a more thorough answer below: https://stackoverflow.com/a/61282885/12400385 – nniloc Apr 04 '23 at 18:07

1 Answer


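BLUF: you need more than just the .shp file. A shapefile is a set of files that share one base name, so extract more (ideally all) of them. Which files you extract changes the result, as the steps below show.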

For each step below, I used the command-line unzip tool to extract only the files listed in that step, deleting the files not being tested in between.

  1. Just 94e2701.shp: error

  2. .shp and .prj: error

  3. .shp and .dbf file: error

  4. .shp and .shx: partial success; the geometry loads (with 0 fields), but the CRS is not filled in

    data <- sf::st_read("94e2701.shp")
    # Reading layer `94e2701' from data source `C:\Users\r2\AppData\Local\Temp\Rtmpqoj4GE\94e2701.shp' using driver `ESRI Shapefile'
    # Simple feature collection with 2 features and 0 fields
    # Geometry type: POLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -100.4 ymin: 28.27 xmax: -91.4 ymax: 39.77
    # CRS:           NA
    
  5. .shp, .shx, and .dbf: same as 4, no CRS

  6. .shp, .shx, .dbf, and .prj: success

    data <- sf::st_read("94e2701.shp")
    # Reading layer `94e2701' from data source `C:\Users\r2\AppData\Local\Temp\Rtmpqoj4GE\94e2701.shp' using driver `ESRI Shapefile'
    # Simple feature collection with 2 features and 7 fields
    # Geometry type: POLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -100.4 ymin: 28.27 xmax: -91.4 ymax: 39.77
    # Geodetic CRS:  GCS_Sphere_EMEP
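
Before extracting anything, you can check which of these components an archive actually contains. `unzip(zipfile, list = TRUE)` returns a data frame of the archive's entries (its `Name` column holds the file names) without extracting them. The helper below is a hypothetical sketch, not part of sf or base R, that compares those names against the four extensions tested above.

```r
# Hypothetical helper: given the file names inside a zip archive, report
# which of the four shapefile components tested above exist for one .shp.
shp_components <- function(file_names, shp_name) {
  # Base name shared by all components, e.g. "94e2701"
  stem <- tools::file_path_sans_ext(basename(shp_name))
  exts <- c("shp", "shx", "dbf", "prj")
  # Compare base names so entries stored in sub-folders still match
  present <- paste0(stem, ".", exts) %in% basename(file_names)
  setNames(present, exts)
}

# Usage with the archive downloaded in the question (requires the download):
# shp_components(unzip(tmp, list = TRUE)$Name, "94e2701.shp")
```

Per the steps above, `shx = FALSE` means `sf::st_read()` will likely fail outright, and `prj = FALSE` means the layer will load without a CRS.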
    

Incidentally, the thing that made me check this is one paragraph in ?sf::st_read:

Note that stray files in data source directories (such as *.dbf) may lead to spurious errors that accompanying '*.shp' are missing.

This made me wonder whether the presence (or absence) of the other files in the directory was causing your problem.

I do not believe there is a way in R to use unz(..) to get at all the files at once. If you don't want to unzip them into your current directory (you just want to "look" at the files and discard them later), you can create a temporary directory, unzip into it, and read the file from there.

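# Create a fresh temporary directory and extract every file from the zip into it
dir.create(td <- tempfile())
unzip(tmp, exdir = td)
# Read the .shp; its .shx, .dbf, and .prj companions now sit next to it
data <- sf::st_read(file.path(td, f_name))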
# Reading layer `94e2701' from data source `C:\Users\r2\AppData\Local\Temp\Rtmpqoj4GE\file185581a1d4d01\94e2701.shp' using driver `ESRI Shapefile'
# Simple feature collection with 2 features and 7 fields
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: -100.4 ymin: 28.27 xmax: -91.4 ymax: 39.77
# Geodetic CRS:  GCS_Sphere_EMEP
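
Since the goal is to repeat this for several dates, the steps above can be wrapped in a function. This is a minimal sketch, assuming the sf package is installed and that the .shp and its sidecar files sit at the root of each archive; `read_zipped_shp()` is a hypothetical name, not an existing API. It also gives the downloaded temp file a `.zip` extension so it is recognisable on disk.

```r
# Hypothetical helper: download one zipped shapefile and return it as an sf
# object. Assumes sf is installed and the .shp sits at the archive root.
read_zipped_shp <- function(zip_url, shp_name) {
  # Temp file with a .zip extension, plus a separate directory to extract into
  zip_file <- tempfile(fileext = ".zip")
  ex_dir <- tempfile()
  dir.create(ex_dir)
  # Remove the archive and the extracted files when the function exits;
  # st_read() has already loaded the data into memory by then
  on.exit(unlink(c(zip_file, ex_dir), recursive = TRUE), add = TRUE)

  # mode = "wb" keeps the binary zip intact on Windows
  download.file(zip_url, zip_file, mode = "wb")
  # Extract every file, not just the .shp
  unzip(zip_file, exdir = ex_dir)
  # quiet = TRUE suppresses the "Reading layer ..." summary
  sf::st_read(file.path(ex_dir, shp_name), quiet = TRUE)
}

# Hypothetical usage: one URL and matching .shp name per date to fetch
# urls <- c("https://www.wpc.ncep.noaa.gov/archives/ero/20181227/shp_94e_2018122701.zip")
# shps <- c("94e2701.shp")
# layers <- Map(read_zipped_shp, urls, shps)
```

The URL and .shp name are passed in explicitly rather than built from a date, because the naming scheme of other archives on that server is not confirmed here.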
r2evans
  • @r2evans: great answer! I am working on a similar problem with shapefile in R - can you please take a look at it if you have time? https://stackoverflow.com/questions/75923204/calculating-distances-between-points-on-a-shapefile thank you so much! – stats_noob Apr 04 '23 at 19:09
  • Awesome, thank you so much! Appreciate the thoroughness and detail of your answer. – James S Apr 04 '23 at 20:21