14

I'm trying to read an OGR vector map using the readOGR function provided by the rgdal package, but I'm having trouble with German umlauts. As the small example below shows, umlauts like ö are displayed as \303\266.

map <- readOGR("/path/to/data.gdb", layer = "layer")
map@data$name
# [1] L\303\266rrach
# [2] Karlsruhe
# [3] B\303\266blingen
# [4] ...

I've tried specifying an encoding in the readOGR call, with both readOGR(dsn = "/path/to/data.gdb", layer = "layer", encoding = "UTF-8") and readOGR(dsn = "/path/to/data.gdb", layer = "layer", encoding = "LATIN-1"), but the encoding parameter seems to be ignored completely: I get the same result for every encoding I've tried. Does anybody know how I can get readOGR, or R itself, to display the German umlauts correctly?

  • Are you on a Windows machine? –  Jan 20 '16 at 08:31
  • I'm running RStudio Server on a Debian 8 Server. –  Jan 20 '16 at 08:40
  • Very strange. I'm reading a geojson/shp file with encoding utf8 and the display in R is messed up. If I read the same data through a csv file using utf8 encoding (both exported from QGIS), everything is fine and dandy. Someone hand me a gun. – Roman Luštrik Feb 09 '16 at 09:16

3 Answers

14

Julian is right.

file_name <- "../gis_data/bw/AX_KommunalesGebiet.shp"
shape_kommunal <- readOGR(file_name, layer = "AX_KommunalesGebiet", use_iconv = TRUE, encoding = "UTF-8")
data_kommunal <- shape_kommunal@data
head(data_kommunal)

returns the string correctly:

  GKZ                NAME
0 08236074           Kämpfelbach
1 08425052           Grundsheim
2 08435067           Deggenhausertal
Michael Sebald
6

The encoding parameter is ignored unless use_iconv is set to TRUE.
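
As a rough illustration of what an iconv-style conversion does, base R's iconv() re-encodes strings from a declared source encoding. This is only a minimal sketch and not a description of rgdal's internals; the Latin-1 sample bytes are an assumption made up for the example.

```r
# Hypothetical sample: "Böblingen" stored as Latin-1 bytes (0xf6 is ö in Latin-1).
latin1_name <- rawToChar(as.raw(c(0x42, 0xf6, 0x62, 0x6c, 0x69,
                                  0x6e, 0x67, 0x65, 0x6e)))

# Re-encode from the declared source encoding to UTF-8.
utf8_name <- iconv(latin1_name, from = "latin1", to = "UTF-8")

# In UTF-8 the single byte f6 becomes the two bytes c3 b6.
print(charToRaw(utf8_name))
```

If the declared source encoding does not match the bytes actually stored in the file, the converted strings will still be wrong, so the value passed as encoding has to describe the data rather than the desired output.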

Funkwecker
0

I'm not quite sure what encoding = "UTF-8/LATIN-1/..." might do; I would have expected you to choose one and only one encoding. On my machine, the octal escape sequences are displayed as the o-umlaut:

> 'B\303\266blingen'
[1] "Böblingen"
> 'L\303\266rrach'
[1] "Lörrach"

To see the various conventions for R characters, type:

?Quotes
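
To make the octal escapes concrete, here is a small sketch in base R that inspects the bytes behind one of the strings from the question. The variable names are only illustrative.

```r
# "\303\266" are two octal escapes: 303 (octal) = c3 (hex), 266 (octal) = b6 (hex).
# Together, c3 b6 is the UTF-8 encoding of ö (U+00F6).
x <- "L\303\266rrach"

# Show the raw bytes stored in the string, independent of how the console prints it.
bytes <- charToRaw(x)
print(bytes)
```

The bytes are the same whether the console prints the string as "Lörrach" or as "L\303\266rrach"; only the display differs.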

Besides the encoding, the font being used must also contain the characters. The font used in your console display doesn't seem to have the proper mapping for o-umlaut. My default font is Courier. You should also check your locale settings; see ?Sys.getlocale.
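
A minimal sketch of such a locale check follows. The locale name "en_US.UTF-8" is an assumption; it has to be installed on the system, and another UTF-8 locale (for example a German one) may be the better choice.

```r
# Inspect the character-type locale. "C" or "POSIX" means R treats
# non-ASCII bytes as unprintable and shows them as octal escapes.
current <- Sys.getlocale("LC_CTYPE")
print(current)

# TRUE if the current locale is a UTF-8 locale.
is_utf8 <- l10n_info()[["UTF-8"]]
print(is_utf8)

# Assumption: "en_US.UTF-8" exists on this machine. Sys.setlocale() returns ""
# (with a warning, suppressed here) when the request cannot be honored.
new_locale <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))
if (identical(new_locale, "")) {
  message("Requested locale is not available; LC_CTYPE is unchanged.")
}
```

On a Linux server, the locales that are actually installed can usually be listed with `locale -a` in a shell.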

IRTFM
  • I wanted to indicate that I've used different encodings while reading the OGR vector map: once with UTF-8, once with LATIN-1, etc. As for your answer, that is kind of strange, because I get a different result: `'B\303\266blingen' # "B\303\266blingen"` –  Jan 20 '16 at 08:51
  • That was the problem. Sys.getlocale() showed that the "C" locale was in use. I've changed it with Sys.setlocale() to a locale that supports UTF-8, and now `'L\303\266rrach'` is correctly displayed as "Lörrach". –  Jan 20 '16 at 09:43