
I am loading a shapefile into a GeoDataFrame using the GeoPandas read_file method. I need to apply some replacements to the column holding the geometry data, so I am casting this column to string. Without the cast, calling .replace raises TypeError: expected string or bytes-like object. However, the cast truncates the original data in the geometry column. Below is an example of the difference for one cell:

Column GEOMETRY from Shapefile loaded to GeoDataFrame:
LINESTRING (13.90327032848085764 46.61940531353186401, 13.90327032848085587 46.61940531353186401)

Column GEOMETRY from GeoDataFrame converted to string:
LINESTRING (13.90327032848086 46.61940531353186, 13.90327032848086 46.61940531353186)

The code I use to convert the geometry type to string is:

geodataframe['geometry'] = geodataframe.geometry.astype(str)

The geometry column can contain lines and multilines with a variable number of XY pairs; the above is just a simple example. Does anybody know how to convert it without the unwanted rounding?

Georgy
zwornik
  • What are the versions of Python and GeoPandas you are using? – Zeeshan Nov 26 '19 at 20:50
  • Python: 3.7.5. Pandas: 0.25.3. Geopandas: 0.6.1. I am running it on Anaconda. – zwornik Nov 26 '19 at 20:56
  • Please try these: geodataframe['geometry'] = geodataframe.geometry.apply(str) or geodataframe['geometry'] = geodataframe.geometry.astype(basestring) – Harsha Nov 26 '19 at 21:02
  • @Harsha apply(str) did not help. Second option is not accepted (data type 'basestring' not understood) – zwornik Nov 26 '19 at 21:07
  • @zwornik thank you. The second option was for python 2.7, sorry. – Harsha Nov 26 '19 at 21:11
  • @zwornik could you please try the statement. geodataframe['geometry'] = geodataframe.geometry.astype('float64') – Harsha Nov 26 '19 at 21:15
  • @Harsha TypeError: float() argument must be a string or a number, not 'LineString'. – zwornik Nov 26 '19 at 21:27
  • @zwornik I think I figured out the issue. Pandas is unable to convert multiple 'objects' in a linestring into 1 string. You need to either 1) create 4 new columns to hold the 4 different coordinates or 2) merge all 4 coordinates (in a separate function) as a str object and add it to the geometry column. I would recommend the first option since it offers more flexibility. – Harsha Nov 26 '19 at 21:40
  • @Harsha This will not work in my case. In the Geometry column I can have lines and multilines with a variable number of XY pairs. The above was just a simple example, so I cannot have a fixed number of new columns. – zwornik Nov 26 '19 at 21:46
  • @zwornik understandable. I do not have a solution for this. I will be closely following this question! – Harsha Nov 26 '19 at 21:48

2 Answers


If you want a string representation of your geometry, you should use WKT. Converting shapely geometries to strings using astype would not work.

Using GeoPandas 0.9+:

geodataframe['wkt'] = geodataframe.geometry.to_wkt()

Using older versions:

geodataframe['wkt'] = geodataframe.geometry.apply(lambda g: g.wkt)

This will give you a new column with the string (WKT) representation of your geometries. What you normally see in your geometry column is just a representation of the shapely geometry.

martinfleis

IIUC, you won't be able to get more than 16 significant digits. Using str(geometry) or geometry.wkt (as proposed in the other answer; the two are in fact the same thing) will always trim the result to a total of 16 digits:

>>> from shapely.geometry import Point
>>> point = Point(0, 1234567890.1234567890123456789)
>>> point.wkt
'POINT (0 1234567890.123457)'
>>> str(point)
'POINT (0 1234567890.123457)'

You could use shapely.wkt.dumps to always get 16 digits after the decimal point, regardless of the total number of digits:

>>> from shapely.wkt import dumps
>>> dumps(point)
'POINT (0.0000000000000000 1234567890.1234567165374756)'

but, as you can see, it still loses some data at the end.
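
For intuition about where the rounding comes from, here is a minimal sketch in plain Python with no Shapely involved. It assumes the coordinates are held as 64-bit floats and reuses the two x values from the question; the names x1, x2, s1 and s2 are only for illustration.

```python
# The two x coordinates of the line from the question, parsed as 64-bit floats.
x1 = float("13.90327032848085764")
x2 = float("13.90327032848085587")

# They are distinct floats, so the original line has non-zero length.
print(x1 != x2)

# With 16 significant digits both values are written as the same text.
print(format(x1, ".16g"), format(x2, ".16g"))

# With 17 significant digits each text parses back to the float it came from.
s1, s2 = format(x1, ".17g"), format(x2, ".17g")
print(s1 != s2, float(s1) == x1, float(s2) == x2)
```

Sixteen significant digits merge the two neighbouring floats, which is the collapse shown in the question, while 17 significant digits are enough to tell them apart. Whether a given Shapely version can be made to write that many digits is a separate matter that this sketch does not cover.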

So, the only thing you can do is to accept the fact that you will be losing some data, and deal with it properly later, as, for example, here: How to deal with rounding errors in Shapely.


In your case, when you simply want to discard "faulty" lines of this kind that shrink to zero length due to precision, you could use is_valid:

>>> from shapely.wkt import loads
>>> line = loads('LINESTRING (13.90327032848086 46.61940531353186, 13.90327032848086 46.61940531353186)')
>>> line.is_valid
False
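
When the geometry is a multiline in which only some parts have collapsed, one possible approach is to filter the individual parts instead of dropping the whole geometry. The sketch below is only an illustration of that idea: it works on plain coordinate tuples rather than Shapely objects, and drop_degenerate_parts is a hypothetical helper name, not part of any library.

```python
def drop_degenerate_parts(parts):
    """Keep only the lines that contain at least two distinct points."""
    return [line for line in parts if len(set(line)) >= 2]


# A multiline given as a list of lines, each line a list of (x, y) tuples.
multiline = [
    # Both endpoints are identical after rounding, so this part is a point.
    [(13.90327032848086, 46.61940531353186),
     (13.90327032848086, 46.61940531353186)],
    # A regular two-point line that should be kept.
    [(13.0, 46.0), (13.5, 46.5)],
]

cleaned = drop_degenerate_parts(multiline)
print(len(cleaned))
```

Comparing exact coordinates is a deliberate simplification; a tolerance-based check may suit real data better.
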
Georgy
  • Rounding leads to an issue (as in the example I gave above) where a very short line is represented by two identical XY pairs (effectively a point, not a line). I will then need to clean up such faulty lines. Do you possibly know how this can be done by operating on the Geometry type directly, rather than casting it to string/list and cleaning up afterwards? – zwornik Nov 27 '19 at 15:36
  • My Geometry column is of type MULTILINESTRING. Originally it was a mix of LINESTRING and MULTILINESTRING, but that mixture was not accepted when loading data to PostGIS. So there are cases inside a MULTILINESTRING where one line can be valid and another not. I skipped the "wkt.loads" step because with ".geometry.apply(wkt.loads)" I got AttributeError: 'MultiLineString' object has no attribute 'encode'. With "wkt.loads(geodataframe.geometry)" I got AttributeError: 'GeoSeries' object has no attribute 'encode'. I tried this: [x if x.is_valid else np.nan for x in geodataframe['geometry']]. The invalid line is still there. – zwornik Nov 27 '19 at 17:29
  • @zwornik Can you try `geodataframe = geodataframe[geodataframe.geometry.apply(lambda g: g.is_valid)]`? – Georgy Nov 27 '19 at 19:02
  • I have tried with apply lambda within list, but it set values in Geometry column equal to df Index or GID column. – zwornik Nov 27 '19 at 19:18
  • I wonder if there is some sort of validate/cleanup action possible in PostGIS (after uploading the GeoDataFrame) to remove such zero-length lines. Or apply rounding on the GDF and then perform the "is_valid" action. – zwornik Nov 27 '19 at 19:23
  • After "is_valid" applied I see this (print(df.loc[[27]])) in GDF :(13.90327 46.61941, 13.90327 46.61941). I have then dumped GDF to SHP. In SHP is OK: (13.90327032848085764 46.61940531353186401, 13.90327032848085587 46.61940531353186401). In PostGIS not: (13.90327032848085942 46.6194053135318569, 13.90327032848085942 46.6194053135318569) – zwornik Nov 27 '19 at 19:47
  • Could you update the question with some example data and the code that you are trying so that me or someone else could reproduce the issue? – Georgy Nov 27 '19 at 21:36
  • https://gis.stackexchange.com/questions/219836/removing-unwanted-linestrings-from-multilinestring-in-postgis - exactly my problem and solution for it :) Though it is cleanup after loading to PostGIS and I prefer to do it before. But it seems that error caused by rounding happens during exporting GDF to PostGIS. – zwornik Nov 27 '19 at 22:03