
I'm trying to perform an operation on a whole column, but I'm getting a type error. I want to make a column containing a Shapely Point:

from shapely.geometry import Point

# Drop rows where either coordinate is missing
crime_df = crime_df[crime_df['Latitude'].notna()]
crime_df = crime_df[crime_df['Longitude'].notna()]

# Make sure both coordinate columns are floats
crime_df['Longitude'] = crime_df['Longitude'].astype(float)
crime_df['Latitude'] = crime_df['Latitude'].astype(float)

print(crime_df['Longitude'])
print(crime_df['Latitude'])

# This line raises the TypeError shown below
crime_df['point'] = Point(crime_df['Longitude'], crime_df['Latitude'])

Output:

18626    -87.647379
Name: Longitude, Length: 222, dtype: float64

18626    41.781100
Name: Latitude, Length: 222, dtype: float64

TypeError: cannot convert the series to <class 'float'>
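The full traceback isn't shown, but the message is the one pandas raises when a Series with more than one element is converted to a single float, which is presumably what Point attempts with each whole column. A minimal reproduction that needs only pandas (the longitude values are hypothetical):

```python
import pandas as pd

# Hypothetical longitudes; any Series with more than one element behaves the same way.
longitude = pd.Series([-87.647379, -87.6298, -87.7])

error = None
try:
    # Converting a multi-element Series to one float is not allowed.
    float(longitude)
except TypeError as exc:
    error = exc

print(error)
```

So the fix is to pass Point one pair of scalar coordinates at a time rather than two whole columns.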
Georgy
TomSelleck

1 Answer


I think you need to work with each point separately: Point expects scalar x and y values, not whole columns. So use DataFrame.apply with a lambda function and axis=1 to build one Point per row:

crime_df['point'] = crime_df.apply(lambda x: Point(x['Longitude'], x['Latitude']), axis=1)
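A runnable sketch of the apply approach end to end, assuming a small hypothetical crime_df: the first row reuses the coordinates printed in the question, the second is made up, and the third is missing a longitude so the cleaning step has something to drop. The cleaning is written with dropna(subset=...), which should be equivalent to the question's two notna() filters.

```python
import pandas as pd
from shapely.geometry import Point

# Hypothetical sample data stored as text, like columns read from a CSV.
crime_df = pd.DataFrame({
    "Longitude": ["-87.647379", "-87.6298", None],
    "Latitude": ["41.781100", "41.8781", "41.9"],
})

# Drop rows missing either coordinate, then cast both columns to float.
crime_df = crime_df.dropna(subset=["Latitude", "Longitude"]).copy()
crime_df["Longitude"] = crime_df["Longitude"].astype(float)
crime_df["Latitude"] = crime_df["Latitude"].astype(float)

# axis=1 hands each row to the lambda, so Point receives two scalar floats per call.
crime_df["point"] = crime_df.apply(
    lambda row: Point(row["Longitude"], row["Latitude"]), axis=1
)

print(crime_df["point"].tolist())
```

The .copy() after filtering avoids pandas warning about assigning into a slice of the original frame.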

Or, more concisely (thanks to @N. Wouda):

crime_df["point"] = crime_df[["Longitude", "Latitude"]].apply(Point, axis=1)
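This shorter form works because a Shapely Point accepts either two scalars or a single (x, y) sequence, and apply with axis=1 passes each row as such a sequence. A small sketch with a hypothetical two-row frame; note that the column selection order decides which value becomes x:

```python
import pandas as pd
from shapely.geometry import Point

# Two scalars or one (x, y) sequence describe the same point.
p_args = Point(-87.647379, 41.7811)
p_seq = Point([-87.647379, 41.7811])

# Hypothetical frame. Longitude (x) must come before Latitude (y) in the selection,
# because each row is handed to Point as one (x, y) sequence.
coords = pd.DataFrame({"Longitude": [-87.647379, -87.6298], "Latitude": [41.7811, 41.8781]})
points = coords[["Longitude", "Latitude"]].apply(Point, axis=1)

print(points.tolist())
```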

Or, as a list comprehension:

crime_df['point'] = [Point(lon, lat)
                     for lon, lat in crime_df[['Longitude', 'Latitude']].values]
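A variant of the same list comprehension that swaps .values for zip, sketched on a hypothetical frame. Both build one Point per row. zip pairs the two columns element by element instead of first materializing a 2-D NumPy array, which is a small convenience rather than a measured speedup:

```python
import pandas as pd
from shapely.geometry import Point

# Hypothetical coordinates.
coords = pd.DataFrame({"Longitude": [-87.647379, -87.6298], "Latitude": [41.7811, 41.8781]})

# zip walks both columns in step; no intermediate 2-D array is built.
coords["point"] = [
    Point(lon, lat) for lon, lat in zip(coords["Longitude"], coords["Latitude"])
]

print(coords["point"].tolist())
```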

EDIT: I think a vectorized approach is possible with geopandas.points_from_xy:

import geopandas
gdf = geopandas.GeoDataFrame(crime_df, geometry=geopandas.points_from_xy(crime_df.Longitude, crime_df.Latitude))
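A self-contained sketch of the geopandas route on a hypothetical two-row frame. It also sets crs="EPSG:4326", which assumes the coordinates are WGS84 longitude/latitude; the question doesn't say which coordinate system the data uses, so treat that as an assumption to check:

```python
import geopandas
import pandas as pd

# Hypothetical coordinates.
crime_df = pd.DataFrame({"Longitude": [-87.647379, -87.6298], "Latitude": [41.7811, 41.8781]})

# points_from_xy takes x (longitude) first, then y (latitude).
# crs="EPSG:4326" is an assumption: WGS84 longitude/latitude.
gdf = geopandas.GeoDataFrame(
    crime_df,
    geometry=geopandas.points_from_xy(crime_df["Longitude"], crime_df["Latitude"]),
    crs="EPSG:4326",
)

print(gdf)
```

Recording the CRS up front makes later reprojection (for example, to compute distances in metres) possible without guessing.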
jezrael
  • Ah - I was trying to vectorize the process to speed it up. Is that not possible in this case? – TomSelleck Apr 12 '20 at 12:00
  • This can be done a little more cleanly as follows: `crime_df["point"] = crime_df[["Longitude", "Latitude"]].apply(Point, axis=1)`, as `__init__` is already callable and a Shapely Point [understands sequences](https://shapely.readthedocs.io/en/latest/manual.html#points) :). – Nelewout Apr 12 '20 at 12:01
  • @TomSelleck - I think the problem is that a `Point` can't be created this way. I found another way and edited the answer. – jezrael Apr 12 '20 at 12:10
  • IIUC, the last option should be faster if PyGEOS is installed. Otherwise, it is just a simple list comprehension under the hood. See [source code](https://github.com/geopandas/geopandas/blob/master/geopandas/_vectorized.py#L219-L245) and a [note on the optional PyGEOS dependency in the docs](https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency). – Georgy Apr 12 '20 at 12:20