
I'm trying to build a GeoJSON object. My input is a CSV with an address column, a lat column, and a lon column. I create Shapely points from the coordinates, buffer them by a given radius, and get the dictionary of coordinates via Shapely's mapping function; so far, so good. Then, after referring to this question, I wrote the following function to get a Series of dictionaries:

def make_geojson(row):
    return {'geometry': row['geom'], 'properties': {'address': row['address']}}

and I applied it thusly:

data['new_output'] = data.apply(make_geojson, axis=1)

My resulting column is full of these: <built-in method values of dict object at 0x10...

The weirdest part is that when I call the function directly (e.g. make_geojson(data.loc[0])), I do in fact get the dictionary I'm expecting. Perhaps even weirder, when I call the objects I'm getting from the apply (e.g. data.output[0](), data.loc[0]['output']()), I get the equivalent of the following list: [data.loc[0]['geom'], {'address': data.loc[0]['address']}], i.e. the values (but not the keys) of the dictionary I'm trying to get.
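Both symptoms are consistent with the column holding the bound `dict.values` method rather than the dictionary itself, as DSM's comment below explains. A minimal plain-Python sketch (no pandas involved; the geometry and address are made-up placeholders) reproduces them:

```python
# Placeholder stand-ins for data.loc[0]['geom'] and data.loc[0]['address'];
# the values are made up for illustration.
geom = {'type': 'Polygon',
        'coordinates': (((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)),)}
address = 'BS'

feature = {'geometry': geom, 'properties': {'address': address}}

# Referencing .values without calling it gives a bound method; its repr has
# the same "<built-in method values of dict object at 0x...>" form.
method = feature.values
print(repr(method))

# Calling that method returns the values but not the keys, which matches
# what data.output[0]() returned in the question.
print(list(method()))
```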

For those of you playing along at home, here's a toy example:

from shapely.geometry import Point, mapping
import pandas as pd

def make_geojson(row):
    return {'geometry':row['geom'], 'properties':{'address':row['address']}}

data = pd.DataFrame([{'address':'BS', 'lat':34.017, 'lon':-117.959}, {'address':'BS2', 'lat':33.989, 'lon':-118.291}])
data['point'] = [Point(xy) for xy in zip(data['lon'], data['lat'])]  # a list; on Python 3, map() would be lazy
data['buffer'] = data['point'].apply(lambda x: x.buffer(.1))
data['geom'] = data.buffer.apply(mapping)
data['output'] = data.apply(make_geojson, axis=1)
Nick Marinakis
  • Can you provide a self-contained example demonstrating the problem? – BrenBarn Jan 16 '15 at 21:36
  • As always, things are very likely to go off the rails when you start putting non-scalar elements in Series and DataFrames; this is happening because a branch is taken assuming that pandas can call `someobj.values` and get the values of an NDFrame but since you've given it a dictionary it's instead getting the dictionary method. What's your final goal? – DSM Jan 16 '15 at 21:49
  • I'm just trying to get a geojson object (or a python dictionary I can dump to geojson). That's gonna look like [this](http://geojson.org/geojson-spec.html#examples). – Nick Marinakis Jan 16 '15 at 21:51
  • @dsm I am following the example [here](http://stackoverflow.com/a/13337376/1599229), but my equivalent of `f()` is returning a `dict`. Same issue as this question. Yet it's possible to store a `dict` in a `DataFrame`. I don't know quite what you mean by "a branch is taken" -- does that mean: `apply` with a returned `dict` is not possible at all? Is there another way to operate on each row while storing the `dict` result in a new column? – scharfmn Aug 23 '15 at 10:12
  • @bahmait The issue is that the values method on the dictionary is overriding the values method on the NDFrame. Instead of applying whatever function you're calling to a DataFrame, map it over lists of the columns you need. – Nick Marinakis Sep 22 '15 at 05:24

2 Answers


Thanks, DSM, for pointing that out. Lesson learned: pandas is not well suited to storing arbitrary Python objects.

So this is what I wound up doing:

# Pair each geometry with its address; a list comprehension gives a real list on Python 3.
output = [{'geometry': geom, 'properties': {'address': address}}
          for geom, address in zip(data['geom'], data['address'])]
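If the end goal is a GeoJSON document like the spec examples linked in the comments, each of these dicts still needs a "type": "Feature" member, and the features are normally wrapped in a FeatureCollection. A minimal sketch, assuming that goal; the helper names `to_feature` and `to_feature_collection` are hypothetical, and the Point geometries are placeholders for the buffered polygons that `mapping` returns:

```python
import json


def to_feature(geometry, address):
    """Wrap one geometry mapping as a GeoJSON Feature (hypothetical helper)."""
    return {
        'type': 'Feature',
        'geometry': geometry,
        'properties': {'address': address},
    }


def to_feature_collection(geometries, addresses):
    """Pair geometries with addresses inside a FeatureCollection (hypothetical helper)."""
    return {
        'type': 'FeatureCollection',
        'features': [to_feature(g, a) for g, a in zip(geometries, addresses)],
    }


# Placeholder geometries in the dict shape shapely.geometry.mapping() returns;
# real code would pass data['geom'] and data['address'] instead.
geoms = [
    {'type': 'Point', 'coordinates': (-117.959, 34.017)},
    {'type': 'Point', 'coordinates': (-118.291, 33.989)},
]
addresses = ['BS', 'BS2']

collection = to_feature_collection(geoms, addresses)
print(json.dumps(collection, indent=2))  # tuples are serialized as JSON arrays
```

Because everything stays in plain lists and dicts, pandas never has to decide what to do with a dict return value.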
Nick Marinakis

I found this post because I ran into a similar issue, but with a PySpark DataFrame instead of a pandas one.

In case someone else ends up here like I did, I'll explain how I fixed it for a PySpark DataFrame.

I was getting the error (a built-in method of a Row object, in my case) because my field name count was colliding with the count method that Row inherits from Python tuples (as seen here).

The solution was simply to change the name of the field to something like my_count, and it worked fine.
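The same shadowing can be sketched without Spark. PySpark's Row is a tuple subclass that resolves field names dynamically, and Python only falls back to `__getattr__` when normal attribute lookup fails, so a field called count loses to the inherited `tuple.count` method. The `MiniRow` class below is a hypothetical, heavily simplified stand-in, not PySpark's actual implementation:

```python
class MiniRow(tuple):
    """Hypothetical, simplified stand-in for pyspark.sql.Row (not its real code):
    a tuple subclass whose field names are resolved in __getattr__."""

    def __new__(cls, **fields):
        row = super().__new__(cls, fields.values())
        row._fields = tuple(fields)  # field names, in keyword order
        return row

    def __getattr__(self, name):
        # Python only calls __getattr__ when normal lookup fails, so names
        # that tuple already defines (count, index) never reach this code.
        if name == '_fields':
            raise AttributeError(name)
        try:
            return self[self._fields.index(name)]
        except ValueError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        # Allow lookup by field name as well as by position.
        if isinstance(key, str):
            key = self._fields.index(key)
        return super().__getitem__(key)


row = MiniRow(word='spark', count=3)
print(row.word)      # resolved through __getattr__
print(row.count)     # the inherited tuple.count method, not the field
print(row['count'])  # name-based indexing bypasses attribute lookup

renamed = MiniRow(word='spark', my_count=3)
print(renamed.my_count)  # no clash once the field is renamed
```

PySpark's Row also documents dictionary-style access (`row['count']`), which should sidestep the clash when renaming the column isn't an option.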

Hannon Queiroz