Pandas: apply a specific function to columns and create other columns

Question

I have a pandas dataframe df with geographical coordinates like this:

    lat         lon         
0   48.01025772 -6.15690851 
1   48.02164841 -6.10588741 
2   48.03302765 -6.05480051 
... ...         ...

I need to convert these coordinates into a different system, and I have a dedicated function for this. I plan to create two new columns: df['N'], paired with lat, and df['E'], paired with lon.

It's not relevant what the function looks like, so for simplicity let's call it f. The function operates like this: E, N = f(float(lat), float(lon))

Is there a way to iterate through all rows of df, extract each lat, lon pair, compute its transformation, and assign the values to the relevant columns?
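For reference, the row-by-row loop described in the question can be written without apply. This is a minimal sketch that assumes a hypothetical placeholder for f (the real conversion is not given) returning (E, N) in that order. It also rebuilds the sample df from the question:

```python
import pandas as pd

df = pd.DataFrame({'lat': [48.01025772, 48.02164841, 48.03302765],
                   'lon': [-6.15690851, -6.10588741, -6.05480051]})


def f(lat, lon):
    # hypothetical placeholder for the real coordinate conversion;
    # returns (E, N) in the same order as the question's signature
    return lon * 1000.0, lat * 1000.0


# call f once per (lat, lon) pair, then split the (E, N) pairs into two sequences
E, N = zip(*(f(float(lat), float(lon)) for lat, lon in zip(df['lat'], df['lon'])))
df['E'] = list(E)
df['N'] = list(N)
```

zip(*...) transposes the sequence of (E, N) pairs into one tuple of E values and one of N values. This is still a Python-level loop, so it costs roughly one function call per row, like apply.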

If you're interested, I wrote a method for calculating the haversine distance in a vectorised manner; it will give you some idea of how you may be able to rewrite whatever your function does: http://stackoverflow.com/questions/25767596/using-haversine-formula-with-data-stored-in-a-pandas-dataframe (comment by EdChum, Feb 02 '17 at 16:55)
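To illustrate the idea in the comment (this is not the code from the linked answer), here is a minimal vectorised haversine sketch. It assumes a mean Earth radius of 6371 km and measures the distance from every row to a hypothetical reference point, here simply the first row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'lat': [48.01025772, 48.02164841, 48.03302765],
                   'lon': [-6.15690851, -6.10588741, -6.05480051]})


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    # every step is an element-wise NumPy/pandas operation,
    # so whole columns are processed in one call
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# hypothetical reference point: the first row's coordinates
ref_lat, ref_lon = df.loc[0, 'lat'], df.loc[0, 'lon']
df['dist_km'] = haversine_km(df['lat'], df['lon'], ref_lat, ref_lon)
```

Because np.radians, np.sin and the arithmetic operators work element-wise on Series, the function runs once for the whole column rather than once per row.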

EdChum · Accepted Answer · 2017-02-02T16:48:33.027

You can use apply on the df with axis=1. Have your function return a Series, then assign the result to the two new columns directly:

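import pandas as pd

df = pd.DataFrame({'lat': [48.01025772, 48.02164841, 48.03302765],
                   'lon': [-6.15690851, -6.10588741, -6.05480051]})

def foo(lat, lon):
    # returning a Series makes apply(axis=1) expand the result into two columns
    return pd.Series([lat + 10, lon * 100])

df[['new_lat', 'new_lon']] = df.apply(lambda x: foo(x['lat'], x['lon']), axis=1)
df

Output: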
         lat       lon    new_lat   new_lon
0  48.010258 -6.156909  58.010258 -615.6909
1  48.021648 -6.105887  58.021648 -610.5887
2  48.033028 -6.054801  58.033028 -605.4801

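Depending on what your function does, apply can and should be avoided: it calls the function once per row in Python, so a vectorised version of the function is usually much faster.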
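A sketch of the vectorised alternative for the toy foo above: since it only uses arithmetic, the same logic can be applied to whole columns in one call. foo_vec is a hypothetical name, and this only works if the real function is built from operations that broadcast over Series or NumPy arrays:

```python
import pandas as pd

df = pd.DataFrame({'lat': [48.01025772, 48.02164841, 48.03302765],
                   'lon': [-6.15690851, -6.10588741, -6.05480051]})


def foo_vec(lat, lon):
    # same arithmetic as foo, but lat and lon are whole Series here
    return lat + 10, lon * 100


# one call for the whole frame; each returned Series becomes a column
df['new_lat'], df['new_lon'] = foo_vec(df['lat'], df['lon'])
```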

vozman · Answer 2 · 2019-02-05T11:47:20.210

You can avoid creating a pd.Series for every row, which is slow, and pass the result_type='expand' argument instead. This runs noticeably faster on big DataFrames:

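import pandas as pd

def foo(lat, lon):
    # a plain list is cheaper to build than a pd.Series
    return [lat + 10, lon * 100]

# result_type='expand' turns each returned list into one row of new columns
df[['new_lat', 'new_lon']] = df.apply(lambda x: foo(x['lat'], x['lon']), axis=1, result_type='expand')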

On my DataFrame I measured the following timings: plain apply returning a list, without assigning: 27 s; with result_type='expand': 30 s; returning pd.Series(...): 41 s.
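A minimal harness for repeating this comparison on your own machine, assuming a synthetic DataFrame whose size N_ROWS is an arbitrary choice. The timings it prints depend on hardware and pandas version, so no particular numbers are implied:

```python
import timeit

import numpy as np
import pandas as pd

N_ROWS = 2_000  # arbitrary size; raise it for more stable timings
rng = np.random.default_rng(0)
df = pd.DataFrame({'lat': rng.uniform(48.0, 49.0, N_ROWS),
                   'lon': rng.uniform(-7.0, -6.0, N_ROWS)})


def foo_series(lat, lon):
    # builds a pd.Series per row, as in the accepted answer
    return pd.Series([lat + 10, lon * 100])


def foo_list(lat, lon):
    # returns a plain list, as in this answer
    return [lat + 10, lon * 100]


def with_series():
    return df.apply(lambda x: foo_series(x['lat'], x['lon']), axis=1)


def with_expand():
    return df.apply(lambda x: foo_list(x['lat'], x['lon']), axis=1,
                    result_type='expand')


for name, fn in [('pd.Series', with_series), ("result_type='expand'", with_expand)]:
    seconds = timeit.timeit(fn, number=3) / 3
    print(f'{name}: {seconds:.3f} s per run')
```

Both variants should produce the same numbers; only the per-row overhead differs.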

Jan Zeiseweis · Answer 3 · 2017-02-02T16:44:16.553

You can use:

# returns a Series holding one f(...) result per row
df[['lat', 'lon']].apply(lambda row: f(float(row['lat']), float(row['lon'])), axis=1)
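With the parentheses balanced, this call returns a Series holding one (E, N) tuple per row rather than two columns. A minimal sketch of unpacking that Series into df['E'] and df['N'], assuming the same kind of hypothetical placeholder f that returns (E, N):

```python
import pandas as pd

df = pd.DataFrame({'lat': [48.01025772, 48.02164841, 48.03302765],
                   'lon': [-6.15690851, -6.10588741, -6.05480051]})


def f(lat, lon):
    # hypothetical placeholder returning (E, N), as in the question
    return lon * 1000.0, lat * 1000.0


pairs = df[['lat', 'lon']].apply(lambda row: f(float(row['lat']), float(row['lon'])), axis=1)
# pairs is a Series of (E, N) tuples; build a two-column frame aligned on df's index
df[['E', 'N']] = pd.DataFrame(pairs.tolist(), index=df.index)
```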

edited Feb 02 '17 at 16:44

answered Feb 02 '17 at 16:37

score 1 · Answer 4 · answered Sep 23 '19 at 00:55

If you don't always know the number, names or order of the columns returned, this solution is more flexible:

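    import pandas as pd

    # hypothetical wrapper; func maps one row to a list, dict or Series
    def expand_and_join(df, func):
        exploded = df.apply(func, axis='columns', result_type='expand')
        return pd.concat([df, exploded], axis='columns', sort=False)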
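A minimal sketch of the flexibility this gives, assuming a hypothetical row function convert that returns a dict: with result_type='expand', the dict keys become the new column names, so the caller never has to list them in advance.

```python
import pandas as pd

df = pd.DataFrame({'lat': [48.01025772, 48.02164841, 48.03302765],
                   'lon': [-6.15690851, -6.10588741, -6.05480051]})


def convert(row):
    # hypothetical row function; the keys of the returned dict become column names
    return {'N': row['lat'] * 1000.0, 'E': row['lon'] * 1000.0}


exploded = df.apply(convert, axis='columns', result_type='expand')
result = pd.concat([df, exploded], axis='columns', sort=False)
```

If convert later returns extra keys, the extra columns appear in result without changing the calling code.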

eddygeek

