
I have a question.

I have a table like this

TAC | Latitude | Longitude
1 | 50.4 | -1.5 

In Pandas, I wanted to say:

For each TAC, give me a zipped list of latitude and longitude (each TAC can have many rows).

I've tried things like the below, but I am doing something wrong! Can you help?

df1['coordinates'] = list(zip(df1.Lat, df1.Long))
new_df = df1.iloc[ : , : ].groupby('TAC').agg(df1['coordinates'])

For reference, df1 is created as below:

df = pd.read_csv('tacs.csv')
df1 = df[['magnet.tac','magnet.latitude', 'magnet.longitude']]
df1.columns = ['TAC','Lat','Long']
jpp
kikee1222

1 Answer


First, pass the usecols parameter to read_csv so that df1 is not a slice of another DataFrame (this avoids the SettingWithCopyWarning), then use GroupBy.apply with a lambda function:

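# read only the needed columns straight into df1
df1 = pd.read_csv('tacs.csv', usecols=['magnet.tac','magnet.latitude', 'magnet.longitude'])
# assumes these columns appear in this order in the file (usecols keeps file order)
df1.columns = ['TAC','Lat','Long']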

#sample data
print (df1)
   TAC   Lat  Long
0    1  50.4  -1.5
1    1  50.1  -1.4
2    2  50.2  -1.8
3    2  50.9  -1.3

new_df = df1.groupby('TAC').apply(lambda x: list(zip(x.Lat, x.Long))).reset_index(name='coord')
print (new_df)
   TAC                         coord
0    1  [(50.4, -1.5), (50.1, -1.4)]
1    2  [(50.2, -1.8), (50.9, -1.3)]
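One caveat on the usecols step: read_csv returns the selected columns in the order they appear in the file, not in the order given to usecols, so assigning df1.columns positionally only works if the file order matches. Here is a minimal sketch of renaming by name instead. The CSV text, its column order, and the extra `other` column are made up for illustration and are not taken from the real tacs.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for tacs.csv; column order and the "other"
# column are assumptions for this example only.
csv_text = (
    "magnet.longitude,magnet.tac,other,magnet.latitude\n"
    "-1.5,1,x,50.4\n"
    "-1.4,1,y,50.1\n"
)

df1 = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['magnet.tac', 'magnet.latitude', 'magnet.longitude'],
)

# The columns come back in file order, not in usecols order.
print(list(df1.columns))

# Renaming by name does not depend on the column order.
df1 = df1.rename(columns={
    'magnet.tac': 'TAC',
    'magnet.latitude': 'Lat',
    'magnet.longitude': 'Long',
})
print(df1)
```

With rename, each column gets the intended label even if the file lists longitude before latitude.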

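Your own solution also works with two changes: add .copy() when creating df1, and select the coordinates column after the groupby before aggregating it with list: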

df = pd.read_csv('tacs.csv')
df1 = df[['magnet.tac','magnet.latitude', 'magnet.longitude']].copy()
df1.columns = ['TAC','Lat','Long']

df1['coordinates'] = list(zip(df1.Lat, df1.Long))
new_df = df1.groupby('TAC')['coordinates'].agg(list).reset_index()
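To try this without the CSV file, here is a self-contained sketch of the same steps. The inline DataFrame (including the extra `other` column) is an assumption standing in for the contents of tacs.csv; the values are the sample rows shown above:

```python
import pandas as pd

# Inline stand-in for pd.read_csv('tacs.csv'); the "other" column is
# hypothetical and only shows that unrelated columns are dropped.
df = pd.DataFrame({
    'magnet.tac': [1, 1, 2, 2],
    'magnet.latitude': [50.4, 50.1, 50.2, 50.9],
    'magnet.longitude': [-1.5, -1.4, -1.8, -1.3],
    'other': ['a', 'b', 'c', 'd'],
})

# .copy() makes df1 an independent DataFrame, so adding a column to it
# does not raise SettingWithCopyWarning.
df1 = df[['magnet.tac', 'magnet.latitude', 'magnet.longitude']].copy()
df1.columns = ['TAC', 'Lat', 'Long']

# One (lat, long) tuple per row.
df1['coordinates'] = list(zip(df1['Lat'], df1['Long']))

# Collect the tuples of each TAC into a list.
new_df = df1.groupby('TAC')['coordinates'].agg(list).reset_index()
print(new_df)
```

Selecting `['coordinates']` after the groupby means only that column is aggregated, which is what the original `.agg(df1['coordinates'])` call was trying to express.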
jezrael
  • Thanks for this. In both cases I get: /Users/keenek1/anaconda3/lib/python2.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – kikee1222 Oct 10 '19 at 13:18
  • @kikee1222 - The problem is in the earlier code. How is `df1` created? – jezrael Oct 10 '19 at 13:18
  • @kikee1222 - In your code you need `.copy()`, as in `df1 = df[['magnet.tac','magnet.latitude', 'magnet.longitude']].copy()`, but it is better to use `usecols` here – jezrael Oct 10 '19 at 13:21
  • Awesome, thanks, that worked. Why did it require `.copy()`? – kikee1222 Oct 10 '19 at 13:22
  • @kikee1222 - Because without it you were modifying a copy of the data, check [this](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – jezrael Oct 10 '19 at 13:23