You can use the string accessor str[]
to get the values of nested tuples to set up the 4 columns, as follows:
df['to_longitude'] = df['Coordinates'].str[0].str[0]
df['to_latitude'] = df['Coordinates'].str[0].str[1]
df['from_longitude'] = df['Coordinates'].str[1].str[0]
df['from_latitude'] = df['Coordinates'].str[1].str[1]
Demo
data = {'Link': ['abd', 'abe'],
'Coordinates': [((-4.21, 55.85), (-4.22, 55.86)), ((-4.25, 55.82), (-4.26, 55.83))]}
df = pd.DataFrame(data)
Link Coordinates
0 abd ((-4.21, 55.85), (-4.22, 55.86))
1 abe ((-4.25, 55.82), (-4.26, 55.83))
df['to_longitude'] = df['Coordinates'].str[0].str[0]
df['to_latitude'] = df['Coordinates'].str[0].str[1]
df['from_longitude'] = df['Coordinates'].str[1].str[0]
df['from_latitude'] = df['Coordinates'].str[1].str[1]
Link Coordinates to_longitude to_latitude from_longitude from_latitude
0 abd ((-4.21, 55.85), (-4.22, 55.86)) -4.21 55.85 -4.22 55.86
1 abe ((-4.25, 55.82), (-4.26, 55.83)) -4.25 55.82 -4.26 55.83
Execution time comparison:
Test data of 40,000 rows
df2 = pd.concat([df] * 20000, ignore_index=True)
Solution 1: Tom Ron's solution
def my_func(record):
record['to_longitude']=record['Coordinates'][0][0]
record['to_latitude']=record['Coordinates'][0][1]
record['from_longitude']=record['Coordinates'][1][0]
record['from_latitude']=record['Coordinates'][1][1]
return record
%timeit new_df = df2.apply(my_func, axis=1)
Result:
1min 16s ± 2.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solution 2: SeaBean's solution
%%timeit
df2['to_longitude'] = df2['Coordinates'].str[0].str[0]
df2['to_latitude'] = df2['Coordinates'].str[0].str[1]
df2['from_longitude'] = df2['Coordinates'].str[1].str[0]
df2['from_latitude'] = df2['Coordinates'].str[1].str[1]
Result:
165 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Solution 3: Anurag Dabas' solution
%%timeit
cols=['to_longitude','to_latitude','from_longitude','from_latitude']
out=pd.DataFrame(np.hstack(df2['Coordinates'].values),columns=cols)
#OR
#out=pd.DataFrame(np.concatenate(df['Coordinates'].values,axis=1),columns=cols)
Result:
Can't get the benchmarking since got error for both options:
ValueError: Shape of passed values is (2, 80000), indices imply (2, 4)
Summary
Solution 1: Tom Ron's solution
1min 16s ± 2.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solution 2: SeaBean's solution
165 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Solution 3: Anurag Dabas' solution
Can't get the benchmarking since got error for large dataset both options
For the first 2 solutions with benchmarking results, SeaBean's solution is 460x times faster than Tom Ron's solution (165ms vs 1min 16s) for 40,000 rows of data.
The faster execution time is contributed by all vectorized Pandas operations (optimized with fast C or CPython codes) used in codes, instead of the slow apply()
on axis=1
which under the hood is slow Python looping.