3

Below I have a DataFrame showing how each vehicle (Vehicle_ID) will visit the different places (place_id) in a particular sequence (Pos_in_route).

place_id    Lat      Lon      Vehicle_ID    Pos_in_route
    0   51.4457678  -0.45613      0           0
    1   52.497911   -1.903832     0           1
    2   52.332395   -1.346753     0           2
    0   51.4457678  -0.45613      0           3
    0   51.4457678  -0.45613      1           0
    4   52.110728   -0.463547     1           1
    3   52.276323   -1.579845     1           2
    5   52.423667   -0.609697     1           3
    0   51.4457678  -0.45613      1           4

Now I am trying to add another two columns to show the GPS coordinates of the prior place visited (prior_lat, prior_lon), according to the column (Pos_in_route). If there is no prior position of place it will be itself (i.e. place_id = 0)

    place_id    Lat      Lon      Vehicle_ID    Pos_in_route   prior_lat    prior_lon
        0   51.4457678  -0.45613    0           0              51.4457678   -0.45613
        1   52.497911   -1.903832   0           1              51.4457678   -0.45613
        2   52.332395   -1.346753   0           2              52.497911    -1.903832
        0   51.4457678  -0.45613    0           3              52.332395    -1.346753
        0   51.4457678  -0.45613    1           0              51.4457678   -0.45613
        4   52.110728   -0.463547   1           1              51.4457678   -0.45613
        3   52.276323   -1.579845   1           2              52.110728    -0.463547
        5   52.423667   -0.609697   1           3              52.276323    -1.579845
        0   51.4457678  -0.45613    1           4              52.423667    -0.609697

This is a small part of my program but it's very hard for me to show the complete codes. Maybe there is a simple way to add the last two columns at the end of my program?

Jack
  • 1,339
  • 1
  • 12
  • 31
  • Try to improve your question by following this and making a runnable example: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Little Bobby Tables Aug 06 '18 at 10:54

2 Answers2

5

You can use GroupBy + shift and then bfill:

g = df.groupby('Vehicle_ID')
df[['Prior_Lat', 'Prior_Lon']] = g[['Lat', 'Lon']].shift().bfill()

print(df[['Prior_Lat', 'Prior_Lon']])

   Prior_Lat  Prior_Lon
0  51.445768  -0.456130
1  51.445768  -0.456130
2  52.497911  -1.903832
3  52.332395  -1.346753
4  51.445768  -0.456130
5  51.445768  -0.456130
6  52.110728  -0.463547
7  52.276323  -1.579845
8  52.423667  -0.609697

Alternatively, you can use fillna instead of bfill:

df[['Prior_Lat', 'Prior_Lon']] = g[['Lat', 'Lon']].shift().fillna(df[['Lat', 'Lon']])
jpp
  • 159,742
  • 34
  • 281
  • 339
1

Use merge_asof as follows,

df = pd.DataFrame(
    {
        'Vechile_ID': [0] * 4 + [1] * 4,
        'Pos_in_route': list(range(0, 4)) * 2,
        'lat': [1, 3, 1, 2, 4, 2, 1, 2]
    }
)

df = df.sort_values('Pos_in_route')
df = pd.merge_asof(
    df, df, by='Vechile_ID', on='Pos_in_route',
    allow_exact_matches=False, suffixes=('', '_prior')
)

df = df.sort_values(['Vechile_ID', 'Pos_in_route']).bfill()

This is joining the whole dataframe back onto itself, but so that equal instances of the Pos_in_route column as not joined, the next available one is joined.

To finish this off you will need to drop the unwanted prior columns.

Little Bobby Tables
  • 4,466
  • 4
  • 29
  • 46