
I have a dataframe as follows:

id | time
1  | 10:21
1  | 10:22
1  | 10:23
2  | 10:40
2  | 10:45
2  | 10:50

I would like to add a new column as follows:

id | time  | new_time
1  | 10:21 | 10:22
1  | 10:22 | 10:23
1  | 10:23 | None
2  | 10:40 | 10:45
2  | 10:45 | 10:50
2  | 10:50 | None

In other words, I would like to build the new column by comparing the id values of consecutive rows. If a row and the row after it have the same id, new_time for the first row should take the time value of the second row. If the id values differ (or there is no following row), new_time should be None. How can I achieve this using Python or pandas?

Greg Schmit
Rafi
  • Could you explain the logic behind this? – yatu Feb 25 '19 at 15:42
  • I think someone has already asked it. Lemme check! – Rafael Barros Feb 25 '19 at 15:43
  • Possible duplicate of [Adding new column to existing DataFrame in Python pandas](https://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas) – Rafael Barros Feb 25 '19 at 15:43
  • I am sorry for not explaining it clearly. I would like to create the new column by matching the values in the `id` column. For example, if the `id` values of two consecutive rows are the same, then `new_time` in the first row should take the `time` value of the second row. If the `id` values are not the same, then `new_time` should be `None`. – Rafi Feb 25 '19 at 15:48
  • @RafaelBarros: I am not sure if your provided link resolves my issue. I would like to use (by shifting and matching) the existing columns to calculate the new column. – Rafi Feb 25 '19 at 15:54

2 Answers


Use .shift() to get next record:

df['new_time'] = df.shift(-1).time

Results:

    id  time    new_time
0   1   10:21   10:22
1   1   10:22   10:23
2   1   10:23   10:40
3   2   10:40   10:45
4   2   10:45   10:50
5   2   10:50   NaN

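Then assign np.nan to the last row of each id (this needs import numpy as np; the np.NaN spelling was removed in NumPy 2.0, so np.nan is used here):

df.loc[df.groupby('id', as_index=False).nth([-1]).index, 'new_time'] = np.nan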

Results:

    id  time    new_time
0   1   10:21   10:22
1   1   10:22   10:23
2   1   10:23   NaN
3   2   10:40   10:45
4   2   10:45   10:50
5   2   10:50   NaN
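
For anyone who wants to run the two steps end to end, here is a minimal sketch. It assumes the sample data from the question with times stored as plain strings, and it uses `groupby('id').tail(1)` to find the last row of each id in place of the `nth([-1])` call above, which is a substitution on my part. The `grouped` line is the per-group shift the asker mentions in the comments, included only to compare results.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; times are kept as plain strings.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": ["10:21", "10:22", "10:23", "10:40", "10:45", "10:50"],
})

# Step 1: take the next row's time, ignoring id boundaries for now.
df["new_time"] = df["time"].shift(-1)

# Step 2: blank out the last row of each id group.
# tail(1) keeps the original row labels, so they can be passed to .loc.
last_rows = df.groupby("id").tail(1).index
df.loc[last_rows, "new_time"] = np.nan

# One-step alternative: shift within each id group.
grouped = df.groupby("id")["time"].shift(-1)

print(df)
```

Both routes should leave a missing value in the last row of each id, so the one-liner is probably the simpler choice when the grouping column is already available.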
vercelli
  • Thank you for your comment. It was helpful. I solved my issue in the following way: `df['new_time'] = df.groupby(['id'])['time'].shift(-1)` – Rafi Feb 25 '19 at 16:21

Assuming ID is currently the index, I would try:

df = df.reset_index()
shifted = df.shift(-1)
df['new_time'] = shifted.time
df.loc[df.id != shifted.id, "new_time"] = None
df = df.set_index("id")

If it's not an index you can just skip the reset_index and set_index lines.

This shifts the entire DataFrame up by one row, lines the shifted rows up against the originals, and sets new_time to None wherever the two id values differ.
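
A self-contained sketch of that comparison, assuming id is an ordinary column (so the reset_index and set_index lines are skipped) and using the sample data from the question:

```python
import pandas as pd

# Sample data from the question, with id as a regular column.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": ["10:21", "10:22", "10:23", "10:40", "10:45", "10:50"],
})

# Shift every column up by one row so row i lines up with row i + 1.
shifted = df.shift(-1)
df["new_time"] = shifted["time"]

# Where the next row's id differs, the shifted time belongs to another id.
# The final row compares against NaN, which never equals anything, so it
# is cleared as well.
mask = df["id"] != shifted["id"]
df.loc[mask, "new_time"] = None

print(df)
```

Note that this relies on rows with the same id being adjacent, as they are in the question's data; unsorted data would need a sort first.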

Pluckerpluck
  • Thank you for your comment. It was helpful. I solved my issue in the following way: `df['new_time'] = df.groupby(['id'])['time'].shift(-1)` – Rafi Feb 25 '19 at 16:20
  • I did not know that pandas lets you shift on a group... That is good to know – Pluckerpluck Feb 25 '19 at 16:46