I am new to Python and want to shift the values in one column by a number of rows that is given, per group, in another column. My data looks like this:

Group  Rate
1      0.1
1      0.2
1      0.3
2      0.9
2      0.12

The shift unit (the Shift_Unit column) is 2 for the first group and 1 for the second.

The desired output is the following:

Group  Shifted_Rate
1      0
1      0
1      0.1
2      0
2      0.9

I tried the following, but it does not work: df['Shifted_Rate'] = df['Rate'].shift(df['Shift_Unit'])

Is there another way to do this, without the shift() method?

Crypt_Fomo

1 Answer

This may be the first time I've worked with pandas, so this might not be the best approach. According to the documentation for pandas.DataFrame.shift(), the periods parameter (the "number of periods to shift") is an int. Because it is a single int rather than something like a list or dict, passing a whole column such as df['Shift_Unit'] will not shift each row by its own amount. One way around this is to build a separate data frame for each group, shift each one by its own number of periods, and then put the data frames back together. I tried this out, combining the individual data frames with pandas.DataFrame.append() (this method was removed in pandas 2.0, where pd.concat() does the same job). There might be a more efficient way to do this with pandas, but for now, I hope this helps with your immediate situation.

Here is the code that I used for your situation (in my case, it is in a file called q11.py):

import numpy as np
import pandas as pd

# The periods used for the shifting of each group
# (e.g., '1' is for group 1, '2' is for group 2).
# You can add more items here later if need be.
periods = {
    '1': 2,
    '2': 1
}

# Building the first DataFrame
df1 = pd.DataFrame({
    'Rate': pd.Series([0.1, 0.2, 0.3], index=[1, 1, 1]),
})

# Building the second DataFrame
df2 = pd.DataFrame({
    'Rate': pd.Series([0.9, 0.12], index=[2, 2]),
})

# Shift
df1['Shifted_Rate'] = df1['Rate'].shift(
    periods=periods['1'],
    fill_value=0
)

df2['Shifted_Rate'] = df2['Rate'].shift(
    periods=periods['2'],
    fill_value=0
)

# Combine df1 and df2 into a new DataFrame df3, keeping the group index.
# This was originally df1.append(df2, ignore_index=False), but
# DataFrame.append() was removed in pandas 2.0; pd.concat() replaces it.
# ref: https://pythonexamples.org/pandas-append-dataframe/
# ref: https://stackoverflow.com/a/51953935/1167750
# ref: https://stackoverflow.com/a/40014731/1167750
# ref: https://pandas.pydata.org/pandas-docs/stable/reference/api
#   /pandas.DataFrame.append.html
df3 = pd.concat([df1, df2], ignore_index=False)
# ref: https://stackoverflow.com/a/18023468/1167750
df3.index.name = 'Group'

print("\n", df3, "\n")

# Optional: If you only want to keep the Shifted_Rate column:
del df3['Rate']
print(df3)

When running the program, the output should look like this:

$ python3 q11.py

        Rate  Shifted_Rate
Group
1      0.10           0.0
1      0.20           0.0
1      0.30           0.1
2      0.90           0.0
2      0.12           0.9

       Shifted_Rate
Group
1               0.0
1               0.0
1               0.1
2               0.0
2               0.9
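
The same idea can probably be generalized so that the data frames do not have to be built by hand for each group. The following is a minimal sketch, not a tested solution for your real data. It assumes a single data frame with the columns Group, Rate, and Shift_Unit (the column name taken from the question), a unique row index, and that every row of a group carries the same Shift_Unit value.

```python
import pandas as pd

# Sample data from the question; Shift_Unit is assumed to be a column
# that repeats the same value on every row of a group.
df = pd.DataFrame({
    "Group": [1, 1, 1, 2, 2],
    "Rate": [0.1, 0.2, 0.3, 0.9, 0.12],
    "Shift_Unit": [2, 2, 2, 1, 1],
})

# Shift each group's Rate by that group's own number of periods.
shifted_parts = []
for _, group in df.groupby("Group"):
    periods = int(group["Shift_Unit"].iloc[0])
    shifted_parts.append(group["Rate"].shift(periods, fill_value=0))

# Each part keeps its original row labels, so the assignment lines the
# values up with the right rows (this relies on the index being unique).
df["Shifted_Rate"] = pd.concat(shifted_parts)

print(df[["Group", "Shifted_Rate"]])
```

This still calls shift() once per group, so it is mainly a convenience over writing out df1, df2, and so on.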
summea
  • Thanks a lot summea, good idea except that I have a lot of groups and I am afraid this approach will have performance issues :( – Crypt_Fomo Feb 19 '21 at 11:04
  • Hi @Crypt_Fomo, You are welcome! There might be performance issues with this approach, but it might be worth trying out the code (or something similar) to see if the performance issues happen or not. :) With the concern about performance issues, I'm not sure if you mean that the code would take up a lot of memory, or take a lot of time to process, or that something else would happen (performance related) with this approach? – summea Feb 20 '21 at 21:07
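
On the performance concern, and on the original question of doing this without shift(), one possible alternative is to compute the source row of every value with NumPy in one pass rather than group by group. This is only a sketch under several assumptions: the rows of each group are contiguous (for example, the frame is sorted by Group), Shift_Unit is a non-negative integer column, and the column names match the question. I have not measured it against large data.

```python
import numpy as np
import pandas as pd

# Sample data from the question; rows of each group must be contiguous.
df = pd.DataFrame({
    "Group": [1, 1, 1, 2, 2],
    "Rate": [0.1, 0.2, 0.3, 0.9, 0.12],
    "Shift_Unit": [2, 2, 2, 1, 1],
})

rates = df["Rate"].to_numpy()
shift = df["Shift_Unit"].to_numpy()

# Position of each row in the whole frame and within its own group.
position = np.arange(len(df))
within_group = df.groupby("Group").cumcount().to_numpy()

# A row has a source value only if the shift stays inside its group.
valid = within_group >= shift

# Row to copy from; invalid rows point at row 0 as a harmless placeholder.
source = np.where(valid, position - shift, 0)

# Copy the source value where valid, otherwise fill with 0.
df["Shifted_Rate"] = np.where(valid, rates[source], 0.0)

print(df[["Group", "Shifted_Rate"]])
```

Because it avoids a Python-level loop over groups, this version may cope better with a large number of groups, but that is worth checking on the real data.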