2

I have a dataframe like this,

>>> import pandas as pd
>>> data = {
    'code': ['123456', '123456', '123456', '123abc', '123abc', '123abc'],
    'price': [112.58, 112.58, 113.25, 120.31, 120.36, 140.12],
}... ... ...
>>> all_df = pd.DataFrame(data)
>>> all_df
     code   price
0  123456  112.58
1  123456  112.58
2  123456  113.25
3  123abc  120.31
4  123abc  120.36
5  123abc  140.12

I want to create a column named period and increase the period when the price is not the same value. And each transaction should be made according to the unique code. Here is expected output,

     code   price  period
0  123456  112.58     1.0
1  123456  112.58     1.0
2  123456  113.25     2.0
3  123abc  112.58     1.0
4  123abc  112.58     1.0
5  123abc  113.25     2.0

I managed to do it this way, but it takes too long when the data is too much.

def get_period(df):
    period = df.loc[0, 'period'] = 1
    df = df.reset_index(drop=True)

    for i in range(1, len(df)):
        if df.loc[i, 'price'] != df.loc[i - 1, 'price']:
            df.loc[i, 'period'] = period + 1
            period += 1
        else:
            df.loc[i, 'period'] = period

    return df

all_df.groupby('code').apply(get_period).reset_index(drop=True)

Any idea? Thanks in advance.

E. Zeytinci
  • 2,642
  • 1
  • 20
  • 37
  • This doesn't give me the result I expected. Can you check it, please? – E. Zeytinci Dec 06 '19 at 09:43
  • Why did the price change in your second frame? Is this a typo or just me who dont understand correctly your problem? If you want auto increase by one based on price and just groupby "price", i.e. delete "code" from jezraels comment – SMS Dec 06 '19 at 09:49
  • I did not undestand what exactly you want. Can you be more clear in your logic ? – Rafael Ferreira Dec 06 '19 at 09:53
  • I don't want the period to change if the price is the same for the same code. When you switch to another code, period should start 1 and period 1 should be increased when a different price arrives. – E. Zeytinci Dec 06 '19 at 10:24
  • @jazrael here is outputs: https://imgur.com/a/0A6scmP https://imgur.com/a/IuKokXf The result is not what I expected. – E. Zeytinci Dec 06 '19 at 10:31
  • I wrote "when the price is the same" instead of "when the price is not the same". So sorry. I edited my question. – E. Zeytinci Dec 06 '19 at 10:38
  • Do you want something like that `x=list(frame.price.unique()) Liste=[] for items in frame.price.values: Liste.append(x.index(items)) frame["new col"]=Liste` – SMS Dec 06 '19 at 12:12

1 Answers1

3

First are tested consecutive price values by Series.shift with Series.cumsum and then for each value call factorize per groups:

df['period'] = (df.assign(new = df['price'].ne(df['price'].shift()).cumsum())
                  .groupby('code')['new']
                  .transform(lambda x: pd.factorize(x)[0]) + 1)
print (df)
     code   price  period
0  123456  112.58       1
1  123456  112.58       1
2  123456  113.25       2
3  123abc  112.58       1
4  123abc  112.58       1
5  123abc  113.25       2
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252