0

After Year row need a new row as Year period if column 1 is year and column3< 2010 then columns values for year period is Below 2010 same as other rows

  Column1  Column2 ColumnX  Column3
0    Year        1       A     2009
1    Date        1       A       12
2    Year        2       A     2021
3    Year        3       A     2011
       Column1  Column2 ColumnX             Column3
0         Year        1       A                2009
1  Year period        1       A           Below2010
2         Date        1       A                  12
3         Year        2       A                2021
4  Year period        2       A           Above2020
5         Year        3       A                2011
6  Year period        3       A  Range in 2010/2020
Rabinzel
  • 7,757
  • 3
  • 10
  • 30
Eswar
  • 9
  • 3
  • 1
    So what ? Explanations are quite unclear. What have you tried so far ? Post some code. – 0x0fba Nov 24 '22 at 08:01
  • def get_period(row): if row["Column1"] == "Year": row["Column1"] = "Year period" if row["Column3"] < 2010: row["Column3"] = "Below2010" elif row["Column3"] > 2020: row["Column3"] = "Above2020" else: row["Column3"] = "Range in 2010/2020" return row s = df.apply(get_period, axis=1).dropna() for i in s.index: upper = df.loc[:i, :] lower = df.loc[i+1:, :] df = pd.concat([upper, s.loc[i:i], lower], axis=0) df = df.reset_index(drop=True) df – Mordor1110 Nov 24 '22 at 08:39

1 Answers1

0

Filter rows first in boolean indexing for Year columns, replace Column3 in numpy.select and add substring to Column1, last join with original by concat and sort indices by DataFrame.sort_index:

#necessary default RangeIndex
df = df.reset_index(drop=True)

df2 = df[df['Column1'].eq('Year')].copy()
df2['Column3'] = pd.to_numeric(df2['Column3'], errors='coerce')

df1 = (df2.assign(Column3 = lambda x: np.select([x['Column3']<2010, x['Column3']>2020], 
                                                  ['Below2010','Above2020'], 
                                                  default='Range in 2010/2020'),
                    Column1 =  lambda x: x['Column1'] + ' period'))

df = pd.concat([df, df1]).sort_index(kind='mergesort', ignore_index=True)
print (df)
       Column1  Column2 ColumnX             Column3
0         Year        1       A                2009
1  Year period        1       A           Below2010
2         Date        1       A                  12
3         Year        2       A                2021
4  Year period        2       A           Above2020
5         Year        3       A                2011
6  Year period        3       A  Range in 2010/2020
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252