
In a DataFrame called `single_future_data`, I want to create a new column `Ctr1` based on other columns of the same DataFrame.

I came up with this solution, which is neither the most elegant nor the fastest, but it does what it's supposed to:

```python
import numpy as np

single_future_data['Ctr1'] = np.where(np.logical_and(single_future_data.Price_Prev3.notnull(), single_future_data.Price_Prev3.shift(1).notnull()), 'Price_Prev3',
              np.where(np.logical_and(single_future_data.Price_Prev2.notnull(), single_future_data.Price_Prev2.shift(1).notnull()), 'Price_Prev2',
              np.where(np.logical_and(single_future_data.Price_Prev1.notnull(), single_future_data.Price_Prev1.shift(1).notnull()), 'Price_Prev1',
              np.where(np.logical_and(single_future_data.Price_C_1.notnull(), single_future_data.Price_C_1.shift(1).notnull()), 'Price_C_1',
              np.where(np.logical_and(single_future_data.Price_C_2.notnull(), single_future_data.Price_C_2.shift(1).notnull()), 'Price_C_2',
              np.where(np.logical_and(single_future_data.Price_C_3.notnull(), single_future_data.Price_C_3.shift(1).notnull()), 'Price_C_3', 'Price_C_4'))))))
```

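My problem is that this logic doesn't work on the first row: `shift(1)` has no previous value there, so every condition is False and the row always falls through to 'Price_C_4'. For that row I need to apply a different function.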
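To see why row 0 always falls through to the default, here is a minimal sketch on a small made-up frame (only `Price_Prev3` from the question; the values are invented for illustration). `shift(1)` leaves a NaN in the first position, so the `notnull()` check on the shifted column can never be True there.

```python
import numpy as np
import pandas as pd

# Toy frame with one of the question's columns; values are made up.
df = pd.DataFrame({"Price_Prev3": [10.0, 11.0, np.nan]})

# shift(1) moves every value down one row, so row 0 has no predecessor -> NaN.
shifted = df["Price_Prev3"].shift(1)
print(shifted.tolist())  # [nan, 10.0, 11.0]

# Same shape as the question's condition: current AND previous value present.
cond = df["Price_Prev3"].notnull() & shifted.notnull()
print(cond.tolist())  # [False, True, False]
```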

I tried this code to apply my function starting from the second row:

```python
single_future_data['Ctr1'] = ""

single_future_data.iloc[1:, 8] = np.where(np.logical_and(single_future_data.Price_Prev3.notnull(), single_future_data.Price_Prev3.shift(1).notnull()), 'Price_Prev3',
              np.where(np.logical_and(single_future_data.Price_Prev2.notnull(), single_future_data.Price_Prev2.shift(1).notnull()), 'Price_Prev2',
              np.where(np.logical_and(single_future_data.Price_Prev1.notnull(), single_future_data.Price_Prev1.shift(1).notnull()), 'Price_Prev1',
              np.where(np.logical_and(single_future_data.Price_C_1.notnull(), single_future_data.Price_C_1.shift(1).notnull()), 'Price_C_1',
              np.where(np.logical_and(single_future_data.Price_C_2.notnull(), single_future_data.Price_C_2.shift(1).notnull()), 'Price_C_2',
              np.where(np.logical_and(single_future_data.Price_C_3.notnull(), single_future_data.Price_C_3.shift(1).notnull()), 'Price_C_3', 'Price_C_4'))))))
```

but it raised the following error:

```
ValueError: could not broadcast input array from shape (79,) into shape (78,)
```
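The error is a length mismatch: `np.where` is evaluated over the whole column and returns one value per row (79 here), while `iloc[1:, 8]` selects every row except the first (78). The sketch below reproduces this on a hypothetical 4-row frame with a made-up column `a`, and looks up the target position with `get_loc` instead of hard-coding 8. The exact error message may differ between pandas versions, but the exception type is `ValueError`.

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame; "a" stands in for one of the price columns.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "Ctr1": ""})

# np.where works on the whole column, so it returns one value per row.
result = np.where(df["a"].notnull(), "a", "other")
print(result.shape)  # (4,)

# iloc[1:, col] selects every row except the first: one slot fewer.
col = df.columns.get_loc("Ctr1")
print(df.iloc[1:, col].shape)  # (3,)

# Assigning 4 values into 3 slots fails...
try:
    df.iloc[1:, col] = result
except ValueError as exc:
    print("ValueError:", exc)

# ...while dropping the first element of the result makes the lengths match.
df.iloc[1:, col] = result[1:]
print(df["Ctr1"].tolist())  # ['', 'other', 'a', 'a']
```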

Any ideas? Thanks

younggotti
    Can you give us an example to help us reproduce your dataframe? It's difficult to answer pandas questions in the abstract as a lot of operations depend on knowing the schema of the data. Take a look at this guide: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Steele Farnsworth Jul 29 '21 at 14:55
    First, as you have several conditions, have a look at `np.select` instead of `np.where`; it would be a bit easier to read. As for your problem, you can slice the result so it has the same number of elements as the target rows: `single_future_data.iloc[1:, 8] = np.where( ... )))[1:]` – Ben.T Jul 29 '21 at 14:58
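Combining both suggestions, a sketch might look like the following. It assumes the column names from the question; the helper `pick_contract` and the toy data are hypothetical, and the first row is simply left empty because the question does not say what its separate first-row function is.

```python
import numpy as np
import pandas as pd

# Candidate columns in priority order, taken from the question.
cols = ["Price_Prev3", "Price_Prev2", "Price_Prev1",
        "Price_C_1", "Price_C_2", "Price_C_3"]


def pick_contract(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: per row, the first column non-null in this row and the previous one."""
    conditions = [df[c].notnull() & df[c].shift(1).notnull() for c in cols]
    # np.select takes the choice of the first True condition, else the default.
    return np.select(conditions, cols, default="Price_C_4")


# Made-up data: every column NaN except Price_Prev1 and Price_C_1.
n = 4
df = pd.DataFrame({c: [np.nan] * n for c in cols + ["Price_C_4"]})
df["Price_Prev1"] = [5.0, 6.0, np.nan, np.nan]
df["Price_C_1"] = [7.0, 8.0, 9.0, 10.0]

df["Ctr1"] = ""
ctr1 = df.columns.get_loc("Ctr1")
# Ben.T's fix: the result covers n rows but the target only n - 1, so drop element 0.
df.iloc[1:, ctr1] = pick_contract(df)[1:]
# Row 0 keeps "" here; the question's separate first-row rule would be applied to it.
print(df["Ctr1"].tolist())  # ['', 'Price_Prev1', 'Price_C_1', 'Price_C_1']
```

Computing the full-length result once and slicing it keeps the vectorized logic in a single place, and the list of conditions replaces the six nested `np.where` calls.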

1 Answer


Ben.T's solution in the comments (slicing the `np.where` result with `[1:]` so its length matches the target rows) worked fine.

younggotti