1

I'd like to create an emtpy column in an existing DataFrame with the first value in only one column to = 100. After that I'd like to iterate and fill the rest of the column with a formula, like row[C][t-1] * (1 + row[B][t])

very similar to: Creating an empty Pandas DataFrame, then filling it?

But the difference is fixing the first value of column 'C' to 100 vs entirely formulas.

import datetime
import pandas as pd
import numpy as np

todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

columns = ['A','B','C']

df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0)

data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, index=index, columns=columns)

df['B'] = df['A'].pct_change()
df['C'] = df['C'].shift() * (1+df['B'])

## how do I set 2016-10-03 in Column 'C' to equal 100 and then calc consequtively from there?

df
Community
  • 1
  • 1
T. Payne
  • 79
  • 1
  • 10

1 Answers1

1

Try this. Unfortunately, something similar to a for loop is likely needed because you will need to calculate the next row based on the prior rows value which needs to be saved to a variable as it moves down the rows (c_column in my example):

c_column = []
c_column.append(100)

for x,i in enumerate(df['B']):
    if(x>0):
        c_column.append(c_column[x-1] * (1+i))

df['C'] = c_column
A.Kot
  • 7,615
  • 2
  • 22
  • 24
  • I think this can work but I get an error 'Length of values does not match length of index' – T. Payne Oct 13 '16 at 14:37
  • Edited with c_column.pop(-1) because you don't need the last item. – A.Kot Oct 13 '16 at 15:38
  • So that works and gives me 100 in the first cell however I get NaN for the rest of the data in column 'C'. I tried changing – T. Payne Oct 13 '16 at 16:01
  • c_column.append(c_column[x] * (1+i)) to – T. Payne Oct 13 '16 at 16:01
  • c_column.append(c_column[x].shift() * (1+i)) but didnt work. sorry for the multiple responses this is my first post. very much appreciate your help – T. Payne Oct 13 '16 at 16:01
  • It works for me. Check the values of c_column alone without adding it to the dataframe. What are the values? – A.Kot Oct 13 '16 at 16:04
  • [100, nan, nan, nan, nan, nan, nan, nan, nan, nan] – T. Payne Oct 13 '16 at 16:07
  • Because your column 'B' has a null value in the first row and an infinity value in the second because 1/0 is infinity. – A.Kot Oct 13 '16 at 16:14
  • Ah ok, in this case I'd like to multiply the 1st value in column C by the 2nd value in column B. I agree on your point that 1/0 is infinity but assuming 'B' was a value (ie. 0.50) we should still get complete values returned in 'C' as the NaN value in the first spot in 'B' is not needed for the calculation. Perhaps you can post or send me the full working code you are using and I can piece it together. Again many thanks for your help. – T. Payne Oct 13 '16 at 17:16
  • There are invalid values in BOTH the first AND the second spots for 'B'. The first value is NaN and the SECOND value is infinity. The second spot for 'B' is used to calculate the second spot for 'C', therefore, every value afterwards will also be invalid. – A.Kot Oct 13 '16 at 18:19
  • Type in df and look at your dataframe to see this. Not sure what you expect the first two values of 'B' to be. – A.Kot Oct 13 '16 at 18:21
  • I picked a poor example dataframe. simplified. the known data is Column A [5,6,7], Col B = df[A].pct_change() = [NaN, 0.20, 0.166], Col C = [100, 120, 140] where Col C = 100 (fixed), 120 = 100 x (1+20%) and 140 = 120 x (1+16.7%) – T. Payne Oct 13 '16 at 18:51
  • Fixed. Note that I removed the pop line. – A.Kot Oct 13 '16 at 19:01