0

My three data frames (denoted df1, df2, df3) are 190 rows x 100,000 columns of time series data. I have different macroeconomic data in each data frame. The left most column are names of countries the time series will belong to. The first row is a title row of the time stamps of the data. I need to perform a time series calculation using the data across the three data frames so that the time stamps match, that the respective calculation using the value at 50 x 2500 is the same for all three data frames.

calculation = df1 value / ((df2 value - df1 value) * df3 value)

How can I go about this in a vectorized manner?

Small Minimum example of 3 dataframes and and a desired final results df

df1 = pd.DataFrame([['Bulgaria', 2, 3, 4, 5], ['Estonia', 2, 3, 4, 5], ['Sweden', 2, 3, 4, 5]], columns=['State', '1990', '1991', '1992', '1993'])


df2 = pd.DataFrame([['Bulgaria', 12, 13, 14, 15], ['Estonia', 12, 13, 14, 15], ['Sweden', 12, 13, 14, 15]], columns=['State', '1990', '1991', '1992', '1993'])

df3 = pd.DataFrame([['Bulgaria', .02, .03, .04, .05], ['Estonia', .02, .03, .04, .05], ['Sweden', .02, .03, .04, .05]], columns=['State', '1990', '1991', '1992', '1993'])

intended_final_df = pd.DataFrame([['Bulgaria', 10, 10, 10, 10], ['Estonia', 10, 10, 10, 10], ['Sweden', 10, 10, 10, 10]], columns=['State', '1990', '1991', '1992', '1993'])
toaster
  • 39
  • 5
  • You should rarely need to use loops in pandas, however in order to better help you, we need more information about your problem. Please see [How to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a [mcve] for your specific problem, with sample input and output – G. Anderson Jul 09 '19 at 19:23
  • what kind of calculation? – ALollz Jul 09 '19 at 19:24
  • Your question is very vague. Please give us a sample data and clarify what you are trying to do. – FatihAkici Jul 09 '19 at 19:28
  • I have updated the question with a minimum example to help spell out the problem. – toaster Jul 10 '19 at 13:07

1 Answers1

1

You need to move the state column into the index for all dataframes, then use this calculation:

df1 / ((df2-df1)* df3)

I used this command to iterate throught all dataframes setting the index:

[d.set_index('State', inplace=True) for d in [df1,df2,df3]]

Output:

          1990  1991  1992  1993
State                           
Bulgaria  10.0  10.0  10.0  10.0
Estonia   10.0  10.0  10.0  10.0
Sweden    10.0  10.0  10.0  10.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Could someone explain how the calculation works? It totally does which is great but I would like to understand it more so I can use the knowledge to build more independently in the future. – toaster Jul 10 '19 at 14:14
  • The calculation works because pandas does intrinsic data alighment, which means that pandas looks at the dataframe indexes and the column headers and aligns all calculation to happen on matching cells. – Scott Boston Jul 10 '19 at 14:17
  • Ok that is very convenient! And could you elaborate on the command for iterating through all of the dataframes?, it looks like the .set_index is locking down the reference column that is shared across all the dataframe for the iterating? – toaster Jul 10 '19 at 14:19
  • Yes, because, I want the numbers in the dataframe to calculate based on 'state', I move the state column into the index of the dataframe. I could have done this with three separate commands `df1 = df1.set_index('State'); df2 = df2.set_index('State'); df3 = df3.set_index('State')` But instead, I choose to use list comprehension and called the set_index method on each dataframe in a list. Using `inplace=True` to modify the dataframe in the current memory. – Scott Boston Jul 10 '19 at 14:23