
I need to repeat similar operations on data frames with identical structures that refer to different years. The objective is to write a function (the one included is just for the sake of presenting the problem) that selects the data using one of its arguments (the year in this case). I would like to be able to select the data frame within the function by its name, in this case using the last part of its name (its year) as an argument of the function.

    import pandas as pd
    import numpy as np

Suppose you have three data frames, one for each year:

    df_2005 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
    df_2006 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
    df_2007 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))

As an example, this function extracts some data from the data frame to generate a different variable:

    def func_1 (Year):
       data=data_+year
       X=data.iloc[[1,2],[2,3]].copy() 
       return X

This is the way I intend to use the function:

    subdata_2005=func_1('2005')

I have tried several things like data_+year, data_`year', but nothing seems to work. I was not able to find answers to similar questions that could help me in this case. Any suggestions would be highly appreciated.

  • Not sure I understand, do you mean something like this? data = data+ '_'+ year – Mayowa Ayodele Jan 07 '20 at 16:29
  • Many thanks for answering. Exactly, I tried similar things but always got the error 'data' is not defined – Casiano Manrique Jan 07 '20 at 16:33
  • Does this answer your question? [How to get the value of a variable given its name in a string?](https://stackoverflow.com/questions/9437726/how-to-get-the-value-of-a-variable-given-its-name-in-a-string) – G. Anderson Jan 07 '20 at 16:34
  • Many thanks for your suggestion. It does not seem to work within a function. I am able to generate the string but I need it to be considered a data frame. – Casiano Manrique Jan 07 '20 at 16:50
  • Sorry, I think I somehow unintentionally deleted the last message by Mayowa Ayodele. – Casiano Manrique Jan 07 '20 at 17:24
  • @CasianoManrique, the problem is how to see it as a variable name not a string. Are the dataframes large? and are they a lot? – Mayowa Ayodele Jan 07 '20 at 17:24
  • Exactly, I cannot figure out how to make python to interpret it as a variable name. The original data frames are not huge (2551x2875) but I have several files (around 15). – Casiano Manrique Jan 07 '20 at 17:37
  • @CasianoManrique, it may work better to create a list of dataframes then as presented in the answer below, let me know if this helps – Mayowa Ayodele Jan 07 '20 at 17:38
  • Variable names are not strings, and you really shouldn't design your code to dynamically create strings and use those strings to try to refer to variables. Rather, use a *container*. – juanpa.arrivillaga Jan 07 '20 at 17:43
  • You are right. It does not work at all like I tried. Will study the use of containers also. Many thanks. – Casiano Manrique Jan 07 '20 at 17:58

1 Answer


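What I would suggest is to keep the dataframes in a container such as a dict (or a list) rather than in separately named variables.

Do not try something like this, which will not work, because a string expression cannot be the target of an assignment (Python raises a SyntaxError):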

    for i in range(2005,2007):
        'df_'+i = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))

Instead, try this (note that range excludes its end value, so 2008 is needed to include 2007):

    dfs = {}
    for i in range(2005, 2008):
        dfs[i] = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
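If the frames already exist as separate variables, as in the question, they can be collected into the same kind of dict once and looked up by year from then on. A minimal sketch, assuming the three frames from the question and integer years as keys (the key type is my choice here, not something the question fixes):

```python
import numpy as np
import pandas as pd

# The three frames from the question, created as separate variables.
df_2005 = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list("ABCD"))
df_2006 = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list("ABCD"))
df_2007 = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list("ABCD"))

# Collect them once in a dict keyed by year. The dict stores references
# to the existing frames; the data is not copied.
dfs = {2005: df_2005, 2006: df_2006, 2007: df_2007}

# Look a frame up by year instead of by variable name.
data = dfs[2006]
```

The original variable names are no longer needed after this step; the year itself becomes the handle.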

Consequently, the function from the question,

    def func_1 (Year):
           data=data_+year
           X=data.iloc[[1,2],[2,3]].copy() 
           return X

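will therefore become the following (int() converts the argument, so that both '2005' and 2005 match the integer keys of dfs):

    def func_1(Year):
        data = dfs[int(Year)]
        X = data.iloc[[1,2],[2,3]].copy()
        return X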
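Putting the pieces together, here is a self-contained sketch of the dict-based approach, including the call from the question. The `frames` parameter and the final loop over all years are my additions for illustration and are not part of the original function:

```python
import numpy as np
import pandas as pd

# One frame per year, keyed by the integer year (2005, 2006, 2007).
dfs = {
    year: pd.DataFrame(np.random.randint(0, 10, size=(10, 4)), columns=list("ABCD"))
    for year in range(2005, 2008)
}

def func_1(year, frames=dfs):
    """Return rows 1-2 and columns C-D of the frame stored for `year`."""
    data = frames[int(year)]  # accepts '2005' as well as 2005
    return data.iloc[[1, 2], [2, 3]].copy()

# The call from the question now works with the year passed as a string.
subdata_2005 = func_1("2005")

# Hypothetical extension: apply the same extraction to every year at once.
subdata = {year: func_1(year) for year in dfs}
```

With around 15 files, the same pattern should scale without adding a new variable name per year; only the dict grows.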
Karl Knechtel
Mayowa Ayodele