I am using a for loop to read multiple CSV files and create a dataframe from each one. I would like to access these dataframes outside the for loop as well. I tried the global keyword for this, but it doesn't work.

for file in os.listdir('C:\\Users\\ABCDE\\Desktop\\Measurement'):
  if file.endswith('.csv'):
     print(file)
     name = file[3:6]
     global df_name   # this is the line 
     df_name = 'df' + name  
     print(df_name)
     df_name = pd.read_csv('C:\\Users\\ABCDE\\Desktop\\Measurement\\' + str(file),low_memory = False)
     df_name.rename(columns={0:'values'}, 
             inplace=True)       
     g = df_name.level_1.str[-2:] # Extracting column names
     df_name['lvl'] = df_name.level_1.apply(lambda x: int(''.join(filter(str.isdigit, x))))

As you can see above, I would like to access these dataframes (df_name; three dataframes, as I have three files) outside the for loop as well.

How do I use the global keyword to make these dataframes accessible outside the for loop?

The Great
  • Is your for loop in another function? If not, you don't even need to use `global`. You can just define a variable before your loop and then modify it inside your loop. – Berkay Tekin Öz Jul 29 '19 at 06:25
  • @BerkayÖz - I am reading all the files from a directory, so my aim is to have a unique variable name for each dataframe. It's not the same dataframe name for different files; each file will have a different dataframe name. In this case, should I still declare a variable outside the loop? Is that recommended? – The Great Jul 29 '19 at 06:28
  • You are trying to do two things in one line; that is why it gives an error. Also, defining the variable outside of the scope is a must, not a recommendation. – Alper Jul 29 '19 at 06:31
  • @AVLES In that case you should declare a **list** or a **dictionary**. Create a local variable in your loop, use that local variable for dataframe purposes and then add that variable to your list or your dictionary. – Berkay Tekin Öz Jul 29 '19 at 06:33
  • I mean if I have 10 files to be read, do I have to define 10 variables? Will it not retain only the last (10th) file data in the dataframe if I declare one variable outside? – The Great Jul 29 '19 at 06:33
  • @AVLES As I mentioned before, you don't need to create variables for each of your files. It is also not recommended and not good practice. Just add them to a list or a dictionary and then access them from there. – Berkay Tekin Öz Jul 29 '19 at 06:35
  • see: https://stackoverflow.com/questions/423379/using-global-variables-in-a-function/423596#423596 – gregory Jul 29 '19 at 06:37
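
To illustrate the point made in the comments above, here is a minimal sketch that uses plain lists as stand-ins for dataframes (the file names and values are made up). A `for` loop at module level does not create a new scope, so names assigned inside it stay visible afterwards without `global`. A single name, however, keeps only the value from the last iteration, while a dictionary keeps one entry per file.

```python
# Hypothetical stand-ins for the CSV contents; in the question these
# would be DataFrames returned by pd.read_csv.
fake_files = {"xx_001.csv": [1, 2], "xx_002.csv": [3, 4], "xx_003.csv": [5, 6]}

frames = {}                      # one entry per file
for file, data in fake_files.items():
    name = file[3:6]             # same slicing as in the question
    df_name = data               # rebinding: overwritten on every iteration
    frames[name] = data          # kept under its own key

# Both names are visible here; no `global` is needed outside a function.
print(df_name)         # only the last file's data
print(sorted(frames))  # one key per file
```

After the loop, `df_name` refers only to the data from the last file, whereas `frames` still holds all three.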

4 Answers

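You need to define the variable at module level, before the function, and then declare it global inside the function:

import pandas as pd

a = None  # module-level placeholder

def func():
    global a  # rebind the module-level name instead of creating a local one
    a = pd.DataFrame()  # stand-in for your dataframe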
Alper

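The global declaration needs to be on its own line, separate from the assignment, and it has to come before the assignment (current Python versions raise a SyntaxError if a name is assigned before its global declaration). Something like this:

global df_name
df_name = 'df' + name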
Jarvis101

I can understand what you're trying to achieve, but not why you expect your code to work. 'df' + name is a string, not a variable; besides, you don't declare an external variable like that. The syntax is much simpler and has nothing to do with pandas. Here's an example of the usage:

a = 'foo'

def get_a():
    global a
    return a

def set_a(b):
    global a
    a = b

if __name__ == '__main__':  # Just defining the entry point of the python script
    print(get_a())
    set_a(2)
    print(get_a())
    print(a)

And here is what you should expect as output of the script:

foo
2
2
Alessandro Flati
  • Yes, I am using that string as the dataframe name – The Great Jul 29 '19 at 06:32
  • Ok, then you have three main options: using `globals()['df' + name]`, using `getattr(module_where_the_variable_is, 'df' + name)` or `eval('df' + name)`. Watch out for the last one, because eval can execute any code and is therefore risky if exposed to the public. – Alessandro Flati Jul 29 '19 at 06:38
  • I tried globals. It doesn't work. I just have a simple for loop and don't really have any function or multiple modules. Just want that dataframe to be accessible outside for loop. Thanks for your response and help – The Great Jul 29 '19 at 06:54
  • Then the answer given by Berkay Öz is the most suitable for your needs, although the question was very misleading :) – Alessandro Flati Jul 29 '19 at 07:10
  • Can you help me with this? https://stackoverflow.com/questions/57250943/append-dataframes-with-different-column-names-pandas/57251131#57251131 – The Great Jul 29 '19 at 10:00
  • can you help me with this? https://stackoverflow.com/questions/57307386/error-despite-global-keyword-being-used-to-access-variable-inside-function – The Great Aug 01 '19 at 11:03
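
The `globals()` option from the comments above can be sketched as follows, again with plain lists as hypothetical stand-ins for dataframes. `globals()` returns the current module's namespace dictionary, so assigning to a key creates a module-level variable with that name. The helper `get_frame` is an assumption added for illustration and is not part of the original thread.

```python
# Hypothetical stand-ins for the CSV contents.
fake_files = {"xx_001.csv": [1, 2], "xx_002.csv": [3, 4]}

for file, data in fake_files.items():
    name = file[3:6]
    # Creates module-level names such as df001 and df002.
    globals()["df" + name] = data

def get_frame(name):
    """Look up a dynamically created module-level variable by its suffix."""
    return globals()["df" + name]

print(get_frame("001"))
print(get_frame("002"))
```

The names still have to be built as strings whenever they are looked up, which is why a dictionary is usually the simpler choice.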

After your clarification in the comments, you can achieve what you want using a list or a dictionary.

import os
import pandas as pd

dataFrames = list()
dataFrameDict = dict()

for file in os.listdir('C:\\Users\\ABCDE\\Desktop\\Measurement'):
  if file.endswith('.csv'):
     print(file)
     name = file[3:6]  # part of the file name, used as the dictionary key
     df_name = pd.read_csv('C:\\Users\\ABCDE\\Desktop\\Measurement\\' + str(file),low_memory = False)
     df_name.rename(columns={0:'values'},
             inplace=True)
     g = df_name.level_1.str[-2:] # Extracting column names
     df_name['lvl'] = df_name.level_1.apply(lambda x: int(''.join(filter(str.isdigit, x))))
     # ADD TO A LIST
     dataFrames.append(df_name)
     # OR TO A DICT
     dataFrameDict[name] = df_name


# How to Access

# Index for 10 files would be 0-9
index = 0
dataFrames[index]

# Name of the dataset you want to access
name = "..."
dataFrameDict[name]
  • Oz - Just curious to know: other than using lists and dicts, isn't there any way to use the global keyword in this for loop and be able to access the dataframes outside the for loop? I don't have any functions either, just a plain for loop. – The Great Jul 29 '19 at 06:59
  • @AVLES The `global` keyword lets a function assign to a variable defined outside its body, and that will be just a single variable. In your case you need multiple variables, but how would you know how many you need if you work with a dynamic range of files? You can't declare a variable for each of your files in that case. The only implementation that can handle this is a **list** or a **dictionary**. If you had a single file, you could just define a variable and then assign your dataframe to it. You wouldn't need to use `global`. – Berkay Tekin Öz Jul 29 '19 at 07:06
  • Oz - Can you help me with this post? https://stackoverflow.com/questions/57250943/append-dataframes-with-different-column-names-pandas/57251131#57251131 – The Great Jul 29 '19 at 09:56
  • Can you help me with this? https://stackoverflow.com/questions/57307386/error-despite-global-keyword-being-used-to-access-variable-inside-function – The Great Aug 01 '19 at 10:54
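
As a sketch of when `global` does matter: if the loop is moved into a function, rebinding a module-level name requires `global`, whereas adding entries to a module-level dictionary does not, because the dictionary is mutated rather than rebound. The function and variable names below are hypothetical, and plain lists again stand in for dataframes.

```python
frames = {}        # module-level container
last_name = None   # module-level name that the function rebinds

def load_all(fake_files):
    global last_name            # required: last_name is assigned below
    for file, data in fake_files.items():
        name = file[3:6]
        frames[name] = data     # mutation only, so no `global` is needed
        last_name = name        # rebinding the module-level name

load_all({"xx_001.csv": [1, 2], "xx_002.csv": [3, 4]})
print(frames)
print(last_name)
```

Without the `global last_name` line, the assignment inside `load_all` would create a local variable and the module-level `last_name` would stay `None`.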