I am trying to read 3 CSV files into 3 pandas DataFrames, but after executing the function the variable does not seem to be available. I also tried creating a blank DataFrame outside the function and then reading the file and assigning the frame inside the function, but the frame stays blank.

# Load data from the csv file
def LoadFiles():
    x = pd.read_csv('columns_description.csv', index_col=None)
    print("Columns Description")
    print(f"Number of rows/records: {x.shape[0]}")
    print(f"Number of columns/variables: {x.shape[1]}")
    
LoadFiles()
x.head()

Python Notebook for above code with Error

In the second approach, I am trying to create a new data frame with some consolidated information from the dataset. The issue reappears as the variable seems to be no longer available.

# Understand the variables
y = pd.read_csv('columns_description.csv', index_col=None)

def refresh_y():
    var_y = pd.DataFrame(columns=['Variable','Number of unique values'])
    for i, var in enumerate(y.columns):
        var_y.loc[i] = [y, y[var].nunique()]
        
refresh_y()

Screenshot with error code and solution restructuring in the function

I am a bit new to Python. The code is a sample and does not represent the actual data, and the function shows an example with a single column. I have multiple columns to refresh in this derived dataset based on later changes, hence the function approach.

  • your functions should return the desired dataframes (and possibly take file name and `y` as argument respectively) – Tranbi Nov 20 '21 at 07:47

2 Answers


When defining a function, if you want to use a variable created inside it after the call, the function has to end with `return var`, and the caller has to assign the result to a name. Check this: Function returns None without return statement, and some tutorials on defining a function (https://learnpython.com/blog/define-function-python/).

A basic example to help you start with defining functions:

def sum_product(arg1, arg2):  # your function takes 2 arguments
    var1 = arg1 + arg2
    var2 = arg1 * arg2
    return var1, var2  # returns two values

new_var1, new_var2 = sum_product(3, 4)

For the first example, try modifying it like this:

def LoadFiles():
    var = pd.read_csv('columns_description.csv', index_col=None)
    print("Columns Description")
    print(f"Number of rows/records: {var.shape[0]}")
    print(f"Number of columns/variables: {var.shape[1]}")
    return var

x = LoadFiles()
x.head()
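
To read three or more files into separate frames, one option is to pass the source as an argument and call the same function once per file. The following is only a minimal sketch: `load_file`, the labels, and the in-memory CSV text are hypothetical stand-ins so it runs without files on disk. In practice you would pass real paths such as 'columns_description.csv'.

```python
import io

import pandas as pd


def load_file(source, label):
    """Read one CSV into a DataFrame, print a short summary, and return it."""
    frame = pd.read_csv(source, index_col=None)
    print(label)
    print(f"Number of rows/records: {frame.shape[0]}")
    print(f"Number of columns/variables: {frame.shape[1]}")
    return frame  # without this line the caller would receive None


# Hypothetical stand-in data; replace each StringIO with a file path.
sources = {
    "first": io.StringIO("a,b\n1,2\n3,4\n"),
    "second": io.StringIO("c,d,e\n5,6,7\n"),
    "third": io.StringIO("f\n8\n9\n10\n"),
}

# One DataFrame per source, keyed by its label.
frames = {label: load_file(src, label) for label, src in sources.items()}

# Unpack into separate names if three variables are preferred.
df1, df2, df3 = frames["first"], frames["second"], frames["third"]
```

Keeping the frames in a dictionary avoids writing one function per file, and it scales if more files are added later.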
Savvas
  • I will try this out and update you. How do you handle multiple return values in a Python function? I want to be able to read 3 or more files into 3 different frames. – Dilip M Nair Nov 20 '21 at 13:27
  • Check my edited answer. Additionally, following @Tranbi's suggestion, it would be helpful to have y and filename as arguments for the function. – Savvas Nov 20 '21 at 21:38

Try the following code:

# Load data from the csv file
def LoadFiles():
    x = pd.read_csv('columns_description.csv', index_col=None)
    print("Columns Description")
    print(f"Number of rows/records: {x.shape[0]}")
    print(f"Number of columns/variables: {x.shape[1]}")
    return x
    
x2 = LoadFiles()
x2.head()

Variables defined inside a function are only available inside that function. You may need to study scope. I recommend the following simple site about scope in Python.

https://www.w3schools.com/python/python_scope.asp
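
The same scope rule explains the second example in the question: `var_y` exists only inside `refresh_y`. Below is a rough sketch of one way to restructure it, under a few assumptions: the function name `summarize_unique` and the sample data are made up for illustration, and the 'Variable' column is assumed to be meant to hold the column name (the question's code stores `y` itself there).

```python
import pandas as pd


def summarize_unique(frame):
    """Return a new DataFrame with one row per column of `frame`."""
    rows = [
        {"Variable": col, "Number of unique values": frame[col].nunique()}
        for col in frame.columns
    ]
    return pd.DataFrame(rows, columns=["Variable", "Number of unique values"])


# Hypothetical sample data standing in for the CSV contents.
y = pd.DataFrame({"city": ["A", "B", "A"], "year": [2020, 2020, 2021]})

# The result is available here only because it is returned and assigned.
var_y = summarize_unique(y)
print(var_y)
```

Since the frame is passed in as an argument, the function can be called again whenever `y` changes to get a refreshed summary.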

Terry Lam