I have written a function in Python that reads either a .csv or an .xls file and returns it as a pandas DataFrame. Based on the passed file_type, the function uses either pandas.read_csv() or pandas.read_excel(), with just one slight difference in the parameters. It works without an issue, but it is obviously repeated code that I would like to reduce. So how could I best:

  1. Have just one function call that is dynamically changed to the specific one defined by the file_type variable
  2. Dynamically change the parameters of the then called function based on the same variable?

Here is my current code. Thanks for your help.

def file_to_df(file_name, fields= None, file_type = None, encoding = None):
    """Read stock level from csv or xlsx file.Filter SKU and Qty.Return dataframe."""

    if file_type == 'csv' or 'xls':
        if file_type == 'csv':
            data_frame = pd.read_csv(
                file_name,
                encoding = encoding,
                converters={'Barcode':str,'Qty':int},
                usecols=fields
            )
        elif file_type == 'xls':
            data_frame = pd.read_excel(
                file_name,
                converters={'Barcode':str,'Qty':int},
                usecols=fields
            )

        # Remove empty rows
        data_frame.replace('', np_nan, inplace=True)
        data_frame.dropna(axis=0, how='any', subset=None, inplace=True)

        return data_frame

    else:
        print('no csv or xls filetype was handed to file_to_df')

For the parameters, I tried using two tuples that are passed into the function call.

zoni
  • Notice that `if file_type == 'this' or 'that'` [does not do what you probably want it to do](https://stackoverflow.com/questions/20002503/why-does-a-x-or-y-or-z-always-evaluate-to-true-how-can-i-compare-a-to-al)! – Ture Pålsson Feb 19 '23 at 17:04
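
To make the comment above concrete, here is a minimal sketch of why the condition in the question is always truthy, and one common way to write the intended check. The value 'pdf' is only an illustrative input, not something from the question.

```python
file_type = 'pdf'  # illustrative value that should be rejected

# Python parses this as (file_type == 'csv') or 'xls'.
# The non-empty string 'xls' is truthy, so the result is truthy for any file_type.
always_truthy = file_type == 'csv' or 'xls'

# A membership test compares file_type against each allowed value instead.
is_supported = file_type in ('csv', 'xls')

print(bool(always_truthy))  # True, even though 'pdf' is not supported
print(is_supported)         # False
```

With the original condition, the else branch that prints the error message can never run.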

2 Answers

You can change your function signature to collect arbitrary keyword arguments with **kwargs. After that, create a dict of parameters: add your fixed parameters (converters), rename some parameters (fields -> usecols), and pass all other parameters through as is:

import pandas as pd
import pathlib

def file_to_df(file_name, **kwargs):
    xfile = pathlib.Path(file_name)

    params = {
        'converters': {'Barcode': str, 'Qty': int},  # add fixed parameters
        'usecols': kwargs.pop('fields', None)  # convert fields to usecols
    } | kwargs  # pass all other parameters as is (dict union needs Python 3.9+)

    # determine the right function according to the extension
    funcs = {'.csv': pd.read_csv, '.xls': pd.read_excel, '.xlsx': pd.read_excel}
    try:
        # look up only the reader here, so a KeyError raised while parsing is not misreported
        reader = funcs[xfile.suffix.lower()]
    except KeyError:
        raise RuntimeError('no csv or xls filetype was handed to file_to_df') from None

    return reader(xfile, **params)
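
For the second part of the question (parameters that differ per file type), one possible variation is to store, next to each reader, the names of the extra options it accepts. This is only a sketch: `READERS`, `read_table`, and the sample file are hypothetical names made up for illustration, and it assumes `encoding` is meaningful for `pd.read_csv` only.

```python
import pathlib
import tempfile

import pandas as pd

# Hypothetical table: suffix -> (reader function, names of extra options it accepts).
READERS = {
    '.csv': (pd.read_csv, {'encoding'}),
    '.xls': (pd.read_excel, set()),
    '.xlsx': (pd.read_excel, set()),
}

def read_table(file_name, fields=None, **kwargs):
    path = pathlib.Path(file_name)
    try:
        reader, extra_names = READERS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f'unsupported file type: {path.suffix!r}') from None
    # Forward only the options this reader is listed as accepting; drop unset ones.
    extras = {k: v for k, v in kwargs.items() if k in extra_names and v is not None}
    return reader(path, converters={'Barcode': str, 'Qty': int}, usecols=fields, **extras)

# Usage with a small throwaway CSV file (sample data invented for the example).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = pathlib.Path(tmp) / 'stock.csv'
    csv_path.write_text('Barcode,Qty,Name\n00123,4,foo\n00456,7,bar\n', encoding='utf-8')
    df = read_table(csv_path, fields=['Barcode', 'Qty'], encoding='utf-8')

print(df)
```

Note that this sketch silently ignores options a reader is not listed for; raising an error instead would be an equally reasonable choice.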
Corralien

Don't pass a string that has to be mapped to a particular function; just pass the correct function.

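import pandas as pd
from numpy import nan as np_nan  # assumed import: the question uses np_nan without showing where it comes from

def file_to_df(file_name, fields=None, *, converter, **kwargs):
    """Read stock level from csv or xlsx file. Filter SKU and Qty. Return dataframe."""

    # converter is the reader function itself, e.g. pd.read_csv or pd.read_excel
    data_frame = converter(file_name, converters={'Barcode': str, 'Qty': int}, usecols=fields, **kwargs)

    # Remove empty rows
    data_frame.replace('', np_nan, inplace=True)
    data_frame.dropna(axis=0, how='any', subset=None, inplace=True)

    return data_frame


# encoding is forwarded through **kwargs; read_csv takes it, current pandas read_excel does not
df1 = file_to_df('foo.csv', converter=pd.read_csv, encoding='...')
df2 = file_to_df('foo.xlsx', converter=pd.read_excel)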
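
If some options belong to one reader only, another way to follow the same idea is to bind them to the function before passing it in, for example with `functools.partial`. The sketch below is an assumption-laden illustration, not part of the answer's code: it uses a trimmed-down `file_to_df`, an in-memory CSV invented for the example, and 'utf-8' as a stand-in encoding.

```python
import functools
import io

import pandas as pd

def file_to_df(source, fields=None, *, converter):
    # Trimmed-down version of the function above, kept short for the example.
    return converter(source, converters={'Barcode': str, 'Qty': int}, usecols=fields)

# Bind the CSV-only option once; 'utf-8' is an assumed example value.
read_csv_utf8 = functools.partial(pd.read_csv, encoding='utf-8')

# In-memory stand-in for a CSV file on disk (sample data invented for the example).
csv_data = io.BytesIO('Barcode,Qty\n00123,4\n00456,7\n'.encode('utf-8'))
df = file_to_df(csv_data, converter=read_csv_utf8)

print(df)
```

This keeps `file_to_df` free of any per-format knowledge: the caller decides both which reader to use and which reader-specific options go with it.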
chepner