
Pandas seems to ignore extra (invalid) parameters. For example:

import pandas as pd
df=pd.read_excel('myfile.xlsx', some_dummy_param=True)

I expected (but did not get) an error along the lines of:

TypeError: __init__() got an unexpected keyword argument 'some_dummy_param'

The problem is that, since no error is raised, I assume "some_dummy_param" is valid, which is certainly not what I want. Is there any way to make sure only valid parameters are passed to the read_excel method?

shantanuo
  • Are you trying to use the same parameters across different read functions in pandas? – cs95 May 14 '18 at 05:19
  • Pandas takes arbitrary keyword arguments (**kwds) in the read_excel function; that's why it is not throwing an error. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html#pandas.read_excel – Soumendra May 14 '18 at 05:19
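To see why no error appears, compare two hypothetical stand-in functions (neither is pandas code): one declares a catch-all **kwds, as read_excel did in the pandas version discussed here, and the other has an explicit signature only. This is a minimal sketch of plain Python behavior, not of pandas internals.

```python
# Hypothetical stand-ins; neither is pandas code.
def with_catch_all(io, sheet_name=0, **kwds):
    # Unknown keywords land in kwds and are simply never looked at
    return io, sheet_name

def explicit_only(io, sheet_name=0):
    # No **kwds, so Python itself rejects unknown keywords
    return io, sheet_name

print(with_catch_all("myfile.xlsx", some_dummy_param=True))
# ('myfile.xlsx', 0)

try:
    explicit_only("myfile.xlsx", some_dummy_param=True)
except TypeError as exc:
    print(exc)
# explicit_only() got an unexpected keyword argument 'some_dummy_param'
```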

2 Answers


No, not really.

In Pandas, keyword arguments are often collected through **kwargs, which is then either forwarded to another function or treated as a dict. The functions that consume this dict are free to check for keys other than the ones they expect, but nothing forces them to.
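For illustration, here is what "checking the dict" can look like in a hypothetical function (`parse_sheet` is an invented name, not part of pandas): it pops the keys it understands and rejects whatever is left. This is one common idiom, sketched under that assumption.

```python
def parse_sheet(io, **kwargs):
    """Hypothetical function that validates its own **kwargs."""
    # Take out the keywords this function knows about, with defaults
    header = kwargs.pop("header", 0)
    nrows = kwargs.pop("nrows", None)
    # Anything still in kwargs was not recognised, so refuse it
    if kwargs:
        raise TypeError("parse_sheet() got unexpected keyword argument(s): "
                        + ", ".join(sorted(kwargs)))
    return {"io": io, "header": header, "nrows": nrows}

print(parse_sheet("myfile.xlsx", nrows=5))
# {'io': 'myfile.xlsx', 'header': 0, 'nrows': 5}
```

Calling `parse_sheet("myfile.xlsx", some_dummy_param=True)` raises TypeError; a function that skipped the final `if kwargs:` check would silently ignore the extra key, which is the behavior observed in the question.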

You could do something like:

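import inspect
import pandas as pd

def safe_read_excel(f_name, *args, **kwargs):
    # Expected keys: read_excel's declared parameters, minus a catch-all **kwds
    params = inspect.signature(pd.read_excel).parameters
    expected = {name for name, p in params.items()
                if p.kind is not inspect.Parameter.VAR_KEYWORD}
    unexpected = set(kwargs) - expected  # wrong parameters, if any
    if unexpected:
        raise TypeError("unexpected keyword argument(s): %s" % sorted(unexpected))
    return pd.read_excel(f_name, *args, **kwargs)

pd.DataFrame.safe_read_excel = staticmethod(safe_read_excel)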

However, this would

  1. create a non-standard method for DataFrame
  2. possibly break for different versions of Pandas
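If you would rather not attach a method to DataFrame, the same idea can be written as a generic decorator. This is a sketch, not a pandas feature: `strict_kwargs` and `loose_reader` are hypothetical names, and only `inspect.signature` and `functools.wraps` from the standard library are used. It runs on a stand-in function so no pandas install or Excel file is needed.

```python
import functools
import inspect

def strict_kwargs(func):
    """Wrap func so keywords absent from its explicit signature raise TypeError."""
    params = inspect.signature(func).parameters
    # Names that can be passed by keyword, ignoring catch-alls like **kwds
    allowed = {
        name for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        unexpected = sorted(set(kwargs) - allowed)
        if unexpected:
            raise TypeError(f"{func.__name__}() got unexpected keyword "
                            f"argument(s): {', '.join(unexpected)}")
        return func(*args, **kwargs)

    return wrapper

# Hypothetical stand-in for a function that swallows unknown keywords
def loose_reader(io, sheet_name=0, **kwds):
    return io, sheet_name

strict_reader = strict_kwargs(loose_reader)
print(strict_reader("myfile.xlsx", sheet_name=1))  # ('myfile.xlsx', 1)
```

Wrapping `pd.read_excel` the same way should behave similarly, but the second caveat still applies: the allowed names come from whatever signature the installed pandas version exposes.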
Ami Tavory

Yeah, this isn't going to be a trivial problem to solve. pd.read_excel accepts a catch-all **kwds in its signature, so you can pass whatever keyword arguments you please; read_excel simply ignores any keyword arguments it has no use for.

One way of approaching this problem is to

  1. Determine what keyword arguments read_excel actually accepts
  2. Build an argument list for read_excel
  3. Filter out invalid arguments based on the results of (1)
  4. Pass the filtered argument list to the function

To handle (1), you can use the inspect module to determine what arguments pd.read_excel accepts. In particular, inspect.signature returns a Signature object whose parameters attribute is a mappingproxy (effectively a read-only, ordered dictionary) that maps each parameter name to a Parameter object.

import inspect
args = inspect.signature(pd.read_excel).parameters

print(args)
mappingproxy({'convert_float': <Parameter "convert_float=True">,
              'converters': <Parameter "converters=None">,
              'date_parser': <Parameter "date_parser=None">,
              'dtype': <Parameter "dtype=None">,
              ...})
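One detail: if the signature includes a catch-all such as **kwds, that parameter also appears in this mapping under its own name, so a key literally called `kwds` would survive the filtering below. A sketch of excluding it by checking each Parameter's `kind`, shown on a hypothetical stand-in (`fake_read_excel`) so it runs without pandas:

```python
import inspect

# Hypothetical stand-in with a catch-all, like the read_excel signature discussed above
def fake_read_excel(io, sheet_name=0, header=0, **kwds):
    return io

params = inspect.signature(fake_read_excel).parameters
print(list(params))  # ['io', 'sheet_name', 'header', 'kwds']

# Keep only names that can actually be passed by keyword
accepted = [
    name for name, p in params.items()
    if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                  inspect.Parameter.KEYWORD_ONLY)
]
print(accepted)  # ['io', 'sheet_name', 'header']
```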

Step (2) is assumed to be done already. In your case, that means putting your candidate parameters into a dictionary, which makes it easy to intersect its keys with the mappingproxy and filter.

params = {'io' : 'myfile.xlsx', 'some_dummy_param' : True}

Step (3) performs a set intersection on the keys and then builds a new parameter dictionary from only the keys in that intersection.

valid_params = {k : params[k] for k in params.keys() & args.keys()}

print(valid_params)
{'io': 'myfile.xlsx'}

This is your valid argument list, the basis for step (4).

df = pd.read_excel(**valid_params)
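Note that the filter silently drops `some_dummy_param`, which is the very thing the question wants to notice. A small variant of steps (1) through (4) that also reports what was dropped is sketched below; `call_with_valid_kwargs` is a hypothetical helper, and the stand-in `fake_read_excel` replaces pd.read_excel so the example runs without an Excel file.

```python
import inspect
import warnings

def call_with_valid_kwargs(func, params):
    """Call func(**valid) and warn about any keys func does not declare."""
    # Step (1): keyword-capable parameters of func, excluding catch-alls
    accepted = {
        name for name, p in inspect.signature(func).parameters.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    # Step (3): split the candidate dict into valid and rejected keys
    valid = {k: v for k, v in params.items() if k in accepted}
    rejected = sorted(params.keys() - accepted)
    if rejected:
        warnings.warn(f"ignoring unknown parameter(s): {', '.join(rejected)}")
    # Step (4): call with the filtered arguments only
    return func(**valid)

# Hypothetical stand-in for pd.read_excel
def fake_read_excel(io, sheet_name=0, **kwds):
    return io, sheet_name, kwds

params = {'io': 'myfile.xlsx', 'some_dummy_param': True}
print(call_with_valid_kwargs(fake_read_excel, params))
# ('myfile.xlsx', 0, {}) plus a UserWarning naming some_dummy_param
```

Swapping `warnings.warn(...)` for `raise TypeError(...)` turns this from a lenient filter into a strict check.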
cs95