
I've written what's effectively a parser for a large number of sequential data chunks, and I need to write a number of functions to analyze the data chunks in various ways. The parser handles some functionality that's useful to me, such as how often data is read into (previously instantiated) objects, conditional filtering of the data, and when to stop reading the file.

I would like to write external analysis functions in separate modules, import the parser, and pass the analysis function into the parser to evaluate at the end of every data chunk read. In general, the analysis functions will require variables modified within the parser itself (e.g. the data chunk that was read), but they may also need additional parameters from the module where they're defined.

Here's essentially what I would like to do for the parser:

def parse_chunk(dat_file, dat_obj1, dat_obj2, parse_arg1=None, fun=None, **fargs):
    # Process optional arguments to parser...

    with open(dat_file, 'r') as dat:
        # Parse chunk of dat_file based on parse_arg1 and store data in dat_obj1, dat_obj2, etc.
        dat_obj1.attr = parsed_data

        local_var1 = dat_obj1.some_method()

        # Call analysis function passed to parser
        if fun is not None:
            return fun(**fargs)

In another module, I would have something like:

from parsemod import parse_chunk

def main_script():
    # Preprocess data from other files
    dat_obj1 = ...
    dat_obj2 = ...

    script_var1 = ...
    # Parse data and analyze
    result = parse_chunk(dat_file, dat_obj1, dat_obj2, fun=eval_data,
                         dat_obj1=None, local_var1=None, foo=script_var1)
    
def eval_data(dat_obj1, local_var1, foo):
    # Analyze data
    ...
    return result

I've looked at similar questions such as this and this, but the issue here is that eval_data() has arguments which are modified or set in parse_chunk(), and since **fargs provides a dictionary, the variable names themselves are not in the namespace of parse_chunk(), so they aren't modified prior to calling eval_data().

I've thought about modifying the parser to just return all variables after every chunk read and call eval_data() from main_script(), but there are too many different possible variables needed for the different eval_data() functional forms, so this gets very clunky.

Here's another simplified example that's even more general:

def my_eval(fun, **kwargs):
    x = 6
    z = 1
    return fun(**kwargs)

def my_fun(x, y, z):
    return x + y + z

my_eval(my_fun, x=3, y=5, z=None)

I would like the result of my_eval() to be 12, as x gets overwritten from 3 to 6 and z gets set to 1. I looked into functools.partial but it didn't seem to work either.
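For example, something like the following sketch (just one way I might have used partial) only pre-binds the caller's values, so the values set inside my_eval() never reach my_fun():

from functools import partial

def my_eval(fun):
    x = 6          # computed inside my_eval...
    z = 1
    return fun()   # ...but invisible to fun; only the pre-bound values are used

def my_fun(x, y, z):
    return x + y + z

# partial freezes x=3 and z=None at the call site, so my_eval's x=6 and z=1
# never reach my_fun; this call ends up evaluating 3 + 5 + None and raises TypeError.
my_eval(partial(my_fun, x=3, y=5, z=None))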

Daniel

1 Answer


To override kwargs you need to do

kwargs['variable'] = value  # instead of just variable = value

In your case, in my_eval you need to do

kwargs['x'] = 6
kwargs['z'] = 1
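
Putting that together with your simplified example, a complete version would look something like this:

def my_eval(fun, **kwargs):
    kwargs['x'] = 6   # overwrites the x=3 passed by the caller
    kwargs['z'] = 1   # fills in the z=None placeholder
    return fun(**kwargs)

def my_fun(x, y, z):
    return x + y + z

print(my_eval(my_fun, x=3, y=5, z=None))  # prints 12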
yakir0
  • That works in the simplified example. But in my implementation, I may not need `'variable'` as a kwarg for certain functions, even though I need to set the value for other purposes in the parser. In fact, if I just want to parse the data without calling an analysis function, I wouldn't pass `fun` or `**fargs` at all, so it wouldn't make sense to create a new dictionary to store all the parser variables if it's not used outside the parser. – Daniel Mar 11 '21 at 01:17
  • You mentioned returning all needed variables from `parse`. This actually sounds good, even if there are many possibilities. In `parse` you can keep a `dict` and, for every variable needed, just use `dict['variable'] = value`. Return the `dict` and in your `main` call `eval` like `eval(..., **dict)` (see the sketch after these comments). – yakir0 Mar 11 '21 at 01:48
  • Fair enough. All of the variables in `parse` are generally related; there's just a lot of them. Perhaps another alternative is to create a container class for the variables/data objects and write the analysis functions as methods for the container class. I'm mostly looking for flexibility as I write the analysis functions (which are primarily taking statistics). – Daniel Mar 11 '21 at 02:02
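
A minimal sketch of the dict-returning approach described in the second comment, reusing the placeholder names from the question (still pseudocode, so dat_file, the data objects, and script_var1 are assumed to be set up elsewhere):

def parse_chunk(dat_file, dat_obj1, dat_obj2, parse_arg1=None):
    parser_vars = {}
    with open(dat_file, 'r') as dat:
        # ... parse a chunk and store data in dat_obj1, dat_obj2, etc. ...
        parser_vars['dat_obj1'] = dat_obj1
        parser_vars['local_var1'] = dat_obj1.some_method()
    return parser_vars

def main_script():
    # ... set up dat_obj1, dat_obj2, script_var1 as before ...
    parser_vars = parse_chunk(dat_file, dat_obj1, dat_obj2)
    # Unpack whatever the parser collected into the analysis function
    result = eval_data(foo=script_var1, **parser_vars)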