Say I write a Python script for data analysis that includes a function parse_csv which might return a dataframe in a certain format. I can then analyse and plot this output etc.

Some time later, working with different data in the same format, I decide that it's better to parse the csv in a different way, so I improve the function so that it returns a slightly different output to be plotted etc. in a slightly different way. E.g. a dataframe that now contains datetime objects for the columns instead of just the date and time as a string.

I still want all the original scripts to run. I want all of the functions for the old and new scripts to be in one python library for the project. I don't want to go back and change all of the code for the old scripts to work with the new and improved function.

What is the best way to version my functions? I can think of a few options, but I'm wondering if I'm missing anything:

  1. Is there a way to do version control of libraries in Python, e.g. "import myLib version 0.1"? Then I can just import different versions of the library for different scripts
  2. I call the functions different things e.g. parse_csv_1, parse_csv_2
  3. I have an argument to the function to redirect to different functions inside the function e.g.:
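def parse_csv(csv, version=1):
    if version == 1:
        return parse_csv_1(csv)
    elif version == 2:
        return parse_csv_2(csv)
    else:
        # Fail loudly instead of silently returning None
        raise ValueError(f"unknown version: {version}")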

Is there a way to achieve what I want using the first option, as it seems cleaner to me, and would work for improvements to multiple functions across scripts? Or do I need to do this in a different way?

I have seen this question which refers to importing specific versions of libraries, but I haven't been able to see how I would install multiple versions at the same time.

Noah Sprent

1 Answer

That is a pretty interesting question. Let's address your options one by one:

1. Library versioning:

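If you are publishing your library to PyPI or any other repository, you can have different versions and install a specific version using pip. However, you cannot have multiple versions of the same library installed in the same environment at the same time, so a single script cannot pick a version at import time.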

2. Calling it different things:

This would be the preferred method. However, I don't recommend naming them _1, _2. You should name them descriptively so you understand the difference between the functions. So based on your description, you'd have parse_csv and parse_csv_date_as_datetime.
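As a rough illustration of the two-function approach, here is a minimal sketch. It uses the standard library's csv module in place of a dataframe to stay self-contained, and the column names (date, time, value) and the timestamp format are assumptions, not something taken from the question.

```python
import csv
import io
from datetime import datetime


def parse_csv(text):
    """Original behaviour: every field stays a string."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_csv_date_as_datetime(text):
    """Newer behaviour: combine 'date' and 'time' into one datetime value."""
    rows = []
    for row in parse_csv(text):
        # Column names and timestamp format are assumptions for this sketch.
        stamp = datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M")
        rows.append({"timestamp": stamp, "value": row["value"]})
    return rows


# Old scripts keep calling parse_csv; new scripts call the new function.
sample = "date,time,value\n2021-03-01,09:30,4.2\n"
old_rows = parse_csv(sample)
new_rows = parse_csv_date_as_datetime(sample)
```

Because the new function builds on the old one, both can live in the same library and the old scripts never need to change.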

3. Redirecting with arguments

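In general this is not a good idea. Arguments should not fundamentally change how your function behaves. That can lead to unexpected behaviour, for example a caller receiving a differently structured result depending on a version number buried in the call. Parameters should be the data you perform computations on.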

In conclusion

I recommend just creating a second function. Use the original in your past code, use the new one in the new code.

If you really must have a single function, you could try applying the Strategy Pattern to handle the different data formatting tasks: pass in a strategy function, with a default that matches the current behaviour. That said, it is probably more machinery than this problem needs.
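For reference, the Strategy Pattern idea could look roughly like the sketch below. The helper names (keep_as_string, to_datetime), the convert_timestamp parameter, and the timestamp column are all hypothetical, and the standard library's csv module stands in for a dataframe.

```python
import csv
import io
from datetime import datetime


def keep_as_string(value):
    """Default strategy: leave the field untouched, as the original parser did."""
    return value


def to_datetime(value):
    """Alternative strategy; the timestamp format is an assumption."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M")


def parse_csv(text, convert_timestamp=keep_as_string):
    """Parse CSV text, delegating timestamp handling to the given strategy."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["timestamp"] = convert_timestamp(row["timestamp"])
        rows.append(row)
    return rows


sample = "timestamp,value\n2021-03-01 09:30,4.2\n"
old_style = parse_csv(sample)  # existing scripts: unchanged call, unchanged result
new_style = parse_csv(sample, convert_timestamp=to_datetime)
```

Unlike a version flag, the argument here is the behaviour itself, so adding a third format means writing one more small function rather than another branch inside parse_csv.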

tituszban