I am reading *Python Object-Oriented Programming* by Steven F. Lott and Dusty Phillips.
I am at Chapter 10, which is all about the Iterator design pattern. At the end of the chapter there is an exercise to write a generator function/expression that finds the common lines between two files.
Here are my naive approaches:
    import os

    old = os.path.normpath('E:/testing/old.txt')
    new = os.path.normpath('E:/testing/new.txt')
    res = []

    def func(source):
        for line in source.readlines():
            yield line
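As an aside, a file object is already a lazy iterator over its lines, so a simpler equivalent of this helper (avoiding `readlines()`, which loads the whole file into memory first) could be:

    def func(source):
        # Delegate to the file object itself, which yields one line at a time.
        yield from source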
Approach 1:
    with open(old, "r") as f:
        for line in func(f):
            with open(new, "r") as _f:
                for _line in func(_f):
                    if line == _line:
                        res.append(line)
Approach 2:
    with open(old, "r") as f:
        for line in func(f):
            with open(new, "r") as _f:
                res.extend(filter(lambda a: line == a, func(_f)))
Context:
Now, let's think about a totally different problem: assume we have two lists of strings, with n and m elements, instead of two files. We can find the common strings in O(n + m) time with O(m) extra space by putting one list into a set and scanning the other.
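For concreteness, here is a minimal sketch of that idea (the name `common_strings` is mine; it assumes order and duplicates don't matter):

    def common_strings(xs, ys):
        # Build a set from one list: O(m) time and space.
        seen = set(ys)
        # Single pass over the other list; each membership test is O(1) on average.
        return [x for x in xs if x in seen]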
Big questions
- Is there a better, more Pythonic way to refactor the code above?
- I know generators are all about saving space and preserving state, but following the context above, is there a better way in terms of time complexity to compare two files and find their common lines? (See the sketch after this list.)
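For what it's worth, the same set idea seems to carry over to files, assuming one of them fits in memory as a set of lines (a sketch, reusing the `old` and `new` paths from above):

    # Load one file's lines (newlines included) into a set: O(m) time and space.
    with open(new, "r") as f:
        new_lines = set(f)

    # Stream the other file once; each lookup is O(1) on average.
    with open(old, "r") as f:
        res = [line for line in f if line in new_lines]

This reads each file exactly once, instead of re-opening and re-scanning one file for every line of the other.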
My end goal is to optimize my code (if possible) in terms of time and space complexity and to make it more Pythonic.
P.S. - Can someone point me to the source code of the Linux `diff` command and of `git diff`? I tried to find both but was unable to.