How to read from a file or a list?

Question

Say you have a piece of code that accepts either a list or a file name, and must filter through each item of either one provided by applying the same criteria:

import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required = True)
group.add_argument('-n', '--name', help = 'single name', action = 'append')
group.add_argument('-N', '--names', help = 'text file of names')
args = parser.parse_args()

results = []

if args.name:
    # We are dealing with a list.
    for name in args.name:
        name = name.strip().lower()
        if name not in results and len(name) > 6: results.append(name)

else:
    # We are dealing with a file name.
    with open(args.names) as f:
        for name in f:
            name = name.strip().lower()
            if name not in results and len(name) > 6: results.append(name)

I'd like to remove as much redundancy as possible in the above code. I tried creating the following function for strip and lower, but it didn't remove much repeat code:

def getFilteredName(name):
    return name.strip().lower()

Is there any way to iterate over both a list and a file in the same function? How should I go about reducing as much code as possible?

I think you are on the right track, why not put everything in the `def` from `name = ...`? — l'L'l, Jan 12 '19 at 04:20

score 1 · Accepted Answer · answered Jan 12 '19 at 10:32

You have duplicate code that you can simplify: list and file-objects are both iterables - if you create a method that takes an iterable and returns the correct output you have less code duplication (DRY).

Choice of datastructure:

You do not want duplicate items, meaning set() or dict() are better suited to collect the data you want to parse - they eliminate duplicates by design which is faster then looking if an item is already in a list:

if the order of names matter use
- a OrderedDict from collections when on python 3.6 or less or
- a normal dict for 3.7 or more (dicts gurantee input order)
- more info: Are dictionaries ordered in Python 3.6+?
if name order is not important, use a set()

Either one of the above choices removes duplicates for you.

import argparse
from collections import OrderedDict # use normal dict on 3.7+ it hasinput order

def get_names(args):
    """Takes an iterable and returns a list of all unique lower cased elements, that
    have at least length 6."""

    seen = OrderedDict() # or dict or set

    def add_names(iterable):
        """Takes care of adding the stuff to your return collection."""
        k = [n.strip().lower() for n in iterable] # do the strip().split()ing only once
        # using generator comp to update - use .add() for set()
        seen.update( ((n,None) for n in k if len(n)>6))

    if args.name:
        # We are dealing with a list:
        add_names(args.name)

    elif args.names:
        # We are dealing with a file name:
        with open(args.names) as f:
            add_names(f)

    # return as list    
    return list(seen)

Testcode:

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group(required = True)
group.add_argument('-n', '--name', help = 'single name', action = 'append')
group.add_argument('-N', '--names', help = 'text file of names')
args = parser.parse_args()

results = get_names(args)

print(results)

Output for -n Joh3333n -n Ji3333m -n joh3333n -n Bo3333b -n bo3333b -n jim:

['joh3333n', 'ji3333m', 'bo3333b']

Input file:

with open("names.txt","w") as names:
    for n in ["a"*k for k in range(1,10)]:
        names.write( f"{n}\n")

Output for -N names.txt:

['aaaaaaa', 'aaaaaaaa', 'aaaaaaaaa']

Thank you! This is a nice solution. I would only tweak it very little to suit my code more, and edit ```add_names``` so that there is only one iteration of ```iterable``` (instead of iterating over ```iterable``` and then further iterating over ```k```) which ```strip```s, ```lower```s, checks ```len```, and adds to ```seen``` in one go. — Hank, Jan 12 '19 at 18:04
@Hank if you do `{n.strip().lower() for w in iterable if len(n.strip()) > 6}` (using a set) it will strip() twice for every n - python is bad at optimizing list comps - that why I split it - but it is shorter code. It might matter if you feed it a file with a million names - it won't if you provide 5 to 10. — Patrick Artner, Jan 12 '19 at 18:10

wwii · Answer 2 · 2019-01-12T18:18:40.940

0

Subclass list and make the subclass a context manager:

class F(list):
    def __enter__(self):
        return self
    def __exit__(self,*args,**kwargs):
        pass

Then the conditional can decide what to iterate over

if args.name:
    # We are dealing with a list.
    thing = F(args.name)
else:
    # We are dealing with a file name.
    thing = open(args.names)

And the iteration code can be factored out.

results = []

with thing as f:
    for name in f:
        name = name.strip().lower()
        if name not in results and len(name) > 6: results.append(name)

Here is a similar solution that makes an io.StringIO object from either the file or the list then uses a single set of instructions to process them.

import io

if args.name:
    # We are dealing with a list.
    f = io.StringIO('\n'.join(args.name))
else:
    # We are dealing with a file name.
    with open(args.names) as fileobj:
        f = io.StringIO(fileobj.read())

results = []

for name in f:
    name = name.strip().lower()
    if name not in results and len(name) > 6: results.append(name)

If the file is huge and memory is scarce, this has the disadvantage of reading the entire file into memory.

edited Jan 12 '19 at 18:18

answered Jan 12 '19 at 04:25

wwii

23,232
7
37
77

The context manager doesn't add anything. Moving the loop into a function would be an improvement. – cco Jan 12 '19 at 07:06
@cco `list`'s are not context managers - they do not have `__entry__` or `__exit__` methods; the subclass allows the list to be used with a `with` statement making that part of the code common between the two objects - which is what the OP desired, reducing code duplication. Without it, the code would throw an AttributeError for a list. Did you try it without the subclass? Indeed I tested it with that last bit in a function but not knowing how the OP's program is structured I just wanted to show a way to deal with the duplicate code. – wwii Jan 12 '19 at 16:43
Or maybe I misunderstood the OP's problem/question. – wwii Jan 12 '19 at 16:52
2

This solution works, and is less code than the accepted answer, but seems like too hacky of a solution. Thank you for the creative perspective on it, though! – Hank Jan 12 '19 at 18:00
this is a "think outside the box" solution, dv wasn't mine but I'll compensate – Patrick Artner Jan 12 '19 at 18:11
Thnx, It was the first thing that came to mind - see edit for similar approach. – wwii Jan 12 '19 at 18:19
Using `with` doesn't reduce duplicated code, it merely makes the duplicated code *exactly* the same in both cases. Moving the body of your `with` to a function will reduce duplication. – cco Jan 13 '19 at 10:12
@cco - You are missing the point. I'm not implying that using `with` would reduce duplicated code. It is preferable to use `with` when working with a file. Making a list that is also a context manager allows *me* to use the iteration code, including the `with` statement, for either iterable. – wwii Jan 13 '19 at 22:29

How to read from a file or a list?

2 Answers2