1

I'm very new to Python. I need to compare two lists of words and find the words in one list that are not in the other. Here are two test files:

big_list.txt

[coin, co-operate, accurate, achieve, adapt, adjust, admire, admission, enter, advance, adventure, aeroplane, plane, affair, aim, objective, annual, approach, approve, argument]

small_list.txt

[coin, co-operate,  football, accurate, achieve, adapt, amazing, adjust, admire, admission, enter, advance, breakfast]

with this expected output

[football, amazing, breakfast] 

I have a pretty simple Python script here

from sys import argv
big_list, small_list = argv
blist = open(big_list).read()
slist = open(small_list).read()
dlist = [item for item in slist if item not in blist]
diff_list = open(dlist, 'w').write()
diff_list.close()

but when run it returns this error message

roy@medea:~/e2tw/list_comparison$ python file_comp1.py big_list.txt small_list.txt
  Traceback (most recent call last):
       File "file_comp1.py", line 3, in <module>
          big_list, small_list = argv
   ValueError: too many values to unpack
Mureinik
RoyS
  • Side-note: WTH is going on with those last lines? You're opening a `list`, `write`-ing nothing to it, storing the return from `write` (hint: Not a file object), then `close`-ing the non-file. Also, `read`ing files with Python `list` literals in them doesn't create `list`s. You'll need `ast.literal_eval` to convert strings in Python literal form to actual Python objects instead of raw bytes. – ShadowRanger Jun 04 '16 at 09:32
  • As you are very new to python, I think we should not downvote your question. Look at my answer suggested, I think this code should have enough boilerplate to help you write solid code and further experiment with your tasks. If there are errors please comment, I will correct. Thanks. – Dilettant Jun 04 '16 at 09:46
  • 2
    @ShadowRanger Your observations are very valid. Of note is that OP's file content cannot be evaluated directly with `ast` because the items in the list are not string literals. – Moses Koledoye Jun 04 '16 at 10:01
  • @MosesKoledoye: Depends on whether the question's described format is accurate. The desired end result isn't representable in Python either (aside from terrible manual formatting), so it's possible it's "pseudodata". – ShadowRanger Jun 04 '16 at 13:33
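To illustrate the point made in the comments above, here is a minimal sketch of how `ast.literal_eval` behaves on the two formats. The sample strings and the helper name `parse_literal` are assumptions made for illustration, not part of the original script.

```python
import ast

# Hypothetical file contents: the first mirrors the question's unquoted format,
# the second is the same kind of data written as a valid Python list literal.
unquoted = "[coin, football, amazing]"
quoted = "['coin', 'football', 'amazing']"


def parse_literal(text):
    """Return the parsed object, or None if text is not a valid Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Bare words such as coin are names, not string literals, so they are rejected.
        return None


print(parse_literal(quoted))
print(parse_literal(unquoted))
```

So `ast.literal_eval` only helps if the files actually contain quoted strings; for the unquoted format shown in the question, some manual splitting is needed instead.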

4 Answers

4

Try:

big_list, small_list = argv[1:]

Why? Because argv holds three items here, not two: argv[0] is always the script name, followed by the two file names you passed on the command line.
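As a small sketch of what happens, the list below simulates what `sys.argv` would contain for the command line in the question (it is hard-coded here so the snippet runs on its own):

```python
# Simulated sys.argv for: python file_comp1.py big_list.txt small_list.txt
argv = ["file_comp1.py", "big_list.txt", "small_list.txt"]

error = None
try:
    # Three values on the right, two names on the left.
    big_list, small_list = argv
except ValueError as exc:
    error = str(exc)

print(error)

# Skipping argv[0] leaves exactly the two file names.
big_list, small_list = argv[1:]
print(big_list, small_list)
```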

P.S. Your last two lines contain a bug waiting to go off. open() expects a file name, not a list, and write() needs the text to write; it also does not return a file object, so there is nothing to close. You should do this instead:

output_file = open("filename.txt", "w")
output_file.write("[%s]" % ", ".join(dlist))
output_file.close()
Moses Koledoye
1

argv[0] contains the name of the Python script being run (analogous to the way C's argv[0] has the executable name). You obviously cannot assign three values (['file_comp1.py', 'big_list.txt', 'small_list.txt']) to two variables. You could, e.g., slice argv to take only the second element onwards:

big_list, small_list = argv[1:]
Mureinik
0

Try at least this (please look at the second code snippet for a real, fully working answer):

from sys import argv
big_list, small_list = argv[-2:]
blist = open(big_list).read()
slist = open(small_list).read()
dlist = [item for item in slist if item not in blist]
diff_list = open(dlist, 'w').write()
diff_list.close()

The first entry is always the script itself. Several other things do not work either, as others have already partly pointed out; see the working code below :-) to get you going.

You can also use [1:], which is more widely used: it skips the first entry at index 0 and takes all the rest. In hackish/rookie code I prefer the explicit negative slice with the number of expected parameters, though.
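The two slices only agree when exactly two arguments are given. A short sketch of how they differ otherwise, using made-up argument lists and two hypothetical helper names:

```python
def last_two(argv):
    """Take the final two entries, whatever precedes them."""
    return argv[-2:]


def all_after_script(argv):
    """Drop argv[0] (the script name) and keep everything else."""
    return argv[1:]


exact = ["script.py", "a.txt", "b.txt"]
extra = ["script.py", "a.txt", "b.txt", "c.txt"]
short = ["script.py", "a.txt"]

# With exactly two arguments both slices give the same result.
print(last_two(exact), all_after_script(exact))

# With an extra argument, [-2:] silently drops the first file name,
# while [1:] keeps three items and would fail a two-name unpack.
print(last_two(extra), all_after_script(extra))

# With one argument missing, [-2:] picks up the script name itself.
print(last_two(short), all_after_script(short))
```

Either way, checking len(sys.argv) before unpacking avoids both surprises.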

But maybe better write something like this to get going:

#! /usr/bin/env python
from __future__ import print_function
import sys


def read_list_from_file(a_path):
    """Simple parser transforming a [a, b,] text in file
    at a_path into a list."""
    return [z.strip() for z in open(a_path, 'rt').read().strip('[]').split(',')]


def a_not_in_b_list(a_seq, b_seq):
    """Return the list of entries in a_seq but not in b_seq."""
    return [item for item in a_seq if item not in b_seq]


def main():
    """Drive the diff."""
    if len(sys.argv) == 3:
        big_list, small_list = sys.argv[1:]
    else:
        print("Usage:", __file__, "<big-list-file> <small-list-file>")
        return 2
    # Do something with the file names given here
    b_list = read_list_from_file(big_list)
    s_list = read_list_from_file(small_list)

    with open('diff_list.txt', 'w') as f:
        f.write('%s\n' % (repr(a_not_in_b_list(s_list, b_list)),))


if __name__ == '__main__':
    sys.exit(main())

Running this on your text files gives in diff_list.txt:

['football', 'amazing', 'breakfast']
Dilettant
0
from sys import argv
big_list = argv[1]
small_list = argv[2]
blist = open(big_list).read()
slist = open(small_list).read()
dlist = [item for item in slist if item not in blist]
diff_list = open(dlist, 'w').write()
diff_list.close()

Check out this answer on how to use argv: Using argv

argv[0] is the script name.

Community
formatkaka
  • @Dilettant Thanks for your very helpful answer. Your suggested script is instructive. I like the function definitions; being a complete Python newbie, I was unaware of this capability in Python, though of course it's common enough in other languages. Just one other point: I wonder how your script would scale, as some of the comparisons I intend to make involve lists containing around 10,000 words. To all the others, thanks too. SO is a really awesome community. – RoyS Jun 04 '16 at 11:39
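On the scaling question: testing `item not in some_list` scans the list each time, so the comparison grows with the product of the two list lengths. One common alternative, sketched here under the assumption that the words are plain hashable strings, is to build a set from the big list first. The function name `words_not_in` is made up for this example and is not part of any answer above.

```python
def words_not_in(small_words, big_words):
    """Return words from small_words that are absent from big_words.

    The order of small_words is preserved; big_words is turned into a
    set once so each membership test is a hash lookup, not a list scan.
    """
    big_set = set(big_words)
    return [word for word in small_words if word not in big_set]


big = ["coin", "co-operate", "accurate", "achieve", "adapt", "adjust",
       "admire", "admission", "enter", "advance", "adventure", "aeroplane",
       "plane", "affair", "aim", "objective", "annual", "approach",
       "approve", "argument"]
small = ["coin", "co-operate", "football", "accurate", "achieve", "adapt",
         "amazing", "adjust", "admire", "admission", "enter", "advance",
         "breakfast"]

print(words_not_in(small, big))
```

If the original order does not matter, `set(small) - set(big)` expresses the same idea even more briefly, at the cost of losing order and duplicates.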