
I have a data file in CSV format, and I have built a dictionary from it shaped like this: {movie_id: ('title', ['genres'])}. How can I remove the empty strings that end up in the genres list inside each tuple in the dictionary?

The data file (.csv) looks like this:

movie_id title genres
68735 Warcraft Action Adventure Comedy
124057 Kids at the round table

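import csv
from typing import Dict, List, TextIO, Tuple

# MovieDict isn't defined in the question; this alias is an assumed shape.
MovieDict = Dict[int, Tuple[str, List[str]]]

def read_movies(movie_file: TextIO) -> MovieDict: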

    """Return a dictionary containing movie id to (movie name, movie genres)
    in the movie_file.
    """

    line = movie_file.readline()
    while line == '':
        line = movie_file.readline()

    reader = csv.reader(movie_file)

    movie_dict = {int(rows[0]): (rows[1], rows[4:]) for rows in reader}

    return movie_dict

I expect the dictionary returned by read_movies to be:

{68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']), 293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']), 302156: ('Criminal', ['Action']), 124057: ('Kids of the Round Table', [])}

What I get with my code:

{68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']), 293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']), 302156: ('Criminal', ['Action', '', '']), 124057: ('Kids of the Round Table', ['', '', ''])}
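The empty strings come from csv.reader itself: it returns every field between delimiters, so a row whose trailing genre columns are blank yields '' for each of them. A minimal sketch, assuming a hypothetical comma-separated layout in which genres start at index 4 (to match rows[4:] above); the "x" and "y" columns are placeholders, since the question doesn't show them.

```python
import csv
import io

# Hypothetical CSV row: the column layout is an assumption, chosen so that
# genres start at index 4 as in rows[4:] in the question.
sample = "302156,Criminal,x,y,Action,,\n"

row = next(csv.reader(io.StringIO(sample)))
print(row)      # ['302156', 'Criminal', 'x', 'y', 'Action', '', '']
print(row[4:])  # ['Action', '', '']
```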

3 Answers


It's not clear what your file looks like, how big it is, or why you want to parse it this way instead of using, for example, pandas.

To answer your question, though, you can drop the empty strings by replacing this line

movie_dict = {int(rows[0]): (rows[1], rows[4:]) for rows in reader}

with

movie_dict = {int(rows[0]): (rows[1], [e for e in rows[4:] if e != '']) for rows in reader}
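For context, here is a sketch of the whole function with that fix applied, run against a small in-memory file. Assumptions: the MovieDict alias shape, the sample rows and the placeholder columns are hypothetical, and the header is skipped with next(reader, None) instead of the readline loop from the question (a design choice, not the asker's code); blank lines, which csv.reader returns as empty lists, are also skipped.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Assumed shape of the question's MovieDict alias (not shown in the question).
MovieDict = Dict[int, Tuple[str, List[str]]]


def read_movies(movie_file: TextIO) -> MovieDict:
    """Return {movie_id: (title, genres)} with empty genre fields dropped."""
    reader = csv.reader(movie_file)
    next(reader, None)  # skip the header row (no error if the file is empty)
    return {
        int(row[0]): (row[1], [g for g in row[4:] if g != ''])
        for row in reader
        if row  # csv.reader yields [] for blank lines; skip them
    }


# Hypothetical sample data; columns "a" and "b" are placeholders.
sample = io.StringIO(
    "movie_id,title,a,b,genre1,genre2,genre3\n"
    "302156,Criminal,x,y,Action,,\n"
    "124057,Kids of the Round Table,x,y,,,\n"
)
movies = read_movies(sample)
print(movies)
# {302156: ('Criminal', ['Action']), 124057: ('Kids of the Round Table', [])}
```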
David Sidarous
  • According to the "expected output" example, it doesn't seem to me the OP wants his genres list unpacked - it remains a separate list – Neo Aug 07 '19 at 22:48
  • By doing this what happens is that the list of genres in my tuples just becomes another string in the tuple and I don't want that also – Kabir Singh Aug 07 '19 at 22:49

The easiest way to go would be to filter the empty strings out:

non_empty = lambda s: len(s) > 0
movie_dict = {int(rows[0]): (rows[1], list(filter(non_empty, rows[4:]))) for rows in reader}

non_empty is an anonymous function (a lambda) that returns True for non-empty strings and False for empty ones; it works on anything that supports len. Passing it to filter together with rows[4:] keeps only the items for which it returns True, i.e. the non-empty strings, and wrapping the result in list() turns it into an actual list.

You could also use a list comprehension to filter out the empty strings: [s for s in rows[4:] if len(s) > 0] does exactly the same thing.

Either way, the second item in your tuple is a list containing only the non-empty strings.
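A quick check that the two forms agree on a genre list like the one in the question's output. The named function here is a def-based equivalent of the answer's lambda. The third variant, filter(None, ...), is a swapped-in standard idiom that the answer doesn't use: with None as the function, filter keeps truthy items, and the empty string is falsy.

```python
genres = ['Action', '', '']


def non_empty(s: str) -> bool:
    # Same test as the lambda in the answer.
    return len(s) > 0


via_filter = list(filter(non_empty, genres))
via_comprehension = [s for s in genres if len(s) > 0]
# filter(None, ...) keeps only truthy items; for strings that means non-empty ones.
via_none = list(filter(None, genres))

print(via_filter, via_comprehension, via_none)
# ['Action'] ['Action'] ['Action']
```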

Neo
  • yeah that actually makes a lot of sense and helps. Thank you so much @Neo – Kabir Singh Aug 07 '19 at 22:50
  • actually it isn't working: in my genres list within the tuple, I am receiving this in my output: – Kabir Singh Aug 07 '19 at 22:51
  • Apparently (which actually makes sense), `filter` returns a "filter object", which doesn't actually do the filtering but remembers it and does it "lazily", meaning only when it's needed. You can use `list(filter(...))` to get that... But you could really use the list comprehension solution which now seems more compact – Neo Aug 07 '19 at 22:54
  • I changed the answer anyway to accommodate that – Neo Aug 07 '19 at 22:55
  • You can read https://stackoverflow.com/questions/13638898/how-to-use-filter-map-and-reduce-in-python-3 for further investigation – Neo Aug 07 '19 at 22:56
  • I didn't have to use the list comprehension solution, as using list(filter(...)) made everything work. Thanks again! – Kabir Singh Aug 07 '19 at 22:57
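To illustrate the comment about laziness: in Python 3, filter returns an iterator that does the filtering only when it is consumed, and it can be consumed only once. That is why wrapping it in list(...) is needed to store an actual list in the tuple. A minimal sketch on a sample genre list:

```python
genres = ['Action', '', '']

lazy = filter(lambda s: len(s) > 0, genres)
print(type(lazy).__name__)  # filter (an iterator, not a list)

first = list(lazy)   # consuming the iterator performs the filtering
second = list(lazy)  # the iterator is now exhausted
print(first, second)  # ['Action'] []
```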
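dictionary = {}
dictionary['a'] = ('name', ['', 'p', 'q', '', ''])
for key in dictionary.keys():
    x, y = dictionary[key]  # x: the title, y: the list of genres
    print(x, y)
    # Reassigning the value of an existing key is safe while iterating over keys.
    dictionary[key] = (x, [s for s in y if len(s) != 0])
print(dictionary)  # {'a': ('name', ['p', 'q'])}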
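The same post-processing can be applied to the whole movie dictionary at once. This sketch is a variation on the loop above: it uses a dict comprehension that builds a new dictionary instead of updating values in place. It is run on the output shown in the question, and the result matches the expected output given there.

```python
# The dictionary the question's read_movies produced (copied from the question).
movie_dict = {
    68735: ('Warcraft', ['Action', 'Adventure', 'Fantasy']),
    293660: ('Deadpool', ['Action', 'Adventure', 'Comedy']),
    302156: ('Criminal', ['Action', '', '']),
    124057: ('Kids of the Round Table', ['', '', '']),
}

# Rebuild each (title, genres) tuple with the empty strings filtered out;
# the original movie_dict is left unchanged.
cleaned = {
    movie_id: (title, [g for g in genres if len(g) != 0])
    for movie_id, (title, genres) in movie_dict.items()
}
print(cleaned)
```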

Parijat Bhatt