This is a continuation of OP1 and OP2.

Specifically, the objective is to remove duplicates if more than one dict has the same content for the key paper_title.

However, the code throws an error when the entries in the list are inconsistent, for example when the list contains a mix of dict and str:

TypeError: string indices must be integers

The complete code that generates the error above is shown below:

from itertools import groupby



def extract_secondary():
    # 
    test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2}, \
                 {"paper_title": 'This is duplicate', 'Paper_year': 3}, \
                 {"paper_title": 'Unique One', 'Paper_year': 3}, \
                 {"paper_title": 'Unique two', 'Paper_year': 3}, 'all_result']
    f = lambda x: x["paper_title"]
    already_removed = [next(g) for k, g in groupby(sorted(test_list, key=f), key=f)]


extract_secondary()

May I know which part of the code needs further tweaks? Appreciate any insight.

PS: Please let me know if this thread is considered a duplicate of OP1. However, I believe it merits its own thread because the issue is different.

mpx
  • You shouldn't ask people to download and open a pickle file. Opening a pickle file could lead to arbitrary Python code being executed (deleting your entire hard drive, for example). You should update your example to just contain the data in the pickle file. – Bob Jul 08 '20 at 11:27
  • There's a `str` in `test_list` instead of a `dict` (`"all_result"`). Hence `sorted` is complaining that it cannot use `f` on a `str`. – Chris Jul 08 '20 at 11:34
  • Thanks for the input @Chris, I managed to find the culprit based on your insight. – mpx Jul 08 '20 at 11:37

1 Answer

Thanks to @Chris for pointing out that test_list contains a str ("all_result") in addition to the dicts.

sorted raises the error because the key function f indexes each element with "paper_title", which works for a dict but not for a str. To avoid it, the str needs to be removed from the list before sorting.

For the list in the question, the str can be removed with

list(filter('all_result'.__ne__, test_list))

Note that this only works here because the only str in the list has the value 'all_result'.
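If the list may contain other stray values as well, a more general filter is to keep only the dict entries. The following is a minimal sketch, not part of the original fix: the helper name drop_duplicate_titles and the extra entry 42 are made up for illustration. It also avoids relying on 'all_result'.__ne__ returning NotImplemented for the dicts, which, as far as I know, newer Python versions no longer accept in a boolean context.

```python
from itertools import groupby


def drop_duplicate_titles(records):
    """Keep the first dict for each paper_title, skipping non-dict entries."""
    # Keep only dicts that actually carry the key used for grouping.
    dicts_only = [r for r in records if isinstance(r, dict) and "paper_title" in r]
    title = lambda r: r["paper_title"]
    # sorted() is stable, so the first dict per title in the input is the one kept.
    return [next(group) for _, group in groupby(sorted(dicts_only, key=title), key=title)]


test_list = [
    {"paper_title": "This is duplicate", "Paper_year": 2},
    {"paper_title": "This is duplicate", "Paper_year": 3},
    {"paper_title": "Unique One", "Paper_year": 3},
    {"paper_title": "Unique two", "Paper_year": 3},
    "all_result",
    42,  # hypothetical extra non-dict entry, not in the original list
]

result = drop_duplicate_titles(test_list)
print(result)
```

With this input, the result should hold three dicts, one per title, and the duplicate that survives is the one with Paper_year 2 because it comes first in the list.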

The complete code then becomes:

from itertools import groupby


def extract_secondary():
    test_list = [{"paper_title": 'This is duplicate', 'Paper_year': 2},
                 {"paper_title": 'This is duplicate', 'Paper_year': 3},
                 {"paper_title": 'Unique One', 'Paper_year': 3},
                 {"paper_title": 'Unique two', 'Paper_year': 3}, 'all_result', 'all_result']
    # Drop every 'all_result' string so that only the dicts remain.
    test_list = list(filter('all_result'.__ne__, test_list))
    f = lambda x: x["paper_title"]
    # Sort by title, then keep the first dict from each group of equal titles.
    already_removed = [next(g) for k, g in groupby(sorted(test_list, key=f), key=f)]
    return already_removed


print(extract_secondary())
mpx