
I have a dataframe with a column that consists of lists of lists (of varying length). One example: df['east'][0] gives

[array(['Indonesia', 'New Zealand'], dtype=object), array(['Indonesia', 'New Zealand'], dtype=object)]

I want to merge the lists inside this bigger list, remove the duplicates, and present the data clearly, i.e. simply

['Indonesia', 'New Zealand']

I tried some suggestions from here to remove duplicates, but, for example, np.unique(functools.reduce(operator.add, east)) raised "ValueError: operands could not be broadcast together with shapes (4,) (13,)".

I can usually solve problems like this myself, but here I am not sure what is happening. What are these arrays inside the list?

user3349993
  • Removing duplicates from a list is easy: convert it to a set and, if needed, convert it back to a list: `a = [1, 2, 2, 3]; b = list(set(a)); print(b)`. As a result, b will be [1, 2, 3]. – Drako Oct 19 '17 at 17:03

2 Answers


One simple approach would be to flatten your lists/arrays with a comprehension and then use list(set()) to get unique values in a list:

df['east'].apply(lambda x: list(set(item for sublist in x for item in sublist)))
# example output: ['New Zealand', 'Indonesia']
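
As a minimal, self-contained sketch of the same idea: the sample frame and the helper name `merge_unique` below are made up for illustration, and `sorted` is swapped in for `list` only so the result has a stable order (a set has no guaranteed order).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's column:
# each cell is a list of NumPy object arrays of varying length.
df = pd.DataFrame({
    "east": [
        [np.array(["Indonesia", "New Zealand"], dtype=object),
         np.array(["Indonesia", "New Zealand"], dtype=object)],
        [np.array(["Japan"], dtype=object),
         np.array(["Fiji", "Japan", "Samoa"], dtype=object)],
    ]
})


def merge_unique(cell):
    """Flatten the arrays in one cell and return the sorted unique values."""
    return sorted({item for sublist in cell for item in sublist})


merged = df["east"].apply(merge_unique)
print(merged.tolist())
```

Assigning the result back with `df['east'] = merged` would replace the nested arrays with plain lists.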
cmaher
  • In Python, `([` is mostly fine, but not pretty. You can just do `set(item for sublist in x for item in sublist)`, so you're not creating an intermediate list that you immediately discard. – acushner Oct 19 '17 at 17:19
  • glad you took it into consideration. you already had my upvote because it was a good answer regardless. – acushner Oct 19 '17 at 17:59

You can use the following one-liner to achieve your result (in Python 3, reduce has to be imported from functools):

from functools import reduce
df['east'].apply(lambda value: reduce(lambda a, x: list(set(list(a) + list(x))), value, []))

Let's break it down:

list(a) + list(x) = converts both operands to plain lists and concatenates them into one list. This avoids the shape error, because + on NumPy arrays is element-wise and only works when the shapes are compatible.

list(set(list(a) + list(x))) = list of all unique elements, obtained by converting the combined list to a set and back.

reduce(lambda a, x: list(set(list(a) + list(x))), value, []) = starts with an empty list as the accumulator and folds each array in value into it, one at a time, leaving a single list of unique elements.
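
The breakdown above can be checked with a small sketch. The arrays `a` and `b` are made-up stand-ins for one cell of the column; the last line, which uses np.concatenate and np.unique, is an alternative added here for comparison and not part of the one-liner above.

```python
from functools import reduce

import numpy as np

# Hypothetical arrays of different lengths, like the (4,) and (13,) in the question.
a = np.array(["Fiji", "Japan"], dtype=object)
b = np.array(["Japan", "Samoa", "Tonga"], dtype=object)

# + on arrays is element-wise, so mismatched shapes cannot be broadcast.
try:
    a + b
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# + on plain lists concatenates; the set drops duplicates at each step.
merged = reduce(lambda acc, x: list(set(list(acc) + list(x))), [a, b], [])

# Alternative: join the arrays first, then let NumPy sort and deduplicate.
via_numpy = np.unique(np.concatenate([a, b])).tolist()

print(broadcast_failed, sorted(merged), via_numpy)
```

The order of `merged` is not guaranteed because it comes from a set, which is why it is sorted before printing.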

Arpit Goyal