How to get only distinct values from a list?

Question

I am trying to iterate through a column in a text file, where each entry is one of only three choices: A, B, or C.

I want to identify the number of different types of choices (another text file has A, B, C, and D), but if I iterate through each element of a column with 100 entries and add it to a list, I'll get many repetitions of each type. For example, the list might read [A,A,A,B,C,C,D,D,D,B,B...], but I want to remove the extraneous entries and have the list show only the distinguishable types [A,B,C,D], regardless of how many entries there were.

Any ideas how I might reduce a list with many common elements to a list with only the different distinguishable elements displayed? Thanks!

Desired Output:

[A, B, C, D]

would help if you posted a snippet of the `txt` and any code you attempted — vash_the_stampede, Oct 06 '18 at 17:49
@Ferreroire, if the answer solves your problem, you can accept it (and upvote it); that way the question is removed from the unanswered queue. — Karn Kumar, Oct 06 '18 at 18:52

Karn Kumar · Accepted Answer · 2018-10-06T18:47:41.097

This is what you need, using `set()` (note that a set does not preserve the original order):

>>> lst1 = ['A','A','A','B','C','C','D','D','D','B','B']
>>> list(set(lst1))
['A', 'B', 'D', 'C']
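
The order of a list built from a set is arbitrary and can differ between runs. If sorted output is acceptable, wrapping the set in `sorted()` gives a stable result. A small sketch reusing `lst1` from above (the name `distinct_sorted` is only illustrative):

```python
lst1 = ['A', 'A', 'A', 'B', 'C', 'C', 'D', 'D', 'D', 'B', 'B']

# set() drops the duplicates; sorted() returns a list in a deterministic order.
distinct_sorted = sorted(set(lst1))
print(distinct_sorted)       # ['A', 'B', 'C', 'D']
print(len(distinct_sorted))  # 4 distinct types
```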

Another solution is `OrderedDict`, which keeps the keys in the order they were first inserted:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(lst1))
['A', 'B', 'C', 'D']
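
On Python 3.7 and later a plain `dict` also preserves insertion order, so the same idea should work without the import. A minimal sketch, assuming Python 3.7+ and reusing `lst1` (the name `unique_in_order` is only illustrative):

```python
# Assumes Python 3.7+, where dict insertion order is guaranteed by the language.
lst1 = ['A', 'A', 'A', 'B', 'C', 'C', 'D', 'D', 'D', 'B', 'B']

# dict.fromkeys keeps one key per distinct value, in order of first appearance.
unique_in_order = list(dict.fromkeys(lst1))
print(unique_in_order)  # ['A', 'B', 'C', 'D']
```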

If you are free to use pandas, try this:

>>> import pandas as pd
>>> drop_dups = pd.Series(lst1).drop_duplicates().tolist()
>>> drop_dups
['A', 'B', 'C', 'D']

In case you are looking for common values between two files:

$ cat getcomn_vals.py
#!/python/v3.6.1/bin/python3
def print_common_members(a, b):
    """
    Given two sets, print their intersection, one element per line,
    or "No common elements" if the intersection is empty.
    """
    print('\n'.join(a & b) or "No common elements")

with open('file1.txt') as file1, open('file2.txt') as file2:
    # Build the sets directly from the files, with no intermediate list.
    # Strip the trailing newline so a last line without one still matches.
    dataset1 = {line.rstrip('\n') for line in file1}
    dataset2 = {line.rstrip('\n') for line in file2}
    print_common_members(dataset1, dataset2)

Thanks a lot! That was very helpful! — ShinyPebble, Oct 06 '18 at 19:42

jihan1008 · Answer 2 · 2020-01-30T09:56:26.897

6

Python has a built-in data structure called set that does not allow duplicates. It might help you out.

documentation for set() at docs.python.org
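
As a rough sketch of how a set could collect the distinct values while iterating over a column: the whitespace-separated layout, the column position, and the sample data below are all assumptions, and `io.StringIO` only stands in for the real file.

```python
import io

# Stand-in for an open text file; the layout and the column index (1)
# are assumptions made for illustration.
sample = io.StringIO("1 A\n2 A\n3 B\n4 C\n5 C\n6 D\n7 B\n")

distinct = set()
for line in sample:
    fields = line.split()
    if len(fields) > 1:          # skip blank or short lines
        distinct.add(fields[1])  # adding a value already present changes nothing

print(sorted(distinct))  # ['A', 'B', 'C', 'D']
print(len(distinct))     # 4
```

With a real file, `sample` would be replaced by the object returned from `open(...)`.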

edited Jan 30 '20 at 09:56

answered Oct 06 '18 at 17:49


score 0 · Answer 3 · answered Oct 06 '18 at 17:58

We could use itertools.groupby together with sorted to get the list of unique elements. Sorting first matters because groupby only merges consecutive equal items.

from itertools import groupby

with open('text.txt') as f:
    content = [line.strip('\n') for line in f]

l = [k for k, g in groupby(sorted(content))]
print(l)
# ['A', 'B', 'C', 'D']
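
To see why the `sorted` call is needed, here is a small comparison on made-up in-memory data (the list and variable names are illustrative, not taken from the question's file):

```python
from itertools import groupby

content = ['A', 'A', 'B', 'C', 'C', 'D', 'B', 'B']  # made-up sample data

# Without sorting, the trailing run of 'B' forms a separate group.
unsorted_keys = [k for k, _ in groupby(content)]
print(unsorted_keys)  # ['A', 'B', 'C', 'D', 'B']

# After sorting, equal values are adjacent, so each key appears once.
sorted_keys = [k for k, _ in groupby(sorted(content))]
print(sorted_keys)    # ['A', 'B', 'C', 'D']
```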
