0

I am trying to collect only the distinct items from a list of strings. Each string holds one or more items separated by ";". The leading space is not being stripped from some of the items, so I end up with duplicates.

gender_distinct = []
survey_genders = ['Male', 'Female', 'Other', 'Male; Other', 'Gender non-conforming', 'Male; Gender non-conforming', 'Female; Transgender', 'Transgender', 'Female; Gender non-conforming', 'Male; Female', 'Male; Female; Transgender; Gender non-conforming; Other', 'Transgender; Gender non-conforming', 'Male; Transgender', 'Female; Transgender; Gender non-conforming', 'Male; Female; Transgender; Gender non-conforming', 'Male; Female; Transgender', 'Gender non-conforming; Other', 'Male; Transgender; Gender non-conforming', 'Male; Gender non-conforming; Other', 'Male; Female; Other', 'Male; Female; Gender non-conforming', 'Female; Gender non-conforming; Other', 'Transgender; Other', 'Female; Transgender; Gender non-conforming; Other', 'Male; Female; Transgender; Other', 'Male; Female; Gender non-conforming; Other', 
'Female; Other', 'Female; Transgender; Other', 'Male; Transgender; Other']
for gender in list(survey_genders):
    for gender_each in gender.split(';'):
        if gender_each.strip() not in gender_distinct:
            print(gender_each)
            gender_distinct.append(gender_each)
print(" Distinct Gender ")
print(gender_distinct)


  • `strip()` doesn't change strings in place. Strings are immutable. So this `if gender_each.strip()` makes a new string, but it doesn't change the original. – Mark Apr 17 '20 at 03:32
  • Does this answer your question? [Why doesn't calling a Python string method do anything unless you assign its output?](https://stackoverflow.com/questions/9189172/why-doesnt-calling-a-python-string-method-do-anything-unless-you-assign-its-out) – Joe Apr 17 '20 at 05:43
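
The point made in the comments above can be checked directly. This is a minimal sketch; `label`, `unchanged` and `stripped` are hypothetical names used only for illustration and do not appear in the question's code:

```python
# Hypothetical value with the same leading space seen in the question.
label = " Transgender"

label.strip()             # returns a new string; the result is thrown away here
unchanged = label         # the original string still has its leading space

stripped = label.strip()  # keep the returned value in order to use it later

print(repr(unchanged))    # ' Transgender'
print(repr(stripped))     # 'Transgender'
```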

3 Answers

1

You apply `strip()` in the containment test (the `if` statement), but you don't keep the stripped version of the string, so the unstripped version is what gets appended to the final list.

Just change your code to:

gender_distinct = []
survey_genders = ['Male', 'Female', 'Other', 'Male; Other', 'Gender non-conforming', 'Male; Gender non-conforming', 'Female; Transgender', 'Transgender', 'Female; Gender non-conforming', 'Male; Female', 'Male; Female; Transgender; Gender non-conforming; Other', 'Transgender; Gender non-conforming', 'Male; Transgender', 'Female; Transgender; Gender non-conforming', 'Male; Female; Transgender; Gender non-conforming', 'Male; Female; Transgender', 'Gender non-conforming; Other', 'Male; Transgender; Gender non-conforming', 'Male; Gender non-conforming; Other', 'Male; Female; Other', 'Male; Female; Gender non-conforming', 'Female; Gender non-conforming; Other', 'Transgender; Other', 'Female; Transgender; Gender non-conforming; Other', 'Male; Female; Transgender; Other', 'Male; Female; Gender non-conforming; Other', 
'Female; Other', 'Female; Transgender; Other', 'Male; Transgender; Other']
for gender in list(survey_genders):
    for gender_each in gender.split(';'):
        gender_each = gender_each.strip()
        if gender_each not in gender_distinct:
            print(gender_each)
            gender_distinct.append(gender_each)
print(" Distinct Gender ")
print(gender_distinct)

Now, in Python, if you want a container that will keep exactly one copy of each item, it is easier to use a set than a list:

...
gender_distinct = set()
for gender in list(survey_genders):
    for gender_each in gender.split(';'):
        gender_distinct.add(gender_each.strip())

print(" Distinct Gender ")
print(gender_distinct)
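
One thing to keep in mind is that a set does not preserve the order in which items were first seen. If that order matters, a possible alternative is to use a dict as an ordered set. This is only a sketch: it assumes Python 3.7+ (where dicts keep insertion order), and `sample_genders` is a shortened, hypothetical stand-in for the full survey list:

```python
# Shortened, hypothetical sample; the real survey_genders list is much longer.
sample_genders = ['Male', 'Female; Transgender', 'Transgender', 'Male; Other']

ordered = {}
for gender in sample_genders:
    for gender_each in gender.split(';'):
        # Dict keys are unique, and insertion order is kept (Python 3.7+).
        ordered[gender_each.strip()] = None

gender_distinct = list(ordered)
print(gender_distinct)  # ['Male', 'Female', 'Transgender', 'Other']
```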
jsbueno
  • Although in this case it's better to just use `for gender_each in gender.split('; ')` instead... – Jon Clements Apr 17 '20 at 03:36
  • And just to note that the set example could just be: `gender_distinct = {gender for genders in survey_genders for gender in genders.split('; ')}` – Jon Clements Apr 17 '20 at 03:36
  • I'd rather not rely on the space after the ";" being there, because it is essentially a question of style and too easy to mistype (either skipped or typed twice), doubly so if the string data being processed comes from a source external to the program (a field in a form, for example) – jsbueno Apr 17 '20 at 03:38
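
To illustrate the concern in the last comment, here is a small sketch comparing the two splitting approaches on an inconsistently spaced input. The string `messy` is made up for this example and is not taken from the survey data:

```python
# Hypothetical input: one separator has no space, another has two.
messy = "Male;Female;  Other"

# Splitting on "; " only matches a semicolon followed by a space.
by_semicolon_space = messy.split("; ")

# Splitting on ";" and stripping each part tolerates any amount of spacing.
by_semicolon_then_strip = [part.strip() for part in messy.split(";")]

print(by_semicolon_space)       # ['Male;Female', ' Other']
print(by_semicolon_then_strip)  # ['Male', 'Female', 'Other']
```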
1

Corrected code (append the stripped value instead of the original):

gender_distinct = []
survey_genders = ['Male', 'Female', 'Other', 'Male; Other', 'Gender non-conforming', 'Male; Gender non-conforming', 'Female; Transgender', 'Transgender', 'Female; Gender non-conforming', 'Male; Female', 'Male; Female; Transgender; Gender non-conforming; Other', 'Transgender; Gender non-conforming', 'Male; Transgender', 'Female; Transgender; Gender non-conforming', 'Male; Female; Transgender; Gender non-conforming', 'Male; Female; Transgender', 'Gender non-conforming; Other', 'Male; Transgender; Gender non-conforming', 'Male; Gender non-conforming; Other', 'Male; Female; Other', 'Male; Female; Gender non-conforming', 'Female; Gender non-conforming; Other', 'Transgender; Other', 'Female; Transgender; Gender non-conforming; Other', 'Male; Female; Transgender; Other', 'Male; Female; Gender non-conforming; Other', 
'Female; Other', 'Female; Transgender; Other', 'Male; Transgender; Other']
for gender in list(survey_genders):
    for gender_each in gender.split(';'):
        if gender_each.strip() not in gender_distinct:
            print(gender_each)
            gender_distinct.append(gender_each.strip())
print(" Distinct Gender ")
print(gender_distinct)
satya
0

Consider just one problem case, 'Female; Transgender':

gender_distinct = []
...
gender = 'Female; Transgender'
for gender_each in gender.split(';'):

This will take on the two values "Female" and " Transgender". We'll focus on the second value.

    if gender_each.strip() not in gender_distinct:

... which it is not, since this is the first time we've seen Transgender, with or without the space.

        print(gender_each)
        gender_distinct.append(gender_each)

As your print shows, gender_each has not been changed; it still has the leading space! Thus, that's the version you append. Later, when you get to simple "Transgender", that also gets added.

To fix this, simply store the result of the strip and use that value for the rest of the loop -- don't get into the habit of changing a loop variable. Also, you can easily use a set to keep these labels.

gender_distinct = set()
...
for gender_each in gender.split(';'):
    stripped = gender_each.strip()
    gender_distinct.add(stripped)
    print(stripped)
Prune