
I need to take a string, e.g. AABCAAADA and split it into a given number of strings, say 3 (but this could be anything). Then, within those strings, delete duplicate chars. The output of this code should be, for the given example, AB, CA and AD.

My code produces DA instead of AD for the last substring, and I can't see why.

import textwrap



def splitToT(string, number):
    wordsList = list(textwrap.wrap(string,number))
    for word in wordsList:
        t = word
        makeU(t)

def makeU(t):
    list1 = list(t)
    list2 = list(t)
    print"list1 = "
    print list1
    print"list2 = "
    print list2
    for l1e in list1:
        print "element from list1"
        print l1e
        count = 0
        print"COUNT RESET"
        for l2e in list2:
            print "\t Element in list2"
            print("\t" + l2e)
            if str(l1e) == str(l2e):
                count = count+1
                print count
                if count >= 2:
                    print("removing element")
                    print l2e
                    list2.remove(l2e)
                    print"\tlist 2 is now"
                    print list2
    print "LIST2 IS:"
    print list2
    print("-----")

def main():
    n = 3
    S = 'AABCAAADA'
    splitToT(S, n)

if __name__ == "__main__":
    main()
Archeofuturist
    Looks complicated. Check out [How do you remove duplicates from a list whilst preserving order?](https://stackoverflow.com/q/480214/953482) for some simpler approaches to duplicate removal. – Kevin Oct 29 '18 at 17:43

3 Answers


You can treat the string as a list, and reconstruct the slices as sets, like so:

s = list("AABCAAADA")

def slicer(s, slices):
    data = []
    for x in slices:
        # slices is a list of tuples, where x[0]
        # is the starting slice and x[1] is the end
        data.append(set(s[x[0]:x[1]]))
    return data
slicer(s, [(0, 3), (3, 6), (6, 9)])

A set does not preserve order, so to keep the characters in their original order you would have to iterate through each slice and keep only the first occurrence of each character. Otherwise the approach would be the same.
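As a rough sketch of that order-preserving variant (the name `ordered_slicer` is made up for illustration, and it assumes the same list-of-tuples format for `slices` as above):

```python
def ordered_slicer(s, slices):
    """Return each slice of s with duplicates dropped, keeping first-seen order."""
    data = []
    for start, end in slices:
        seen = set()      # characters already emitted for this slice
        unique = []       # first occurrences, in original order
        for ch in s[start:end]:
            if ch not in seen:
                seen.add(ch)
                unique.append(ch)
        data.append("".join(unique))
    return data


print(ordered_slicer("AABCAAADA", [(0, 3), (3, 6), (6, 9)]))
# ['AB', 'CA', 'AD']
```

The set is only used for membership checks here; the output order comes from the list.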

Charles Landau

Your program removes the first occurrence of each duplicated character, not the later one.

AABCAAADA

gets broken up to

aab
caa
ada

You then count how many times each character occurs in the list, and as soon as the count reaches two, you remove it with list2.remove(), which deletes the first matching element.

So for aab, the first pass counts a; the count reaches 2, so the first a is removed. On the second pass the list is ab, and a is counted again, but there is only one a now, so nothing is removed.
On the third pass it counts b, and since there is only one, it stays in the list.

I'll capitalize the letters you're removing

Aab
cAa
Ada

The last one looks wrong, but that is down to how your algorithm is designed: the first A is removed, so the last letter is left in place and you get DA.
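To see the effect in isolation, here is a small sketch. It is written for Python 3.7+ (the question uses Python 2 print statements), and `dedupe_keep_first` is a hypothetical helper, not part of your code:

```python
# list.remove(x) deletes the FIRST element equal to x
chunk = list("ADA")
chunk.remove("A")
print(chunk)  # ['D', 'A']


def dedupe_keep_first(word):
    # dict.fromkeys keeps one key per character, in first-seen order
    # (insertion order is guaranteed for dicts from Python 3.7)
    return "".join(dict.fromkeys(word))


print(dedupe_keep_first("ADA"))  # AD
```

Keeping the first occurrence instead of removing it avoids the problem, and it also avoids modifying a list while iterating over it.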

Hemerson Tacon

Here is an approach that should help:

import textwrap

s = 'ababbcdfgfhh'
n = 3  # chunk width; could also be taken as input

# textwrap.wrap already returns a list of chunks at most n characters wide
lst = textwrap.wrap(s, n)
print(lst)

for x in lst:
    # set(x) drops the duplicates; sorting by x.index restores first-seen order
    print(''.join(sorted(set(x), key=x.index)))

It will produce output as:

['aba', 'bbc', 'dfg', 'fhh'] 
ab
bc
dfg
fh

The first line of the output is just the list lst, printed as a check.
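One caveat: textwrap.wrap is meant for wrapping prose, so it breaks lines at whitespace and drops it at line boundaries. If the input might contain spaces, plain slicing may be more predictable. A minimal sketch, where `chunks` and `unique_chunks` are illustrative names and `width` is the chunk size rather than the number of chunks:

```python
def chunks(s, width):
    # Fixed-width slices; the last one may be shorter
    return [s[i:i + width] for i in range(0, len(s), width)]


def unique_chunks(s, width):
    # Same set + sorted-by-index trick as above, applied to each slice
    return ["".join(sorted(set(c), key=c.index)) for c in chunks(s, width)]


print(unique_chunks("ababbcdfgfhh", 3))  # ['ab', 'bc', 'dfg', 'fh']
print(unique_chunks("AABCAAADA", 3))     # ['AB', 'CA', 'AD']
```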

Sandesh34