0

I am trying to wrap my head around python. Basically I am trying to remove a duplicate string (a date to be more precise) in some data. So for example:

2019-03-31
2019-06-30
2019-09-30
2019-12-31 
2020-03-31
2020-03-31

notice 2020-03-31 is duplicated. I would like to find the duplicated date and rename it as last quarter

conversely, If the data has no duplicates I want to leave as is.

I currently have some working code that checks to see if there are duplicates. And that is it. I just need some guidance in the right direction as to how to rename the duplicate.

def checkForDuplicates(listOfElems):
if len(listOfElems) == len(set(listOfElems)):
    return 
else:
    return print("You have a duplicate")
AndrewRoj
  • 97
  • 1
  • 8
  • 1
    What have you tried (researched) so far with regard to renaming; and where are you stuck? Function suggestion: Change your ‘if’ to ‘if not’ (or == to !=), move the print statement and delete the elif. – S3DEV May 03 '20 at 17:23
  • hey man thanks for the response. I've been looking at https://stackoverflow.com/questions/50828915/python-renaming-duplicated-values-based-on-another-variable and https://stackoverflow.com/questions/24685012/pandas-dataframe-renaming-multiple-identically-named-columns. Can't really wrap my head around the issue. What i tried is listed above. So far all I was able to do is return a print statement if they were duplicates. – AndrewRoj May 03 '20 at 17:28
  • 1
    Returning a `print` statement doesn't make much sense. It always returns `None`, which is the default return in Python anyway, so that offers the caller nothing useful. Ideally return a boolean (true or false) and let the caller print if they want. – ggorlen May 03 '20 at 17:32

3 Answers3

2

Use a set to keep track of items you've seen. If any items are already in the set, append the desired string (use = "last quarter" if you want a full rename; it's unclear).

data = """2019-03-31
2019-06-30
2019-09-30
2019-12-31
2020-03-31
2020-03-31""".split("\n")

seen = set()

for i, e in enumerate(data):
    if e in seen:
        data[i] += " last quarter"

    seen.add(e)

print(data)

Output:

['2019-03-31', '2019-06-30', '2019-09-30', 
 '2019-12-31', '2020-03-31', '2020-03-31 last quarter']
ggorlen
  • 44,755
  • 7
  • 76
  • 106
0

For your function, if you have list of elements then your function will be :

def checkForDuplicates(listOfElems):
    check=[]
    for i in range(len(listOfElems)):
       if listofElems[i] in check:
          #then it is a duplicate and you can rename it
          listofElems[i]='last quarter'
       else:
          check.append(listofElems[i])


This will replace all the duplicate enteries to last quarter.

0

If your list is sorted, you can try this.

def checkForDuplicates(listOfElems):
    for i in range(len(listOfElems)-1):
        if listOfElems[i]==listOfElems[i+1]:
            listOfElems[i+1] = "last quarter"
    retrun listOfElems