0

Duplicate values in a column need to be converted to unique values

I have a DataFrame with r rows and c columns. One of its columns is an ID column that contains duplicate IDs, and I need to make those IDs unique. Suppose I have the df below:

import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14], ['juli', 14], ['juli', 14]]

df = pd.DataFrame(data, columns=['Name', 'Age'])

df

Actual Result:

     Name  Age
0     tom   10
1    nick   15
2    juli   14
3    juli   14
4    juli   14


Expected Result:

     Name  Age
0     tom   10
1    nick   15
2  juli_1   14
3  juli_2   14
4  juli_3   14
martineau
NiMbuS

This will solve your problem: https://stackoverflow.com/questions/30650474/python-rename-duplicates-in-list-with-progressive-numbers-without-sorting-list – Abhi Oct 30 '19 at 09:50

4 Answers

1

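If you only need the ID column (Name in this case) to be unique, you can try this:

import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14], ['juli', 14], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Number each row within its Name group: 0 for the first occurrence, then 1, 2, ...
count = df.groupby('Name').cumcount()
# Keep the first occurrence unchanged and append the counter to later duplicates.
# (Stripping the character '0' from the counter would also turn 10 into '1'.)
df['Name'] = df['Name'].where(count == 0, df['Name'] + count.astype(str))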

Output:

    Name    Age
0   tom     10
1   nick    15
2   juli    14
3   juli1   14
4   juli2   14
ExplodingGayFish
0

You can use a window function in combination with a rank function to create a new unique ID. See also the following post: SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe
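
A minimal sketch of that idea, assuming the question's Name/Age frame: `groupby(...).cumcount()` plays the role of a per-group row number (roughly what `ROW_NUMBER() OVER (PARTITION BY Name)` does in SQL), and `transform('size')` gives each row its group size so that only names occurring more than once get a suffix. The `_` separator and the 1-based numbering follow the expected result in the question; the variable names `row_number` and `group_size` are illustrative.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14], ['juli', 14], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

grouped = df.groupby('Name')['Name']
# 1-based position of each row inside its Name group
row_number = grouped.cumcount() + 1
# Number of rows sharing the same Name, repeated on every row of the group
group_size = grouped.transform('size')

# Keep names that occur once; suffix every member of a duplicated group
df['Name'] = df['Name'].where(group_size == 1,
                              df['Name'] + '_' + row_number.astype(str))

print(df['Name'].tolist())
# ['tom', 'nick', 'juli_1', 'juli_2', 'juli_3']
```

The numbering follows the existing row order within each group, so no sorting is needed.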

0

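Try this:

from collections import Counter

# `data` is the list of [name, age] rows from the question; it is modified
# in place, so build the DataFrame from it afterwards.
keys = [row[0] for row in data]
duplicates = [key for key, count in Counter(keys).items() if count > 1]

for name in duplicates:
    index = 0
    for row in data:
        if row[0] == name:
            # Leave the first occurrence as is; suffix later ones with 1, 2, ...
            if index > 0:
                row[0] += str(index)
            index += 1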



0

Here is what I tried, and it worked for me. With some help, I declared a class that renames duplicate values.

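class renamer():
    def __init__(self):
        # Maps each value seen so far to the number of repeats encountered
        self.d = dict()

    def __call__(self, x):
        if x not in self.d:
            # First occurrence: keep the value unchanged
            self.d[x] = 0
            return x
        else:
            # Later occurrences: append _1, _2, ...
            self.d[x] += 1
            return "%s_%d" % (x, self.d[x])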

Then I applied it to the DataFrame column with apply:

df['ID'] = df['ID'].apply(renamer())
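
As a self-contained check, the sketch below applies the same kind of callable to the question's sample data. It assumes the `Name` column stands in for the `ID` column used above, since the sample frame has no `ID` column; the class name `Renamer` and the attribute `seen` are illustrative.

```python
import pandas as pd


class Renamer:
    """Callable that suffixes repeated values with _1, _2, ... in the order seen."""

    def __init__(self):
        # Maps each value to the number of repeats seen after its first occurrence
        self.seen = {}

    def __call__(self, x):
        if x not in self.seen:
            self.seen[x] = 0
            return x
        self.seen[x] += 1
        return "%s_%d" % (x, self.seen[x])


data = [['tom', 10], ['nick', 15], ['juli', 14], ['juli', 14], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# A fresh Renamer per call, so counts from an earlier run do not carry over
df['Name'] = df['Name'].apply(Renamer())

print(df['Name'].tolist())
# ['tom', 'nick', 'juli', 'juli_1', 'juli_2']
```

Note that the first occurrence keeps its original value, so the result is `juli`, `juli_1`, `juli_2` rather than the `juli_1`, `juli_2`, `juli_3` shown in the question.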

NiMbuS