Below is the sample table/data frame. The last column (count) gives the number of rows that share the same combination of Attribute 1 and Attribute 2.
╔════╦═════════════╦═════════════╦══════════════════════════════╗
║ ID ║ Attribute 1 ║ Attribute 2 ║ count(Attribute1+Attribute2) ║
╠════╬═════════════╬═════════════╬══════════════════════════════╣
║ 1  ║ A           ║ AA          ║ 3                            ║
║ 2  ║ B           ║ CC          ║ 1                            ║
║ 3  ║ C           ║ BB          ║ 2                            ║
║ 4  ║ A           ║ AA          ║ 3                            ║
║ 5  ║ C           ║ BB          ║ 2                            ║
║ 6  ║ D           ║ AA          ║ 1                            ║
║ 7  ║ B           ║ AA          ║ 1                            ║
║ 8  ║ C           ║ DD          ║ 1                            ║
║ 9  ║ A           ║ AB          ║ 1                            ║
║ 10 ║ A           ║ AA          ║ 3                            ║
╚════╩═════════════╩═════════════╩══════════════════════════════╝
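To make the count column concrete, here is a minimal sketch of how it could be computed. It uses plain Python with `collections.Counter` instead of a real data frame, and the names `rows`, `pair_counts` and `with_count` are made up for illustration.

```python
from collections import Counter

# Sample rows from the table above: (ID, Attribute 1, Attribute 2)
rows = [
    (1, "A", "AA"), (2, "B", "CC"), (3, "C", "BB"), (4, "A", "AA"),
    (5, "C", "BB"), (6, "D", "AA"), (7, "B", "AA"), (8, "C", "DD"),
    (9, "A", "AB"), (10, "A", "AA"),
]

# Count how many rows share each (Attribute 1, Attribute 2) pair
pair_counts = Counter((a1, a2) for _, a1, a2 in rows)

# Attach the pair's count to every row as a fourth field
with_count = [(i, a1, a2, pair_counts[(a1, a2)]) for i, a1, a2 in rows]
```

With the sample data, `with_count` has the same count values as the last column of the table.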
Update: Thanks akrun and danas.zuokas for the help.
The final output I am expecting would look something like this, where I keep 50% of the rows from each count group (rows with the same Attribute 1 and Attribute 2). For example, IDs 1, 4 and 10 have a count of 3, so I need to keep only 2 of them (50% of 3, rounded up), and (A, AA) should appear twice.
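╔════╦═════════════╦═════════════╦══════════════════════════════╗
║ ID ║ Attribute 1 ║ Attribute 2 ║ count(Attribute1+Attribute2) ║
╠════╬═════════════╬═════════════╬══════════════════════════════╣
║ 1  ║ A           ║ AA          ║ 3                            ║
║ 2  ║ B           ║ CC          ║ 1                            ║
║ 3  ║ C           ║ BB          ║ 2                            ║
║ 4  ║ A           ║ AA          ║ 3                            ║
║ 6  ║ D           ║ AA          ║ 1                            ║
║ 7  ║ B           ║ AA          ║ 1                            ║
║ 8  ║ C           ║ DD          ║ 1                            ║
║ 9  ║ A           ║ AB          ║ 1                            ║
╚════╩═════════════╩═════════════╩══════════════════════════════╝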
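The selection I am after could be sketched roughly as follows. This is plain Python, not a data-frame solution, and it rests on two assumptions read off the expected output: 50% is rounded up within each group (so a group of 3 keeps 2 and a group of 1 keeps 1), and the first rows in ID order are the ones kept. The function name `keep_fraction` and its `fraction` parameter are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Sample rows: (ID, Attribute 1, Attribute 2)
rows = [
    (1, "A", "AA"), (2, "B", "CC"), (3, "C", "BB"), (4, "A", "AA"),
    (5, "C", "BB"), (6, "D", "AA"), (7, "B", "AA"), (8, "C", "DD"),
    (9, "A", "AB"), (10, "A", "AA"),
]

def keep_fraction(rows, fraction=0.5):
    """Keep the first ceil(fraction * group size) rows of each
    (Attribute 1, Attribute 2) group, preserving the input order."""
    counts = Counter((a1, a2) for _, a1, a2 in rows)
    kept = defaultdict(int)  # rows already kept per group
    result = []
    for row_id, a1, a2 in rows:
        key = (a1, a2)
        quota = math.ceil(counts[key] * fraction)  # assumption: round up
        if kept[key] < quota:
            kept[key] += 1
            result.append((row_id, a1, a2, counts[key]))
    return result

selected = keep_fraction(rows)
```

On the sample data this keeps IDs 1, 2, 3, 4, 6, 7, 8 and 9, which matches the expected output above. If the rows should be picked at random within each group rather than taken in order, each group could be shuffled before the quota is applied.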