I have a problem where I have to find the 'DETAILS' that appear the most number of times(count of 'ID') for a particular 'UNIT' and 'FUNCTION'
so in example below for UNIT = 0011, FUNCTION = 0001, most number of 'ID' are associated with DETAILS = 546545153113131. similiarly - for UNIT = 0011, FUNCTION = 0026, most number of 'ID' are associated with DETAILS = 546545153113132. for UNIT = 0012, FUNCTION = 0001, most number of 'ID' are associated with DETAILS = 546545153113131. for UNIT = 0012, FUNCTION = 0026, most number of 'ID' are associated with DETAILS = 546545153113132.
Dataframe here is quite large containing 5 Million rows. I am reading this data from a text file in a dataframe.
import pandas as pd
# initialize list of lists
data = [['000000000001', '0011','0001','546545153113131'],
['000000000002', '0011','0026','546545153113132'],
['000000000004', '0011','0001','546545153113133'],
['000000000005', '0011','0001','546545153113131'],
['000000000006', '0012','0001','546545153113131'],
['000000000007', '0012','0026','546545153113132'],
['000000000009', '0012','0001','546545153113133'],
['000000000010', '0012','0001','546545153113131']
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['ID', 'UNIT', 'FUNCTION', 'DETAILS'])
df.sort_values(by=['ID', 'UNIT', 'FUNCTION'], ascending=[True,True,True])
df.groupby(['UNIT', 'FUNCTION','DETAILS']).count()
above piece of code gave me this-
ID
UNIT FUNCTION DETAILS
0011 0001 546545153113131 2
546545153113133 1
0026 546545153113132 1
0012 0001 546545153113131 2
546545153113133 1
0026 546545153113132 1
what I am expecting is dataframe that can give me this -
UNIT FUNCTION DETAILS ID
0011 0001 546545153113131 2
0011 0026 546545153113132 1
0012 0001 546545153113131 2
0012 0026 546545153113132 1
I tried few solutions mentioned - GroupBy pandas DataFrame and select most common value , didn't worked. I am a Python beginner and apologies if what I had written does not makes much sense.