0

I'm trying to create new column with the count of each attribute (Number) but it's given me the total count of row as shown below

POM = pd.read_excel('POM.xlsx' , sheet_name='Sheet1')
POM['Merged']=POM['ACD ID'].astype(str)+PG['Date'].astype(str)
POM['Count']=POM['Merged'].count()
print(POM.head(2))
POM.to_excel('Test.xlsx' , index=False)

Result

        Date  ACD ID           Merged  Count
0 2022-03-08   14004  140042022-03-08    986
1 2022-03-09   14004  140042022-03-09    986

How can I count the unique code which is repetition found in column (Merged)?

# Date              ACD ID     Merged     Count
2022-03-08 00:00:00 14004   140042022-03-08 10
2022-03-09 00:00:00 14004   140042022-03-09 49
2022-03-10 00:00:00 14004   140042022-03-10 62

i have try the bellow code as answered & i found accurate answer but in same time the Merged New coulmn that i created by group(date+ ACD ID) are not accuaret as shown below , that date in ( merged ) not matched the date

POM = pd.read_excel('POM.xlsx' , sheet_name='Sheet1')
POM['Merged']=POM['ACD ID'].astype(str)+PG['Date'].astype(str)
#POM['Count'] = POM['Merged'].count()
POM['Count'] = POM['Merged'].copy().replace(POM['Merged'].value_counts().to_dict())
#POM['Count']=POM.groupby(["Merged"])["Merged"].transform("count")
#POM['counts'] = POM['Merged'].map(POM['Merged'].value_counts())

print(POM.head(6))
#POM.to_excel('Test.xlsx' , index=False)

result

        Date  ACD ID           Merged  Count
0 2022-03-08   14004  140042022-03-08      1
1 2022-03-08   14005  140052022-03-09      1
2 2022-03-08   14006  140062022-03-10      1
3 2022-03-08   14007  140072022-03-15      1
  • Please share a sample of your input df with expected output. – Mayank Porwal May 18 '22 at 16:23
  • input is $date { 2022-03-08,2022-03-09,2022-03-10} and $ User {1411,1411,1411} as i merged both to get unique value as i have rep in data as agent can do many action in one day . i need to get the count of rep data – Mohamed Khamees May 18 '22 at 19:24
  • 1
    In my sample the merging happened between two columns in the same DataFrame, in yours you use `POM` and `PG`. Are these two DataFrames with the same number of rows and both in the right (and same) order? – ewz93 May 18 '22 at 20:03
  • ohh sorry i didn't notice , problem solved now , many thanks – Mohamed Khamees May 18 '22 at 20:12

2 Answers2

1

Try groupby() and then transform("count"):

import pandas

df = pandas.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                       "id": ["a", "a", "a", "c"]})
df["merged"] = df["date"] + "_" + df["id"]
print(df)
>    date id  merged
  0  2022  a  2022_a
  1  2023  a  2023_a
  2  2023  a  2023_a
  3  2024  c  2024_c

df["counts"] = df.groupby(["merged"])["merged"].transform("count")
print(df)
>    date id  merged  counts
  0  2022  a  2022_a       1
  1  2023  a  2023_a       2
  2  2023  a  2023_a       2
  3  2024  c  2024_c       1

Check out this thread.

ewz93
  • 2,444
  • 1
  • 4
  • 12
  • as count is working fine i guess it's counting as Merged column but as below answer too the Merged column is modifying the date by it self as it keep adding date + 1 in the merged new column while in date it's the same – Mohamed Khamees May 18 '22 at 19:22
1

You could do something like this:

df['Count'] = df['Merged'].copy().replace(df['Merged'].value_counts().to_dict())

You create a copy of Merged, then count the values in Merged and convert it to a dict of {id: count}, then use that dictionary to replace the id values in the copied Merged column.

This has the added benefit over groupby that you can retain your original indexing and shape of the dataframe while still filling in the counts of each id.

Ex:

df = pd.DataFrame({"date": ["2022", "2023", "2023", "2024"],
                       "id": ["a", "a", "a", "c"]})
df["merged"] = df["date"] + "_" + df["id"]
df['Count'] = df['merged'].copy().replace(df['merged'].value_counts().to_dict())

   date id  merged  Count
0  2022  a  2022_a      1
1  2023  a  2023_a      2
2  2023  a  2023_a      2
3  2024  c  2024_c      1
whege
  • 1,391
  • 1
  • 5
  • 13
  • Many Thanks for the comment , i tried that & i guess the count code working fine , but i have found that ( Merged ) is not accurate as it keep changing the date from it self while no change in the source Date ACD ID Merged Count 0 2022-03-08 14004 140042022-03-08 1 1 2022-03-08 14004 140042022-03-09 1 2 2022-03-08 14004 140042022-03-10 1 3 2022-03-08 14004 140042022-03-15 1 – Mohamed Khamees May 18 '22 at 19:16