0

Sort data frame by values of 5th column ["Card"] and add a new row after each card nos with its count. After sorting values how can I add a new row with Total:

Dataframe looks something like this

enter image description here

This is how I want output data frame

enter image description here

Adataguy
  • 25
  • 6

2 Answers2

1

You can give this a try:

import pandas as pd

# create dummy df
card = ["2222","2222","1111","2222","1111","3333"]
name = ["Ed", "Ed",  "John", "Ed", "John", "Kevin"]
phone = ["1##-###-####", "1##-###-####", "2##-###-####", "1##-###-####", "2##-###-####", "3##-###-####"]
df = pd.DataFrame({"Name":name, "Phone":phone, "Card":card})

# sort by Card value
df = df.sort_values(by=["Card"]).reset_index(drop=True)

# Groupby the Card value, count them, then insert a new row based on that count
index = 0
line = []
for x in df.groupby("Card").size():
    index += x
    line.append(pd.DataFrame({"Name": "", "Phone":"", "Card": str(x)}, index=[index]))
df = df.append(line, ignore_index=False)
df = df.sort_values(by=["Card"]).sort_index().reset_index(drop=True)
df

Output:

    Name    Phone           Card
0   Ed      1##-###-####    1111
1   Ed      1##-###-####    1111
2   Ed      1##-###-####    1111
3                              3
4   John    2##-###-####    2222
5   John    2##-###-####    2222
6                              2
7   Kevin   3##-###-####    3333
8                              1

Edit ~~~~

Due to OP's use of string for card numbers, an edit had to be made to account for naturally sorting string ints

import pandas as pd
from natsort import natsort_keygen ##### Now needed because OP has Card numbers as strings

# create dummy df ##############
card = ["1111", "2222", "3333", "4444", "5555", "6666", "7777", "8888"]
name = ["Ed", "John", "Jake", "Mike", "Liz", "Anne", "Deb", "Steph"]
phone = ["1###", "2###", "3###", "4###", "5###", "6###", "7###", "8###"]
dfList = [a for a in zip(name, phone, card)]
dfList = [dfList[random.randrange(len(dfList))] for i in range(50)]
df = pd.DataFrame(dfList, columns=["Name", "Phone", "Card"])
################################

# sort by Card value
df = df.sort_values(by=["Card"]).reset_index(drop=True)

# Groupby the Card value, count them, then insert a new row based on that count
index = 0
line = []
for x in df.groupby("Card").size():
    index += x
    line.append(pd.DataFrame({"Name": "", "Phone":"", "Card": str(x)}, index=[index-1]))
df = pd.concat([df, pd.concat(line)], ignore_index=False)

# Create an Index column to be used in the by pandas sort_values
df["Index"] = df.index

# Sort the values first by index then by card number, use "natsort_keygen()" to naturally sort ints that are strings
df = df.sort_values(by = ['Index', 'Card'], key=natsort_keygen(), ascending = [True, False]).reset_index(drop=True).drop(["Index"], axis=1)
Michael S.
  • 3,050
  • 4
  • 19
  • 34
  • 1
    It is advisable to never pd.concat nor pd.Dataframe.append inside of a loop, it leads to [quadratic copying](https://stackoverflow.com/a/36489724/6361531) hence the bigger the dataframe is the slower the operation becomes expontentially. Instead create an empty list, and append dataframes to that list after looping then you pd.concat that list of dataframes. – Scott Boston Jul 12 '22 at 12:49
  • I edited it for string values (by adding "str(x)") – Michael S. Jul 12 '22 at 12:55
  • Thank you @[Scott Boston](https://stackoverflow.com/users/6361531/scott-boston), I've updated my answer with an empty list – Michael S. Jul 12 '22 at 13:05
  • @MichaelS. For short number of rows it's working properly but for multiple rows it is not properly grouping as per requirement – Adataguy Jul 18 '22 at 06:07
  • @Ajinkz Check the edit, it was failing because your Card values are strings, not ints, and thus wasn't naturally sorting them. The edit now accounts for that (you'll have to install natsort) – Michael S. Jul 18 '22 at 13:02
0

I'm not sure if this is the best way but it worked for me.


list_of_df = [v for k, v in df.groupby("card")]
for i in enumerate(list_of_df):
    list_of_df[i[0]] = list_of_df[i[0]].append({"card":str(list_of_df[i[0]].shape[0])}, ignore_index=True)

final_df = pd.concat(list_of_df)

Adataguy
  • 25
  • 6