I have a data set where I have a few thousand records. The columns are acct number, code, data.
I need to sort the entire rows based on the code(column 2) if they have the same acct number(column1). so something like this.
df = DataFrame({'Acct Number':[1011,1011,1011,1011,1012,1012,1012,1012,1012],
'Code':['GHI','JKL','ABC','DEF','DEF','MNO','JKL','GHI','ABC'],
'Data':['text3','text4','text1','text2','text2','text5','text4','text3','text1'])
I need to know how to make a dataframe that looks like this:
df = DataFrame({'Acct Number':[1011,1011,1011,1011,1012,1012,1012,1012,1012],
'Code':['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL','MNO'],
'Data':['text1','text2','text3','text4','text1','text2','text3','text4','text5'])
THE CODE(column2) IS NOT IN ALPHABETICAL ORDER - They are random letters so I cannot use a basic sort to get those rows in order.
I am trying to look into groupby() with a set_index() Like in this question:
Question #66650446 - Pandas Groupby based on multiple columns
I am looking to try something like this.
import pandas as pd
THEFILE = 'original csv file'
THENEWFILE = 'the new csv file'
df = pd.read_csv('THEFILE')
df = df.set_index(['acct_number',df.groupby('code':'CFV','VFC','GTF','EDD','TGY'])
df = df.to_csv(THENEWFILE)
I am not sure the syntax in a groupby() to identify the specific order I want if it is set within an index that already sorts by a different column first.
All help is greatly appreciated!!
UPDATE: as of 3:28PM pacific time on 07/02/21 I found this question: