This is a pretty basic question. I want to organize a csv a certain way. So far I have it almost as I want it, and it is pretty fast as well.
import numpy as np
import pandas as pd
import pandas.util.testing as tm; tm.N = 3
data = pd.DataFrame.from_csv('MYDATA.csv')
byqualityissue = data.groupby(["Name","Type"]).size()
df = pd.DataFrame(bytype)
formatted = df.unstack(level=-1)
formatted[np.isnan(formatted)] = 0
formatted.to_csv('bytype.csv', header=True)
bytype looks like this:
Type Type1 Type2 Type3
Name
Company 1 10 0 3
Company 2 4 23 3
Company 3 3 2 0
However, I want it sorted so that the company with the largest total (sum of the row) will be at the top. In this case it should be Company 2 at the top, followed by Company 1, then Company 3. It should select only the top twenty customers to be read in by the rest of my python code.
Name, Type1, Type2, Type3
Company 2, 4, 23, 3
Company 1, 10, 0, 3
Company 3, 3, 2, 0
I think the best way would be to add up the totals in another column and then sort by descending order. In order to do that I wrote this line:
sorted = np.sum(formatted, axis=1)
which successfully outputs a total. I looked at How to add a new column to a CSV file using Python?
And so wrote:
with open('bytype.csv') as csvinput:
with open('bytype2.csv','w') as csvoutput:
writer = csv.writer(csvoutput, lineterminator='\n')
reader = csv.reader(csvinput)
all = []
row = next(reader)
row.append('Total')
all.append(row)
for row in reader:
row.append(np.sum(formatted, axis=1))
all.append(row)
writer.writerows(all)
But not only am I not getting an output bytype2, but that method seemed to mean re-reading the csv and then appending the column, and THEN I'd still have to sort them. There must be an easier way to do this that I'm missing.