
How do I count each group in a DataFrame then append the group counts into a Summary DataFrame?

I'm very new to Python.

I have set up an empty DataFrame:

Counts_data=pd.DataFrame(columns=['filename','Green','Stubble','Baresoil','Stones','Shadow'])

I then start a for loop over the images. Inside the loop I create a DataFrame of RGB groups (the results predicted from each pixel's RGB values by a k-NN model):

df_img_pred=pd.DataFrame(knn.predict(df_img_data),columns=['RGBgroup'])
print(df_img_pred.head())
Img_counts=df_img_pred.stack().value_counts()

The output is

 RGBgroup
0  BareSoil
1   Stubble
2   Stubble
3   Stubble
4  BareSoil
BareSoil    56507
Stubble     52751
Shadow       5030
Stones       4267
Green         245
dtype: int64

I want to count each group and append the results to the "Counts_data" DataFrame along with the filename of the image. I've tried numerous ways of filtering, counting and appending, but I can't get it to work.

Any assistance would be greatly appreciated.

Sreekiran A R
K.T
  • Provide minimal data (csv) please. See [this guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – knh190 Jan 22 '19 at 06:24
  • How do I attach a file? – K.T Jan 23 '19 at 23:18
  • You don't need to attach a file. Post a minimal text table so that we can see what your data looks like. – knh190 Jan 24 '19 at 02:45
  • RGBgroup 0 Stubble 1 BareSoil 2 Stubble 3 Stubble 4 BareSoil 5 BareSoil 6 BareSoil 7 Stubble 8 BareSoil 9 Stubble 10 Stubble 11 Stubble 12 Stubble 13 BareSoil 14 BareSoil 15 Stubble 16 Stubble 17 Stubble 18 Stubble 19 Stubble Here is the first 2 rows of data. This file only has Stubble and Baresoil, there could also be Green, Stones and Shadow. The total rows is 118800. Thanks – K.T Jan 25 '19 at 01:37
  • can you read this? I can't. – knh190 Jan 25 '19 at 02:48

2 Answers


You can create a dataframe in one go:

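import pandas as pd

# Appending to a list is much cheaper than growing a DataFrame row by row.
# stat_df is a placeholder for your own data; itertuples() yields its rows
# (iterating over a DataFrame directly yields only the column names).
s = []
for row in stat_df.itertuples(index=False):
    s.append(row)

# Create the DataFrame once; labels needs one name per column of stat_df
labels = ['file1', 'file2', 'file3']
df = pd.DataFrame(s, columns=labels)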

You may replace stat_df with your prediction df, and create labels accordingly.
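Applied to the question's loop, the idea might look like the sketch below. It is only an illustration under assumptions: the `predictions` dict and its file names are made up and stand in for whatever `knn.predict(df_img_data)` returns for each image, and `GROUPS` spells the label `BareSoil` the way it appears in the question's output (the empty DataFrame in the question uses `Baresoil`, which would not match).

```python
import pandas as pd

# Group labels, spelled as they appear in the predicted output.
GROUPS = ['Green', 'Stubble', 'BareSoil', 'Stones', 'Shadow']

# Hypothetical stand-in for the per-image knn.predict(...) results.
predictions = {
    'img_001.png': ['BareSoil', 'Stubble', 'Stubble', 'Stubble', 'BareSoil'],
    'img_002.png': ['Green', 'Shadow', 'Stubble', 'Stones', 'Stones'],
}

rows = []
for filename, labels in predictions.items():
    # Count each label, then make sure all five groups are present (0 if absent).
    counts = pd.Series(labels).value_counts().reindex(GROUPS, fill_value=0)
    row = {'filename': filename}
    row.update(counts.to_dict())
    rows.append(row)

# Build the summary DataFrame once, after the loop.
Counts_data = pd.DataFrame(rows, columns=['filename'] + GROUPS)
print(Counts_data)
```

Each image contributes one dict to `rows`, so the DataFrame is only constructed a single time at the end. `Counts_data.to_csv(...)` could then write the summary file.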

knh190

You should create a list, append each new output to it, and convert the list into a DataFrame when you have finished. Growing a DataFrame is a very costly operation.

If all you need is a simple count, consider using Counter from the collections module.
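A minimal sketch of the Counter approach, using only the standard library. The helper name `count_groups`, the file name, and the sample labels are made up for illustration; in the question's loop the labels would come from `knn.predict(df_img_data)`.

```python
from collections import Counter

# Group labels, spelled as they appear in the predicted output.
GROUPS = ['Green', 'Stubble', 'BareSoil', 'Stones', 'Shadow']

def count_groups(filename, labels):
    """Return one summary row: the filename plus a count for every group."""
    counts = Counter(labels)
    row = {'filename': filename}
    for group in GROUPS:
        row[group] = counts[group]  # a Counter returns 0 for missing keys
    return row

# Hypothetical labels standing in for the prediction on one image.
example = count_groups(
    'img_001.png',
    ['BareSoil', 'Stubble', 'Stubble', 'Stubble', 'BareSoil'],
)
print(example)
```

Collecting one such row per image in a list and passing that list to `pd.DataFrame` after the loop gives the summary table without growing a DataFrame inside the loop.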

igrinis
  • Would you mind including some explanation or references on why growing dataframes is a costly operation? – kerwei Jan 22 '19 at 08:52
  • It's about memory arrangement. The elements of each column (Series) in a DataFrame have to be adjacent in memory, so adding a row actually copies all of its content. See [this](https://stackoverflow.com/questions/31690076/creating-large-pandas-dataframes-preallocation-vs-append-vs-concat) for more. – igrinis Jan 22 '19 at 10:30
  • Please show me how to do this; I'm open to any suggestions. The loop will run through hundreds of image files. The end result I'm after is a summary file listing all the image files with the counts of the 5 different groups. – K.T Jan 23 '19 at 23:05