Adding and iterating new rows to empty dataframe with pandas for clustering purposes

Question

I'm currently clustering a large dataset by RFM_class. RFM_class has 125 distinct values, ranging from 111 to 555. For trial runs of the script, I have sampled my dataframe down to 10000 rows.

The logic is this: loop over each RFM_class (125 distinct values), run a clustering method on that subset to get a cluster_class column, and append the result to an initially empty dataframe. The filled dataframe will then be merged into my main table. This is a snapshot of the main table; I shrank it to 4 columns only, the original has 11 columns.

df_test
RFM_class  customer_id   num_orders recent_day  amount_order   
555            1               1489       0        18539000  
555            2                 72       3         1069000
145            3                 13     591         1350000
555            4                208       0         2119000
445            5                 40       9          698000

I haven't even reached the clustering step, because I'm stuck on looping over each RFM_class. This is what I've been trying for the last couple of days, just to select each RFM_class:

import pandas as pd

rfm_list = list(set(df_test['rfm']))
core_col = ['num_orders', 'recent_day', 'amount_order']
cl_class = []  # collects one DataFrame per RFM class

for row in rfm_list:
    # select the core columns for the rows of the current RFM class
    a = df_test.loc[df_test.rfm == row, core_col]
    cl_class.append(a)

cl_class

But the result is not as expected, because append does not seem to add new rows to my empty dataframe. Is there a function to do this in pandas? I'm currently using Python 3.0.

What empty dataframe are you talking about? What is the expected result? — Stop harming Monica, Apr 29 '18 at 15:52
@Goyo sorry for the lack of clarity. I want to do clustering on this dataset, and I expect to get a new column containing the cluster class for each `RFM_class`. So what I was doing was creating a new empty dataframe for the clustering calculation, but I wasn't able to iterate the clustering over each `RFM_class` — user3292755, Apr 29 '18 at 16:36
For clustering, have you tried the groupby function in pandas? E.g.: https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum — cypher, Apr 29 '18 at 17:01
@cypher thanks! Almost got it, I tried `df_test.groupby(['rfm_class']).sum()`, but I need to replace the `sum` function. Any idea what function I should use in place of `sum()`? Sorry, I'm pretty new to Python, but really open to new sources to learn from — user3292755, Apr 29 '18 at 17:19

score 1 · Accepted Answer · answered Apr 29 '18 at 18:07

You can use groupby to split the data into one group per distinct value, and then process each group separately. For example, consider this CSV file, which you would like to group by the Fruit column:

Fruit,Date,Name,Number
Apples,10/6/2016,Bob,7
Apples,10/6/2016,Bob,8
Apples,10/6/2016,Mike,9
Apples,10/7/2016,Steve,10
Apples,10/7/2016,Bob,1
Oranges,10/7/2016,Bob,2
Oranges,10/6/2016,Tom,15
Oranges,10/6/2016,Mike,57
Oranges,10/6/2016,Bob,65
Oranges,10/7/2016,Tony,1
Grapes,10/7/2016,Bob,1
Grapes,10/7/2016,Tom,87
Grapes,10/7/2016,Bob,22
Grapes,10/7/2016,Bob,12
Grapes,10/7/2016,Tony,15

Sample code to iterate through the groups:

import pandas as pd

df = pd.read_csv("filename.csv")
grouped = df.groupby("Fruit")
# each iteration yields the group key and the sub-DataFrame holding its rows
for name, group in grouped:
    print(name)
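
Applied to the question's data, the same loop could look roughly like the sketch below. It rebuilds the five rows from the question's snapshot, labels each RFM_class subset separately, and combines the pieces with a single pd.concat instead of appending to an empty dataframe. Note that assign_clusters is a hypothetical placeholder, not a real clustering algorithm: it only marks rows whose amount_order is above the subset median, and you would swap in whatever clustering method you actually use.

```python
import pandas as pd

# Toy data: the five rows shown in the question's df_test snapshot.
df_test = pd.DataFrame({
    "RFM_class": [555, 555, 145, 555, 445],
    "customer_id": [1, 2, 3, 4, 5],
    "num_orders": [1489, 72, 13, 208, 40],
    "recent_day": [0, 3, 591, 0, 9],
    "amount_order": [18539000, 1069000, 1350000, 2119000, 698000],
})
core_col = ["num_orders", "recent_day", "amount_order"]


def assign_clusters(subset):
    """Hypothetical stand-in for a real clustering step (an assumption).

    Labels a row 1 if its amount_order is above the subset median, else 0.
    """
    above = subset["amount_order"] > subset["amount_order"].median()
    return above.astype(int)


pieces = []
for rfm_value, group in df_test.groupby("RFM_class"):
    labelled = group.copy()
    # Cluster only within the rows that belong to this RFM class.
    labelled["cluster_class"] = assign_clusters(labelled[core_col])
    pieces.append(labelled)

# Concatenate once at the end; sort_index restores the original row order.
result = pd.concat(pieces).sort_index()
print(result[["RFM_class", "customer_id", "cluster_class"]])
```

Because every piece keeps the original index and the other columns, result already is the main table with the new cluster_class column, so no separate merge step is needed in this sketch.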

Hope this helps.

got it, the group by function is a perfect start, thanks @cypher — user3292755, Apr 30 '18 at 00:16
