
I am using Python for market basket analysis. When I execute this code, it only shows the column names without any rows:

frequent_tr = apriori(data_tr, min_support=0.05)


Here is the dataset: (link removed)

I have adjusted the min_support value, but it still shows the same result.

The libraries that I have used are:

import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules 
from mlxtend.preprocessing import TransactionEncoder

Then the following code is executed.

data = pd.read_csv(_link of the csv location_)

data_tr = data.groupby(['transaction_id', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('transaction_id').droplevel(0,1)

TE = TransactionEncoder()
array = TE.fit(data_tr).transform(data_tr)
data_tr_encoded = pd.DataFrame(array, columns = TE.columns_)
frequent_tr_encoded = apriori(data_tr, min_support=0.05)

The final line of code returns only the column names, with no rows.

I am expecting the final code to print a result like this: (screenshot of expected result)

EDIT

I have updated my code to pivot each service_type into its own column (refer to the data_tr code above).

The output still does not show each service_type correctly: (edited output screenshot)
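For anyone reproducing the pivot step, here is a minimal sketch on toy data (the column names `transaction_id` and `service_type` are taken from the question; the values are made up). `pd.crosstab` produces the same one-row-per-transaction, one-column-per-service matrix as the groupby/unstack chain above, without the MultiIndex juggling:

```python
import pandas as pd

# Toy long-format data: one row per (transaction, service) purchase
data = pd.DataFrame({
    'transaction_id': [1, 1, 2, 3, 3],
    'service_type': ['Painting', 'Plumbing Repair', 'Painting',
                     'Plumbing Repair', 'Aircond Servicing'],
})

# One row per transaction, one column per service_type, cell = purchase count
data_tr = pd.crosstab(data['transaction_id'], data['service_type'])
print(data_tr)
```

The resulting matrix is what the apriori step expects structurally; the remaining issue is that the cell values must still be binarized to True/False or 0/1.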

  • Unless `apriori` is a new built-in function in `python`, you would need to post it here if you want other people to see what it does. – Aryerez Aug 23 '21 at 11:03
  • @Aryerez. I have included the libraries that I used before executing any code. So the apriori algorithm is not self-built; it is part of a Python library. – Azul Aug 23 '21 at 11:11
  • https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/#overview the `apriori` function takes a dataframe as a param, not a csv file – Craicerjack Aug 23 '21 at 11:13
  • Yup @Craicerjack I read the data using data = pd.read_csv , after importing the mentioned library in my post. Thus, the data is in dataframe format – Azul Aug 23 '21 at 11:16
  • You should include all relevant code in your question – Craicerjack Aug 23 '21 at 11:16
  • Please see http://sscce.org/ when asking others for programming help on the internet. – orlp Aug 23 '21 at 11:17
  • @Craicerjack, ok, updated all the library and code that I have executed until I get the error message. – Azul Aug 23 '21 at 11:23
  • what result are you expecting? I think you will need to adjust your `data_tr` variable, as your column names are just letters of the alphabet and not the items you are referring to – Craicerjack Aug 23 '21 at 11:24
  • @Craicerjack, updated a screenshot of my expected result. – Azul Aug 23 '21 at 11:28
  • @Craicerjack I have made each service_type into a column, but the output after applying TransactionEncoder is not quite right. Refer to the Edit part. – Azul Aug 23 '21 at 11:45
  • @RinshanKolayil Can you post back the earlier message regarding making data_tr_list ? – Azul Aug 23 '21 at 13:20
  • @Azul `data_tr_list = data_tr.values.tolist()` , But it will not give you the expected output – Rinshan Kolayil Aug 23 '21 at 13:21

2 Answers

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori
from mlxtend.preprocessing import TransactionEncoder
data = pd.read_csv('Dataset - Transaction.csv')
data_tr = data.groupby(['geohash_user', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('geohash_user').droplevel(0,1)
data_tr_list = pd.DataFrame(np.where(np.array(data_tr.values.tolist()) >= 1, 1,0),columns=data_tr.columns)
frequent_tr_encoded = apriori(data_tr_list, min_support=0.05,use_colnames=True)

Output (same as in CarlosSR's answer below):

support itemsets
0   0.054093    (Aircond Repair)
1   0.186669    (Aircond Servicing)
2   0.090622    (Electrical Wiring / Power Point)
3   0.078008    (Local Moving - Budget Lorry)
4   0.060556    (Painting)
5   0.170405    (Plumbing Repair)

Second, grouping by transaction_id instead:

data_tr = data.groupby(['transaction_id', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('transaction_id').droplevel(0,1)
data_tr_list = pd.DataFrame(np.where(np.array(data_tr.values.tolist()) >= 1, 1,0),columns=data_tr.columns)
frequent_tr_encoded = apriori(data_tr_list, min_support=0.05,use_colnames=True)

Output

    support     itemsets
0   0.131081    (Aircond Servicing)
1   0.058486    (Electrical Wiring / Power Point)
2   0.050062    (Local Moving - Budget Lorry)
3   0.114593    (Plumbing Repair)

EDIT

The only values the apriori function accepts in a DataFrame are True, False, 0, and 1.

Keeping only rows whose sum along axis 1 is at least 2:

data_tr_list = data_tr_list[data_tr_list.sum(axis=1) >= 2]
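A minimal sketch of those two steps together on toy data (the count matrix below is made up): binarize the counts so the DataFrame only contains values apriori accepts, then keep the transactions that contain at least two items. In real code, the filtered frame would then be passed to `apriori`:

```python
import pandas as pd

# Toy count matrix: rows = transactions, columns = service types
data_tr = pd.DataFrame(
    {'Painting': [2, 0, 1], 'Plumbing Repair': [1, 1, 0]},
    index=[1, 2, 3])

# apriori only accepts True/False (or 0/1), so binarize the counts
data_tr_bool = data_tr.ge(1)

# Keep only transactions containing at least two distinct items
data_tr_filt = data_tr_bool[data_tr_bool.sum(axis=1) >= 2]
print(data_tr_filt)
```

Only transaction 1 survives the filter here, since it is the only row with two or more items.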
Rinshan Kolayil
  • Neither of them applies TransactionEncoder. Without it, there seems to be no problem with the result. – Azul Aug 23 '21 at 16:58
  • Because each row of the data has the same shape. Please refer to https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/#example-1 – Rinshan Kolayil Aug 23 '21 at 17:34
  • thanks friend. TransactionEncoder will give boolean values and unique label names. Thus, I have printed out the shapes of the 3 tables. In the end, I still need to go back to the unstacked dataframe when applying the algorithm. Is this what you have discovered at this moment? Even if I use the encoder with Boolean values, I still need to filter for transaction_id with more than 1 item, so I will need to convert to numerical values. Does this mean the encoder is useless for my final result? – Azul Aug 23 '21 at 23:33
  • I don't think it is necessary to go back to the unstacked dataframe. CarlosSR has already done the filtering for more than two items. The encoder is not useless: you can convert the boolean values to integer values, where True becomes 1 and False becomes 0. – Rinshan Kolayil Aug 24 '21 at 05:05
  • Please refer, https://stackoverflow.com/questions/58435648/applying-transaction-encoder-on-dataset, https://www.geeksforgeeks.org/apriori-algorithm/ and https://www.geeksforgeeks.org/implementing-apriori-algorithm-in-python/ – Rinshan Kolayil Aug 24 '21 at 05:07
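To illustrate the TransactionEncoder point discussed in the comments above: per the mlxtend user guide linked there, the encoder expects a list of transactions, each a list of item names, not an already-pivoted numeric DataFrame. A hedged sketch with toy data, mimicking the encoder's one-hot output using plain pandas so it runs without mlxtend installed:

```python
import pandas as pd

# Long-format toy data (column names follow the question's dataset)
data = pd.DataFrame({
    'transaction_id': [1, 1, 2],
    'service_type': ['Painting', 'Plumbing Repair', 'Painting'],
})

# What TransactionEncoder expects as input:
# one list of item names per transaction
transactions = data.groupby('transaction_id')['service_type'].apply(list).tolist()

# Equivalent one-hot (boolean) matrix built with pandas alone
onehot = pd.crosstab(data['transaction_id'], data['service_type']).astype(bool)
print(transactions)
print(onehot)
```

Feeding the pivoted numeric table straight into TransactionEncoder, as in the question, makes the encoder treat each row of numbers as a "transaction", which is why its output looked wrong.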

Your csv has 142,155 rows and 142,103 unique transaction_id values. That means only 52 of your transaction_ids have more than one service_type... how do you intend to fit an apriori model with only 52 associations? Could it be that you intend to run apriori not at the transaction level but at the geohash_user level?

Besides that, and assuming you want to go with the user-level analysis, I am not quite sure why you need to use TransactionEncoder.

I guess what you are trying to achieve is for your dataframe to contain a 1 (True) if the value is higher than 0 and a 0 (False) otherwise. At least, that is what apriori expects as input, because it does not matter whether the same transaction contained 1 or 5 units of the same type.

def encode_units(x):
    if x<=0:
        return 0
    elif x >= 1:
        return 1


data = pd.read_csv('Dataset - Transaction.csv')

data_tr = data.groupby(['geohash_user', 'service_type']).sum().unstack().reset_index().fillna(0).set_index('geohash_user').droplevel(0,1)

data_tr_encoded2 = data_tr.applymap(encode_units)

data_tr_encoded_filt = data_tr_encoded2[(data_tr_encoded2 > 0).sum(axis=1) >= 2] #we only need users that have more than 1 service in order to get association rules

frequent_tr_encoded = apriori(data_tr_encoded_filt, min_support=0.05, use_colnames = True)

Output:

    support itemsets
0   0.054093    (Aircond Repair)
1   0.186669    (Aircond Servicing)
2   0.090622    (Electrical Wiring / Power Point)
3   0.078008    (Local Moving - Budget Lorry)
4   0.060556    (Painting)
5   0.170405    (Plumbing Repair)
CarlosSR
  • I intend to use transaction_id when applying apriori just to let the model learn which items are bought together. In my opinion, we should run it on both transaction_id and geohash_user to get association rules and then check whether the rules are logical. This is my opinion; do correct me if I am wrong. Regarding the number of unique rows for apriori: even though the uniqueness is small, we should be able to find association rules as long as the data includes purchases of more than 1 item at the same time. Correct me again if I am wrong. – Azul Aug 23 '21 at 14:10
  • I have read through Medium and Towards Data Science posts; some of them use TransactionEncoder for preprocessing, while some do not. May I know when we need to apply TransactionEncoder? There is no explanation online. – Azul Aug 23 '21 at 14:11