Market Basket Analysis/Association Rule Mining using R package – arules
November 13, 2010
In my previous post, I discussed association rule mining in some detail. Here I show an implementation of the concept using the open source tool R and its arules package. Market basket analysis is a specific application of association rule mining in which retail transaction baskets are analysed to find products that are likely to be purchased together. The output of the analysis feeds recommendation engines and marketing strategies. Association rule mining cannot be done using Base SAS/Enterprise Guide, and hence R seems to be the best option in my opinion.
The arules package includes the Apriori algorithm, which I demonstrate here using a sample transaction file called "Transactions_sample.csv" (listed at the end of this post).
R Source Code:
# Set the working directory to the folder where the source files are placed (change this path to suit your setup)
setwd("C:/Documents and Settings/deepak.babu/Desktop/output");
# Install the R package arules (needed only once)
install.packages("arules");
# Load the arules package
library("arules");
# read the transaction file as a Transaction class
# file – csv/txt
# format – single/basket (For ‘basket’ format, each line in the transaction data file represents a transaction
# where the items (item labels) are separated by the characters specified by sep. For ‘single’ format,
# each line corresponds to a single item, containing at least ids for the transaction and the item. )
# rm.duplicates – TRUE/FALSE
# cols - For the ‘single’ format, cols is a numeric vector of length two giving the numbers of the columns (fields)
# with the transaction and item ids, respectively. For the ‘basket’ format, cols can be a numeric scalar
# giving the number of the column (field) with the transaction ids. If cols = NULL
# sep – “,” for csv, “\t” for tab delimited
txn <- read.transactions(file = "Transactions_sample.csv", rm.duplicates = FALSE, format = "single", sep = ",", cols = c(1, 2));
# Run the apriori algorithm
basket_rules <- apriori(txn, parameter = list(support = 0.5, confidence = 0.9, target = "rules"));
# Check the generated rules using inspect
inspect(basket_rules);
# If a large number of rules is generated, specific rules can be inspected by index
inspect(basket_rules[1]);
#############################################################################
############## SUPPLEMENTARY INFO ########################################
#############################################################################
#To visualize the item frequency in txn file
itemFrequencyPlot(txn);
#To see how the transaction file is read into txn variable.
inspect(txn);
Output:
parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.9    0.1    1 none FALSE            TRUE     0.5      1      5  rules FALSE
algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE
apriori – find association rules with the apriori algorithm
version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
set item appearances …[0 item(s)] done [0.00s].
set transactions …[6 item(s), 7 transaction(s)] done [0.00s].
sorting and recoding items … [2 item(s)] done [0.00s].
creating transaction tree … done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing … [1 rule(s)] done [0.00s].
creating S4 object … done [0.00s].
As the output shows, one rule was generated that meets the minimum support of 50% and the minimum confidence of 90%. The generated rules can be checked using the inspect(basket_rules) command:
  lhs            rhs      support   confidence lift
1 {Choclates} => {Pencil} 0.5714286 1          1.166667
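The above rule means "if chocolates are bought, then a pencil is also bought", and in this sample it holds every time. The support of 0.57 indicates that 57% of the transactions in the data (4 of 7) contain both chocolates and a pencil. The confidence of 1 indicates that, of the transactions that involve chocolates, 100% also involve a pencil; this clears the 90% minimum confidence set in the apriori call. The lift of 1.17 is the confidence divided by the overall share of transactions containing a pencil (6 of 7), so a value above 1 means pencils are bought somewhat more often with chocolates than in general. Hence the support indicates how often the rule applies in the data, and the confidence indicates how reliable the rule is.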
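To make the three measures concrete, here is a minimal base R sketch that recomputes them by hand for {Choclates} => {Pencil} from the sample file listed at the end of this post. It does not use arules; the names txn_df, baskets and has() are hypothetical helpers introduced only for this illustration, and the data is typed in directly rather than read from the CSV.

```r
# Sample data from this post in "single" format: one (transaction id, item) pair per row.
# txn_df, baskets and has() are illustrative names, not part of arules.
txn_df <- data.frame(
  id   = c(1001, 1001, 1001, 1002, 1002, 1003, 1003, 1003,
           1004, 1004, 1004, 1005, 1006, 1006, 1007, 1007),
  item = c("Choclates", "Pencil", "Marker", "Pencil", "Choclates",
           "Pencil", "Coke", "Eraser", "Pencil", "Choclates", "Cookies",
           "Marker", "Pencil", "Marker", "Pencil", "Choclates"),
  stringsAsFactors = FALSE
)

# Group the items by transaction id: one character vector per basket.
baskets <- split(txn_df$item, txn_df$id)

# TRUE for each basket that contains every item in `items`.
has <- function(items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

supp_lhs  <- mean(has("Choclates"))               # share of baskets with chocolates: 4/7
supp_rhs  <- mean(has("Pencil"))                  # share of baskets with a pencil: 6/7
supp_rule <- mean(has(c("Choclates", "Pencil")))  # share of baskets with both: 4/7

confidence <- supp_rule / supp_lhs   # P(Pencil | Choclates)
lift       <- confidence / supp_rhs  # confidence relative to the baseline P(Pencil)

print(c(support = supp_rule, confidence = confidence, lift = lift))
```

The same three ratios can be computed in any environment that can count transactions, for example with SQL counts, since lift only needs the support of the left-hand side, the right-hand side, and both together.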
We can also view the distribution of items within transactions using image(txn) and itemFrequencyPlot(txn).
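For readers without arules at hand, the relative item frequencies behind such a plot can be approximated in base R. This is only a sketch under the assumption that each (transaction id, item) pair occurs once in the data, as it does in the sample file; the variable names are illustrative. It also suggests why the log above reports only 2 items after sorting and recoding: only two items reach the 0.5 minimum support.

```r
# Item and transaction id columns of the sample file from this post (illustrative names).
ids   <- c(1001, 1001, 1001, 1002, 1002, 1003, 1003, 1003,
           1004, 1004, 1004, 1005, 1006, 1006, 1007, 1007)
items <- c("Choclates", "Pencil", "Marker", "Pencil", "Choclates",
           "Pencil", "Coke", "Eraser", "Pencil", "Choclates", "Cookies",
           "Marker", "Pencil", "Marker", "Pencil", "Choclates")

n_txn <- length(unique(ids))  # number of distinct transactions

# Each pair occurs once, so the count of an item equals the number of
# transactions that contain it; dividing by n_txn gives its support.
counts    <- table(items)
item_freq <- sort(setNames(as.vector(counts) / n_txn, names(counts)),
                  decreasing = TRUE)
print(round(item_freq, 3))

# Items that meet the minimum support of 0.5 used in the apriori call.
frequent <- names(item_freq)[item_freq >= 0.5]
print(frequent)
```

Passing item_freq to barplot() would give a rough base R counterpart of the frequency plot.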
Transactions_sample.csv
=======================
1001,Choclates
1001,Pencil
1001,Marker
1002,Pencil
1002,Choclates
1003,Pencil
1003,Coke
1003,Eraser
1004,Pencil
1004,Choclates
1004,Cookies
1005,Marker
1006,Pencil
1006,Marker
1007,Pencil
1007,Choclates












Excellent blog on MBA (market basket analysis). My long-standing problem in using arules ended after landing on this page. Excellently written, thanks bro.
By the way, is there any way I can see the transactions read in visually?
Nice blog!
Excellent one and very useful. Thanks a ton Deepak. Kudos to you.
This was good, but the arules package has around 47 functions. Please show us their use and different ways of visualization (for example through trees, or by making a movie with the animation package that shows how the frequent item sets change with the confidence level, or something similar).
True. As far as representation is concerned, I don't know if there are methods available in the arules package (except for representing the transaction sparsity matrix). With preliminary research I couldn't find any such method in the arules package, so I will have to look outside this package. I will try to find some info on this.
I am developing association rules using the Oracle API. I need to know how I can calculate the lift of the association rules. Can you provide any sample help?