Market Basket Analysis/Association Rule Mining using R package – arules

In my previous post, I discussed association rule mining in some detail. Here I show how to implement the concept in the open-source tool R using the arules package. Market basket analysis is a specific application of association rule mining in which retail transaction baskets are analysed to find products that are likely to be purchased together. The output of the analysis feeds recommendation engines and marketing strategies. Association rule mining is not available in Base SAS or Enterprise Guide, so R seems to be the best option in my opinion.
The arules package includes the Apriori algorithm, which I demonstrate here using a sample transaction file called “Transactions_sample.csv” (listed below).

R Source Code:

# Set the working directory to the folder containing the source files (change this path to suit your setup)
setwd("C:/Documents and Settings/deepak.babu/Desktop/output")

# Install the R package arules (needed only once)
install.packages("arules")

# Load the arules package
library("arules")

# Read the transaction file as an object of class transactions
# file - csv/txt
# format - single/basket (For 'basket' format, each line in the transaction data file represents a transaction
#           where the items (item labels) are separated by the characters specified by sep. For 'single' format,
#           each line corresponds to a single item, containing at least ids for the transaction and the item.)
# rm.duplicates - TRUE/FALSE
# cols -   For the 'single' format, cols is a numeric vector of length two giving the numbers of the columns (fields)
#           with the transaction and item ids, respectively. For the 'basket' format, cols can be a numeric scalar
#           giving the number of the column (field) with the transaction ids. If cols = NULL, the data do not
#           contain transaction ids.
# sep - "," for csv, "\t" for tab delimited

txn <- read.transactions(file = "Transactions_sample.csv", rm.duplicates = FALSE, format = "single", sep = ",", cols = c(1, 2))

# Run the apriori algorithm with minimum support 0.5 and minimum confidence 0.9
basket_rules <- apriori(txn, parameter = list(support = 0.5, confidence = 0.9, target = "rules"))

# Check the generated rules using inspect
inspect(basket_rules)

# If a large number of rules is generated, specific rules can be inspected by index
inspect(basket_rules[1])

 

#############################################################################
##############  SUPPLEMENTARY  INFO  ########################################
#############################################################################
# Visualize the item frequency in the transaction data

itemFrequencyPlot(txn)

# See how the transaction file was read into the txn variable
inspect(txn)

Output:


parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
        0.9    0.1    1 none FALSE            TRUE     0.5      1      5  rules FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6 item(s), 7 transaction(s)] done [0.00s].
sorting and recoding items ... [2 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [1 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].


As the output shows, one rule was generated at the minimum support of 50% and the minimum confidence of 90%. The rule itself can be viewed with the inspect(basket_rules) command:
  lhs            rhs      support   confidence lift
1 {Choclates} => {Pencil} 0.5714286 1          1.166667

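The rule above reads “if chocolates are bought, a pencil is bought too”. The support of 0.57 means that 57% of the transactions (4 of 7) contain both chocolates and a pencil. The confidence of 1 means that every transaction containing chocolates also contains a pencil, which clears the 90% minimum set in the apriori call. The lift of 1.17 is the confidence divided by the share of all transactions that contain a pencil (6 of 7); a value above 1 indicates that chocolate buyers are somewhat more likely than the average customer to buy a pencil. In short, support measures how often the rule applies, and confidence measures how often it holds when it applies.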
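To make the three measures concrete, here is a minimal sketch that recomputes them by hand in base R, without arules. The data vectors are typed in from the sample file listed further down; the helper name support_of is my own and is not part of any package.

```r
# Sample data in "single" format: one (transaction id, item) pair per entry.
txn_id <- c(1001, 1001, 1001, 1002, 1002, 1003, 1003, 1003,
            1004, 1004, 1004, 1005, 1006, 1006, 1007, 1007)
item   <- c("Choclates", "Pencil", "Marker", "Pencil", "Choclates",
            "Pencil", "Coke", "Eraser", "Pencil", "Choclates", "Cookies",
            "Marker", "Pencil", "Marker", "Pencil", "Choclates")

# One character vector of items per transaction (7 baskets in total)
baskets <- split(item, txn_id)

# Hypothetical helper: fraction of baskets that contain every item in `items`
support_of <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "Choclates"
rhs <- "Pencil"

rule_support    <- support_of(c(lhs, rhs))             # baskets with both / all baskets
rule_confidence <- rule_support / support_of(lhs)      # baskets with both / baskets with lhs
rule_lift       <- rule_confidence / support_of(rhs)   # confidence / share of baskets with rhs

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

The three values should agree with the support, confidence and lift columns that inspect(basket_rules) reports for the rule.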

We can also view the distribution of items across transactions using image(txn) and itemFrequencyPlot(txn).
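For readers who want the numbers behind the item frequency plot, the sketch below computes the relative frequency of each item in base R. I am assuming this is the quantity itemFrequencyPlot(txn) draws by default; the variable names are my own, and the data is typed in from the sample file.

```r
# Sample data in "single" format, typed in from the sample file
txn_id <- c(1001, 1001, 1001, 1002, 1002, 1003, 1003, 1003,
            1004, 1004, 1004, 1005, 1006, 1006, 1007, 1007)
item   <- c("Choclates", "Pencil", "Marker", "Pencil", "Choclates",
            "Pencil", "Coke", "Eraser", "Pencil", "Choclates", "Cookies",
            "Marker", "Pencil", "Marker", "Pencil", "Choclates")

# Keep one row per (transaction, item) pair so a repeated item counts once per basket
pairs <- unique(data.frame(txn_id = txn_id, item = item, stringsAsFactors = FALSE))
n_txn <- length(unique(pairs$txn_id))

# Relative frequency = share of transactions that contain the item
counts    <- table(pairs$item)
item_freq <- setNames(as.numeric(counts) / n_txn, names(counts))
item_freq <- sort(item_freq, decreasing = TRUE)
print(round(item_freq, 3))

# Items that reach the 0.5 minimum support used in the apriori call
frequent_items <- names(item_freq)[item_freq >= 0.5]
print(frequent_items)

# barplot(item_freq) would draw a comparable chart in base graphics
```

Only two items reach the 0.5 threshold, which appears consistent with the “[2 item(s)]” reported on the “sorting and recoding items” line of the apriori log.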

Transactions_sample.csv
=======================
1001,Choclates
1001,Pencil
1001,Marker
1002,Pencil
1002,Choclates
1003,Pencil
1003,Coke
1003,Eraser
1004,Pencil
1004,Choclates
1004,Cookies
1005,Marker
1006,Pencil
1006,Marker
1007,Pencil
1007,Choclates
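The file above is in “single” format, with one transaction id and one item per line. As a rough illustration of how this differs from “basket” format, the base R sketch below regroups the same rows into one comma-separated line per transaction. The variable names are my own, and the data is typed in rather than read from disk.

```r
# The sample file contents as a data frame in "single" format
single <- data.frame(
  txn_id = c(1001, 1001, 1001, 1002, 1002, 1003, 1003, 1003,
             1004, 1004, 1004, 1005, 1006, 1006, 1007, 1007),
  item   = c("Choclates", "Pencil", "Marker", "Pencil", "Choclates",
             "Pencil", "Coke", "Eraser", "Pencil", "Choclates", "Cookies",
             "Marker", "Pencil", "Marker", "Pencil", "Choclates"),
  stringsAsFactors = FALSE
)

# Group the items by transaction id
baskets <- split(single$item, single$txn_id)

# "basket" format: one line per transaction, items separated by sep (here a comma)
basket_lines <- vapply(baskets, paste, character(1), collapse = ",")
writeLines(basket_lines)
```

A file holding these lines would be read with format = "basket" and cols left at NULL, since the lines carry no transaction ids.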

Item Frequency Plot

Image(txn) showing density


About prdeepakbabu
a data mining enthusiast

7 Responses to Market Basket Analysis/Association Rule Mining using R package – arules

  1. Pingback: Association Rule Mining « Next generation BI

  2. Kris says:

    Excellent blog on MBA (market basket analysis). My long-standing problem with arules ended after landing on this page. Excellently written, thanks bro.
    By the way, is there any way I can view the transactions visually once they have been read?

  3. Jayakrishna says:

    Excellent one and very useful. Thanks a ton Deepak. Kudos to you.

  4. priyadarshee says:

    This was good, but the arules package has around 47 functions. Please show us their use and different ways of visualization (for example through trees, or an animation made with the animation package that shows how the frequent itemsets change with the confidence level).

    • prdeepakbabu says:

      True. As far as visual representation is concerned, I don't know of any methods in the arules package (except for plotting the transaction sparsity matrix). Preliminary research did not turn up any such method in arules, so I will have to look outside this package. I will try to find some information on this.

  5. Alpha says:

    I am developing association rules using the Oracle API. I need to know how I can calculate the lift of the association rules. Can you provide a sample?
