<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Next generation BI</title>
	<atom:link href="http://prdeepakbabu.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://prdeepakbabu.wordpress.com</link>
	<description>Just another WordPress.com weblog</description>
	<lastBuildDate>Tue, 25 Oct 2011 01:53:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='prdeepakbabu.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Next generation BI</title>
		<link>http://prdeepakbabu.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://prdeepakbabu.wordpress.com/osd.xml" title="Next generation BI" />
	<atom:link rel='hub' href='http://prdeepakbabu.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Location Intelligence using Social Media Data: Customer Location Aware Systems</title>
		<link>http://prdeepakbabu.wordpress.com/2011/10/25/location-intelligence-using-social-media-data-customer-location-aware-systems/</link>
		<comments>http://prdeepakbabu.wordpress.com/2011/10/25/location-intelligence-using-social-media-data-customer-location-aware-systems/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 01:53:43 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[customer loyalty]]></category>
		<category><![CDATA[drug adherence]]></category>
		<category><![CDATA[foursquare]]></category>
		<category><![CDATA[location aware]]></category>
		<category><![CDATA[location based services]]></category>
		<category><![CDATA[location intelligence]]></category>
		<category><![CDATA[offer recommendation]]></category>
		<category><![CDATA[product recommendation]]></category>
		<category><![CDATA[real time targeting]]></category>
		<category><![CDATA[social media]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=626</guid>
		<description><![CDATA[In the recent past, a variety of new social media sites have emerged: location-based services (like Foursquare and Gowalla), group deals (like Groupon), and microblogging (like Twitter and Facebook). These sites provide integration features with one another; for instance, a Foursquare check-in can be configured to automatically publish a tweet with the URL of the [...]]]></description>
			<content:encoded><![CDATA[<p>In the recent past, a variety of new social media sites have emerged: location-based services (like Foursquare and Gowalla), group deals (like Groupon), and microblogging (like Twitter and Facebook). These sites provide integration features with one another; for instance, a Foursquare check-in can be configured to automatically publish a tweet with the URL of the checked-in location. Because much of this data is publicly accessible, businesses have an opportunity to combine it with their internal customer data.<br />
A retailer&#8217;s major concern is understanding its customers better, to gain a 360-degree view of each customer. Most companies have a strategy to integrate internal customer behavior data across POS, e-commerce, mail order, etc. By integrating social media data from location-based services (like Foursquare) and microblogging services (like Twitter), retailers now have the ability to see where their customers check in.<br />
When a customer checks in at a location adjacent to the store, the retailer could identify this event and target the customer in real time with offers or product recommendations likely to bring him or her into the store, increasing footfall and sales. The channel could be SMS, email or voicemail. The offer could be linked to the customer&#8217;s overall sentiment score, derived by analyzing his or her tweets over time; that is, a dissatisfied customer could be given a higher-value offer than a neutral or satisfied customer.<br />
Pharmacy retail stores could use this platform for refill reminders to increase drug adherence. When a customer is identified as being close to the store, a message could go out via email, SMS, voicemail, etc. This builds goodwill: customers feel cared for, which in turn increases loyalty.<br />
The rise of social media sites has raised concerns about user privacy. I feel users should be well educated about the privacy settings on these sites, and any targeting should be done only with the user&#8217;s approval. We need to instill this message in customers: &#8220;The more data you share with us, the better the service you receive&#8221;.<br />
Shown below is a five-slide deck on this idea, &#8220;Location Intelligence using Social Media data: Customer Location Aware Systems&#8221;, which won the Social Media Analytics event at an analytics expo.</p>
<iframe class="scribd_iframe_embed" src="http://www.scribd.com/embeds/70128345/content?start_page=1&view_mode=list&access_key=key-ma7wdi8n01u71raow3w" data-auto-height="true" scrolling="no" id="scribd_70128345" width="100%" height="500" frameborder="0"></iframe>
<div style="font-size:10px;text-align:center;width:100%"><a href="http://www.scribd.com/doc/70128345">View this document on Scribd</a></div>
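<p>The check-in targeting idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a description of any real retailer system or of the Foursquare or Twitter APIs: the function names, the 0.5 km radius, the sentiment scale (negative means dissatisfied), the coordinates and the offer labels are all hypothetical.</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def choose_offer(checkin, store, sentiment_score, opted_in, radius_km=0.5):
    """Return an offer label for a check-in, or None if no message should go out.

    All inputs are hypothetical: checkin and store are (lat, lon) tuples and
    sentiment_score is negative for a dissatisfied customer.
    """
    if not opted_in:  # target only customers who have given their approval
        return None
    if haversine_km(*checkin, *store) > radius_km:  # not adjacent to the store
        return None
    if sentiment_score >= 0:  # neutral or satisfied customer
        return "standard_offer"
    return "high_value_offer"  # dissatisfied customer gets the better offer


store = (12.9716, 77.5946)           # made-up store coordinates
nearby_checkin = (12.9720, 77.5950)  # made-up check-in a short walk away
print(choose_offer(nearby_checkin, store, sentiment_score=-0.6, opted_in=True))
```

<p>The opt-in check comes first on purpose, so that no distance or sentiment lookup happens for a customer who has not approved targeting.</p>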
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2011/10/25/location-intelligence-using-social-media-data-customer-location-aware-systems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>
	</item>
		<item>
		<title>Did use of helmets reduce deaths due to 2 wheeler accidents?</title>
		<link>http://prdeepakbabu.wordpress.com/2011/10/13/did-use-of-helmets-reduce-deaths-due-to-2-wheeler-accidents/</link>
		<comments>http://prdeepakbabu.wordpress.com/2011/10/13/did-use-of-helmets-reduce-deaths-due-to-2-wheeler-accidents/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 19:21:52 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[2 wheelers]]></category>
		<category><![CDATA[2007]]></category>
		<category><![CDATA[accidental deaths & suicides in india]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[fine]]></category>
		<category><![CDATA[helmet]]></category>
		<category><![CDATA[helmet rule]]></category>
		<category><![CDATA[Indian Motor Vehicles Act]]></category>
		<category><![CDATA[karnataka]]></category>
		<category><![CDATA[motor vehicle act]]></category>
		<category><![CDATA[penalty]]></category>
		<category><![CDATA[Tamilnadu]]></category>
		<category><![CDATA[traffic]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=591</guid>
		<description><![CDATA[This blog post analyzes the implementation of the helmet rule in Indian states and its effect on reducing deaths from two-wheeler accidents. Here we focus on one particular state, Karnataka. A national mandatory helmet provision is included in the Indian Motor Vehicles Act, 1988. However, [...]]]></description>
			<content:encoded><![CDATA[<p>This blog post analyzes the implementation of the helmet rule in Indian states and its effect on reducing deaths from two-wheeler accidents. Here we focus on one particular state, Karnataka.<br />
A national mandatory helmet provision is included in the Indian Motor Vehicles Act, 1988. However, implementation of this law has been left to the individual states. The Karnataka government enforced the mandatory helmet rule for all two-wheeler riders in 2007. Traffic police started fining violators of the rule, and good compliance was soon observed. <a href="http://ncrb.gov.in/adsi/main.htm" target="_blank"><em>Accidental Deaths &amp; Suicides in India</em></a> publishes data on accidental deaths broken down by type of vehicle and by state (from 1967 to 2009). Plotting the trend of accidental deaths due to two-wheelers in Karnataka, we see a clear decline after 2007, specifically an 8% decline as of 2009. But can we attribute this drop to the helmet rule alone? Let’s explore.<br />
Let’s identify another state to serve as a control: one that had a similar pattern of accidental deaths in the years before 2007 but was not a strict enforcer of the helmet rule. If we see a similar dip in accidental deaths in this state, we cannot attribute Karnataka’s decline to the helmet rule. Analyzing accidental death patterns across different states, we find that the neighbouring state of Tamil Nadu (TN) had a similar pattern in the years before 2007 and was not a strict enforcer of the helmet rule (violators were not often penalized).<br />
The figure below shows accidental deaths for both states from 2001 to 2009. Until 2007, accidental deaths increased year on year in both states. After 2007, however, Tamil Nadu continued to see an increase in deaths due to two-wheeler accidents through 2009, while Karnataka showed a decline. Taken together, these points suggest that strict enforcement of the helmet rule in Karnataka helped bring down accidental deaths due to two-wheelers.</p>
<div id="attachment_592" class="wp-caption aligncenter" style="width: 640px"><a href="http://prdeepakbabu.files.wordpress.com/2011/10/acc.jpg"><img class="size-full wp-image-592" title="Helmet Rule - Effectiveness Analysis" src="http://prdeepakbabu.files.wordpress.com/2011/10/acc.jpg?w=630&#038;h=385" alt="Helmet Rule - Effectiveness Analysis" width="630" height="385" /></a><p class="wp-caption-text">Helmet Rule - Effectiveness Analysis</p></div>
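<p>The comparison above can be made explicit with a simple difference-in-differences calculation: the change in the enforcing state minus the change in the control state. This is a sketch of one way to quantify the argument, not the computation used in this post, and the counts below are placeholders rather than figures from the NCRB report.</p>

```python
def percent_change(before, after):
    """Relative change from before to after, e.g. -0.08 for an 8% decline."""
    return (after - before) / before


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the enforcing state minus change in the control state."""
    return (percent_change(treated_before, treated_after)
            - percent_change(control_before, control_after))


# Placeholder death counts, NOT real data: they only illustrate the arithmetic.
enforcing_before, enforcing_after = 1000, 920  # an 8% decline, as cited in the post
control_before, control_after = 1000, 1100     # hypothetical 10% increase

effect = diff_in_diff(enforcing_before, enforcing_after,
                      control_before, control_after)
print(f"Estimated effect of enforcement: {effect:.0%}")
```

<p>A negative result means deaths fell in the enforcing state relative to the control state. The estimate is only as good as the assumption that both states would otherwise have followed the same trend.</p>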
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2011/10/13/did-use-of-helmets-reduce-deaths-due-to-2-wheeler-accidents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2011/10/acc.jpg" medium="image">
			<media:title type="html">Helmet Rule - Effectiveness Analysis</media:title>
		</media:content>
	</item>
		<item>
		<title>Socializing Insights with end users: Analytics for masses &#8211; Amazon vs. LinkedIn</title>
		<link>http://prdeepakbabu.wordpress.com/2011/07/31/socializing-insights-with-end-users-analytics-for-masses-amazon-vs-linkedin/</link>
		<comments>http://prdeepakbabu.wordpress.com/2011/07/31/socializing-insights-with-end-users-analytics-for-masses-amazon-vs-linkedin/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 16:38:14 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Business Insights]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[amazon.com]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[andreas weigend]]></category>
		<category><![CDATA[competing with analytics]]></category>
		<category><![CDATA[glassdoor]]></category>
		<category><![CDATA[insights]]></category>
		<category><![CDATA[linkedin]]></category>
		<category><![CDATA[linkedin.com]]></category>
		<category><![CDATA[mu-sigma]]></category>
		<category><![CDATA[musigma]]></category>
		<category><![CDATA[netflix]]></category>
		<category><![CDATA[recommendation]]></category>
		<category><![CDATA[reid hoffman]]></category>
		<category><![CDATA[reputation system]]></category>
		<category><![CDATA[Thomas Davenport]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=422</guid>
		<description><![CDATA[This blog post compares amazon.com and linkedin.com in terms of their similarities in analytic maturity &#38; their use of data shared by their customers. As Thomas Davenport mentions in his book “Competing on Analytics”, amazon.com is one of the few companies built on a foundation of data, a so-called [...]]]></description>
			<content:encoded><![CDATA[<p>This blog post compares amazon.com and linkedin.com in terms of their similarities in analytic maturity &amp; their use of data shared by their customers. As Thomas Davenport mentions in his book “Competing on Analytics”, amazon.com is one of the few companies built on a foundation of data, a so-called “analytically mature” company. LinkedIn has joined the list, with many new features available to its users.</p>
<p>As customers interact with a site, they generate data about their liking for certain products or features. Companies like amazon.com and LinkedIn clearly understand how to leverage this information to make the interaction between the customer and the site even more valuable &amp; relevant. The more data users are willing to share with the site about their likes and dislikes, the better the site’s recommendations for them. Companies need to instil this confidence in their customers’ minds, so that users share data willingly.</p>
<p>Amazon.com &amp; LinkedIn make available every little fact about consumers’ behaviour and their interactions with other users or products, to help inform the decisions they make, such as whether to buy a product or whether to look for a change of employer.</p>
<p>LinkedIn has amazing insights about companies and profiles, all freely available to its users. In an <a href="http://www.youtube.com/watch?v=wPhmKasTiAg">interview with LinkedIn CEO, Reid Hoffman</a> by Andreas Weigend, Reid talks about every individual as a small business, with each individual thinking of their reputation in terms of the number of new connections, who viewed their profile, how many times their profile came up in search results, and similar stats. <a href="http://www.weigend.com/">Andreas Weigend</a>, a social data expert, talks about the behavior change that features like &#8216;who viewed your profile in the last 15 days&#8217; bring about in end users, and in the way companies like LinkedIn treat their users.</p>
<p>(i) Insights about companies (let’s say we are researching the company mu-sigma):</p>
<ul>
<li>Employee switching patterns between companies: employees who moved from ‘xyz’ to mu-sigma.</li>
<li>Employee switching patterns between companies: employees who moved from mu-sigma to ‘abc’.</li>
<li>Gender distribution: M to F ratio at mu-sigma.</li>
<li>How mu-sigma differs from other companies by years of experience. Similar statistics are available by job function, educational qualification &amp; university, along with a benchmark against similar companies.</li>
<li>People who looked at “mu-sigma” also viewed: a list of other companies.</li>
<li>Where employees of mu-sigma call home</li>
<li>Most recommended at mu-sigma.</li>
<li>Time trend of employees who got a change in title.</li>
</ul>
<p>(ii) Insights about profiles/users:</p>
<ul>
<li>Who viewed your profile in the last 15 days?</li>
<li>How many times did your profile show up in search results?</li>
<li>Recommendations of other profiles/users you might know.</li>
<li>Companies you might be interested in following.</li>
<li>Relevant jobs for each user, with the ability to apply for them.</li>
<li>Work recommendations from colleagues and customers.</li>
</ul>
<div id="attachment_427" class="wp-caption alignright" style="width: 310px"><a href="http://prdeepakbabu.files.wordpress.com/2011/07/linkedin.jpg"><img class="size-medium wp-image-427" title="linkedin" src="http://prdeepakbabu.files.wordpress.com/2011/07/linkedin.jpg?w=300&#038;h=283" alt="LinkedIn " width="300" height="283" /></a><p class="wp-caption-text">LinkedIn</p></div>
<p>Here’s a look at what amazon.com offers. When purchasing a product at amazon.com, the user is presented with statistics such as:</p>
<ul>
<li>How many users who searched for a book, say “Outliers” by Malcolm Gladwell, ended up purchasing it, or ended up purchasing “The Tipping Point”, “Blink”, “What the Dog Saw”, etc., listed in that order. However, I feel that quantifying this would help, for example calling out that 80% of people who searched for “A” ended up purchasing “B”, or that 80% of people who searched for “A” ended up purchasing “A”.</li>
<li>“Frequently Bought Together” items for a given product.</li>
<li>Review statistics: how many users rated the product 5-star, 4-star and so on, shown as a bar chart.</li>
</ul>
<p>We are moving towards an era of socializing data with end users, so that every little decision they make is data driven. WordPress, Netflix, Glassdoor, etc. are some of the other companies following this trend. The purpose of collecting data has truly gone beyond marketing.</p>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2011/07/31/socializing-insights-with-end-users-analytics-for-masses-amazon-vs-linkedin/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2011/07/linkedin.jpg?w=300" medium="image">
			<media:title type="html">linkedin</media:title>
		</media:content>
	</item>
		<item>
		<title>&#8220;Integrate, Analyze, Visualize &amp; Socialize&#8221; &#8211; Visualization Tools &amp; Techniques</title>
		<link>http://prdeepakbabu.wordpress.com/2011/07/03/integrate-analyze-visualize-socialize-visualization-tools-techniques/</link>
		<comments>http://prdeepakbabu.wordpress.com/2011/07/03/integrate-analyze-visualize-socialize-visualization-tools-techniques/#comments</comments>
		<pubDate>Sat, 02 Jul 2011 20:26:17 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[BO XI]]></category>
		<category><![CDATA[cognos]]></category>
		<category><![CDATA[fusion charts]]></category>
		<category><![CDATA[fusion maps]]></category>
		<category><![CDATA[fusion widgets]]></category>
		<category><![CDATA[google charts]]></category>
		<category><![CDATA[patterns]]></category>
		<category><![CDATA[power charts]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[tableau]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=381</guid>
		<description><![CDATA[Turning raw data into insights often involves integrating data from multiple disparate sources (not limited to structured ones), analyzing the data, visualizing it, and socializing the results with a broader audience to whom they are of interest. In this cycle of turning data into insights, visualization plays a vital role, and hence is [...]]]></description>
			<content:encoded><![CDATA[<p>Turning raw data into insights often involves integrating data from multiple disparate sources (not limited to structured ones), analyzing the data, visualizing it, and socializing the results with a broader audience to whom they are of interest. In this cycle of turning data into insights, visualization plays a vital role, and hence is the topic of this blog post. First, visualization aids in analyzing large volumes of data by revealing patterns that are easier to interpret visually than in a tabular layout of numbers. Second, visualization represents the numbers using visuals that are easy for everyone to read and understand. Insights conveyed visually can be grasped in a minute or two, where text or a table of numbers might have taken three or four minutes. This is an important factor to consider, especially when you are delivering findings to the CEO, CFO, CIO or other senior executives of a company, who often have limited time.</p>
<div id="attachment_390" class="wp-caption alignright" style="width: 310px"><a href="http://prdeepakbabu.files.wordpress.com/2011/07/snow.gif"><img class="size-medium wp-image-390" title="London Cholera Outbreak visualized" src="http://prdeepakbabu.files.wordpress.com/2011/07/snow.gif?w=300&#038;h=268" alt="London Cholera Outbreak visualized" width="300" height="268" /></a><p class="wp-caption-text">London Cholera Outbreak visualized</p></div>
<p>Let&#8217;s go back to the history of visualization. The most famous early example of mapping epidemiological data is Dr. John Snow&#8217;s map of deaths from a cholera outbreak in London in 1854, in relation to the locations of public water pumps. The original (<a href="http://www.ph.ucla.edu/epi/snow/highressnowmap.html">high-res PDF copies from UCLA</a>) spawned many imitators, including this simplified version by Gilbert in 1958. Tufte (1983, p. 24) says, &#8220;Snow observed that cholera occurred almost entirely among those who lived near (and drank from) the Broad Street water pump. He had the handle of the contaminated pump removed, ending the neighborhood epidemic which had taken more than 500 lives.&#8221;<br />
The following pointers should help anyone analyze data and socialize findings using effective, newer visualization techniques:</p>
<p>1. <a href="http://www.fusioncharts.com/gallery/">Fusion Charts</a> &#8211; covers the basic chart types; all it needs is a data file and a configuration file that links the chart to the data. Flash-based, interactive and web-supported.<br />
2. <a href="http://www.fusioncharts.com/maps/">Fusion Maps</a> &#8211; contains maps of all countries and major cities worldwide; interactive and Flash-based, driven by a data file and a configuration file; web-supported.<br />
3. <a href="http://www.fusioncharts.com/widgets">Fusion Widgets</a> &#8211; offers some of the coolest visualization types, such as angular gauges, spark lines/columns, Gantt charts, pyramids, and cylinder, thermometer &amp; bulb gauges. Some of these charts support real-time streaming, generally used in stock market analysis.<br />
4. <a href="http://www.fusioncharts.com/powercharts/">Power Charts</a> &#8211; contains some of the rarer chart types, such as node charts, heat maps, waterfall charts, multilevel pie charts and candlestick charts; again Flash-based and hence web-supported.<br />
5. <a href="http://www.r-project.org/">R &#8211; Revolution Computing</a> &#8211; a powerful open-source data mining/statistics language that can generate stacked multi-combinatorial charts with a single command.<br />
6. <a href="http://code.google.com/apis/chart/interactive/docs/gadgetgallery.html">Google Visualization</a> &#8211; JavaScript-based and web-supported; includes some of the coolest visualization techniques, such as the motion chart, which can display data in 5 dimensions, as well as geomap, word cloud, money pile, 3D chart, QR code, etc.<br />
7. <a href="http://code.google.com/apis/chart/">Google Charts</a> &#8211; contains all the basic chart types, from Google.<br />
8. <a href="http://www.adobe.com/products/flex/">Custom Flex Charts</a> &#8211; built using custom-written Flex and ActionScript code.<br />
9. <a href="http://office.microsoft.com/en-us/excel/">Microsoft Excel</a> &#8211; known for quick and easy chart creation; the latest version now supports sparkline charts.<br />
10. <a href="http://www.tableausoftware.com/">Tableau &#8211; Data Exploration</a> &#8211; recommended for rapid-fire analytics involving various dimensions; changing views of the metrics by dimension hierarchy is as easy as drag and drop.<br />
11. <a href="http://en.wikipedia.org/wiki/Business_intelligence_tools">BI Report Tools &#8211; BOXI, Cognos</a> &#8211; commercial BI tools that support creating various report types based on charts and tabular layouts.</p>
<p>Industry trends include real-time streaming charts (used in supply chain analytics), interactive charts, mobile-supported charts, alerts in charts (for example, emailing business users when the sales of any product fall below $x on three consecutive days), and video- and audio-supported charts.</p>
<a href="http://polldaddy.com/poll/5200887/">View This Poll</a>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2011/07/03/integrate-analyze-visualize-socialize-visualization-tools-techniques/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2011/07/snow.gif?w=300" medium="image">
			<media:title type="html">London Cholera Outbreak visualized</media:title>
		</media:content>
	</item>
		<item>
		<title>Some of the insights have stood best, because they were simple!</title>
		<link>http://prdeepakbabu.wordpress.com/2011/02/20/some-of-the-insights-have-stood-best-because-they-were-simple/</link>
		<comments>http://prdeepakbabu.wordpress.com/2011/02/20/some-of-the-insights-have-stood-best-because-they-were-simple/#comments</comments>
		<pubDate>Sat, 19 Feb 2011 18:58:33 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Business Insights]]></category>
		<category><![CDATA[Customers shopped online returned via stores]]></category>
		<category><![CDATA[Data driven marketing]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[if you live more than two miles from a pharmacy store]]></category>
		<category><![CDATA[In the United States]]></category>
		<category><![CDATA[insights]]></category>
		<category><![CDATA[Integrating data]]></category>
		<category><![CDATA[Mark Jeffery]]></category>
		<category><![CDATA[MIT]]></category>
		<category><![CDATA[Pentland]]></category>
		<category><![CDATA[Randy Lea]]></category>
		<category><![CDATA[Teradata]]></category>
		<category><![CDATA[walgreens]]></category>
		<category><![CDATA[We won because we understood the science of incentivizing people to cooperate]]></category>
		<category><![CDATA[you probably don’t shop there!]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=195</guid>
		<description><![CDATA[In this blog post, I talk about three scenarios in which highly valuable insights came out of simple analysis. 1. Customers shopped online returned via stores: Randy Lea, VP of product &#38; service marketing at Teradata, talks about one of their clients, which had tagged its e-commerce customers as its best customers based on the web sales they were generating [...] ]]></description>
			<content:encoded><![CDATA[<p>In this blog post, I talk about three scenarios in which highly valuable insights came out of simple analysis.</p>
<p><em><strong>1. Customers shopped online returned via stores</strong></em> Randy Lea, VP of product &amp; service marketing at Teradata, talks about one of their clients, which had tagged its e-commerce customers as its best customers based on the web sales they generated and was reaching out to them with various promotions. However, on integrating the web data with enterprise (store) data, the client found that most of these customers were buying multiple units online and returning them through stores.</p>
<p>For example, some customers bought 4-5 shirts in different colors, kept the one they liked most, and returned the rest at a store. Effectively, customers were buying through one channel (web) and returning through another (store). Hence the web customers the client believed to be its best were really average shoppers and should not have been sent the offers.</p>
<p>Source: <strong><a href="http://www.youtube.com/watch?v=ejVYX3uPbe8" target="_blank">Teradata</a></strong> (video)</p>
<p><em><strong>2. In the United States, if you live more than two miles from a pharmacy store, you probably don’t shop there!</strong></em> In the book Data-Driven Marketing, Mark Jeffery describes how Walgreens optimized its marketing spend using a simple geospatial visualization. The picture below shows three stores of the Walgreens pharmacy chain on a map. Walgreens is a $59 billion annual revenue pharmacy company with 6,850 stores throughout the United States.</p>
<div id="attachment_200" class="wp-caption aligncenter" style="width: 528px"><a href="http://prdeepakbabu.files.wordpress.com/2011/02/wagstr.jpg"><img class="size-full wp-image-200 " title="Source: &quot;Data Driven Marketing&quot; by Mark jeffery" src="http://prdeepakbabu.files.wordpress.com/2011/02/wagstr.jpg?w=518&#038;h=551" alt="Source: &quot;Data Driven Marketing&quot; by Mark jeffery" width="518" height="551" /></a><p class="wp-caption-text">Geo spatial visualization of Walgreens stores</p></div>
<p>This geospatial picture shows dots that are the customers and where they live, coded by shape depending on which of the three Walgreens stores they shop at. The “diamond” customers shop at Store 1; the “square” customers, at Store 2; and the “star” customers, at Store 3. This pharmacy retail chain predominantly markets using flyers in newspapers. The way they pay for the marketing is by zip code, denoted by the dashed line in the picture. Mike Feldner, the marketing manager who first created these pictures, noticed something interesting: the circle on the picture is two miles in radius, and after looking at many pictures throughout the United States, he noticed that there are no dots (customers) for a store more than two miles from the store. He concluded that in the United States, if you live more than two miles from a pharmacy store, you probably don’t shop there. At that time, Walgreens treated each U.S. locale equally, allocating equal dollar amounts for newspaper advertising in each zip code across the United States. But the data show that if there is no store within two miles of the zip code, customers do not shop at the store. Based on these data, Walgreens ultimately stopped spending advertising dollars in all zip codes without a store within two miles of the zip code. As you might guess, the impact to sales revenues was exactly zero. The impact to marketing, however, was a cost saving of more than $5 million, for a total cost of collecting the data and creating the plots of approximately $200,000. This multimillion-dollar saving in marketing did not require a lot of money, and the analysis was done on a personal computer (PC). This is yet another example of a simple approach making a big impact.</p>
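<p>The two-mile rule is easy to express as a filter. The sketch below is only an illustration in Python, not the method Walgreens used: it assumes each zip code is represented by a single centroid point, measures great-circle (haversine) distance, and uses made-up coordinates and zip labels.</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def zip_codes_worth_advertising(zip_centroids, stores, radius_miles=2.0):
    """Keep zip codes whose centroid lies within `radius_miles` of any store."""
    return [
        zip_code
        for zip_code, (lat, lon) in zip_centroids.items()
        if any(
            haversine_miles(lat, lon, s_lat, s_lon) <= radius_miles
            for s_lat, s_lon in stores
        )
    ]


# Hypothetical data: one store and two zip-code centroids (assumed values).
stores = [(41.88, -87.63)]
zip_centroids = {"ZIP_A": (41.89, -87.63), "ZIP_B": (41.98, -87.63)}
print(zip_codes_worth_advertising(zip_centroids, stores))  # prints ['ZIP_A']
```

<p>ZIP_A is about 0.7 miles from the store and is kept; ZIP_B is about 6.9 miles away and would receive no newspaper advertising under the rule.</p>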
<p>Source: <strong><a href="http://www.amazon.com/Data-Driven-Marketing-Metrics-Everyone-Should/dp/0470504544/ref=sr_1_1?ie=UTF8&amp;qid=1298138816&amp;sr=8-1" target="_blank">&#8220;Data-Driven Marketing&#8221; by Mark Jeffery</a></strong></p>
<p><em><strong>3. We won because we understood the science of incentivizing people to cooperate</strong></em> Late last year the Pentagon&#8217;s mad-scientist research wing, Darpa, announced the Network Challenge, a $40,000 prize for the first group to find and report the locations of ten red weather balloons that the agency would set aloft one day in secret locations around the country. Most of the thousands of groups that signed up quickly realized that crowdsourcing was the way to find the 8-foot spheres. So, naturally, they offered bounties to balloon hunters. But Pentland&#8217;s crew at MIT&#8217;s Human Dynamics Lab, part of the MIT Media Lab, took their crowd control a step further. &#8220;It was trivial for us to slap together the balloon thing,&#8221; says the 58-year-old Pentland. That&#8217;s because other groups&#8217; tactics were based on guesswork, he argues. His were based on lessons learned through data-mining research. &#8220;We won because we understood the science of incentivizing people to cooperate.&#8221;</p>
<p>Read the entire article here: <strong><a href="http://www.yourversion.com/index.php?p=viewpage&amp;url_id=6588780" target="_blank">Mining Human Behavior at MIT</a></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2011/02/20/some-of-the-insights-have-stood-best-because-they-were-simple/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2011/02/wagstr.jpg" medium="image">
			<media:title type="html">Source: &#34;Data Driven Marketing&#34; by Mark jeffery</media:title>
		</media:content>
	</item>
		<item>
		<title>Market Basket Analysis/Association Rule Mining using R package &#8211; arules</title>
		<link>http://prdeepakbabu.wordpress.com/2010/11/13/market-basket-analysisassociation-rule-mining-using-r-package-arules/</link>
		<comments>http://prdeepakbabu.wordpress.com/2010/11/13/market-basket-analysisassociation-rule-mining-using-r-package-arules/#comments</comments>
		<pubDate>Sat, 13 Nov 2010 08:00:16 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Apriori]]></category>
		<category><![CDATA[Association Rule Mining]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[apriori]]></category>
		<category><![CDATA[arules]]></category>
		<category><![CDATA[confidence]]></category>
		<category><![CDATA[market basket analysis]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[read.transactions]]></category>
		<category><![CDATA[recomendation engine]]></category>
		<category><![CDATA[rule]]></category>
		<category><![CDATA[support]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=154</guid>
		<description><![CDATA[In my previous post, I discussed association rule mining in some detail. Here I show an implementation of the concept using the open source tool R and its arules package. Market Basket Analysis is a specific application of association rule mining, in which retail transaction baskets are analysed to find the products which are [...] ]]></description>
			<content:encoded><![CDATA[<p>In my <a href="http://prdeepakbabu.wordpress.com/2010/02/24/association-rule-mining/" target="_blank">previous post</a>, I discussed association rule mining in some detail. Here I show an implementation of the concept using the open source tool <a href="http://www.r-project.org/" target="_blank">R</a> and its arules package. Market Basket Analysis is a specific application of association rule mining, in which retail transaction baskets are analysed to find the products that are likely to be purchased together. The output of the analysis feeds recommendation engines and marketing strategies. <span id="more-154"></span>As far as I know, association rule mining cannot be done in Base SAS/Enterprise Guide, so R seems the best option in my opinion.<br />
The arules package implements the Apriori algorithm, which I demonstrate here using a sample transaction file called &#8220;Transactions_sample.csv&#8221; (listed below).</p>
<p><span style="text-decoration:underline;"><strong>R Source Code:</strong><span style="color:green;"> </span></span></p>
<p><span style="color:green;"># Set the working directory to the folder that holds the source files (change this path to suit your setup)</span><br />
<span style="color:brown;">setwd("C:/Documents and Settings/deepak.babu/Desktop/output")</span></p>
<p><span style="color:green;"># Install the R package arules</span><br />
<span style="color:brown;">install.packages("arules")</span></p>
<p><span style="color:green;"># Load the arules package</span><br />
<span style="color:brown;">library("arules")</span></p>
<p><span style="color:green;"># Read the transaction file as an object of class transactions<br />
# file &#8211; csv/txt<br />
# format &#8211; single/basket (For ‘basket’ format, each line in the transaction data file represents a transaction<br />
#           where the items (item labels) are separated by the characters specified by sep. For ‘single’ format,<br />
#           each line corresponds to a single item, containing at least ids for the transaction and the item.)<br />
# rm.duplicates &#8211; TRUE/FALSE<br />
# cols &#8211; For the ‘single’ format, cols is a numeric vector of length two giving the numbers of the columns (fields)<br />
#           with the transaction and item ids, respectively. For the ‘basket’ format, cols can be a numeric scalar<br />
#           giving the number of the column (field) with the transaction ids. If cols = NULL, the data do not contain transaction ids.<br />
# sep &#8211; "," for csv, "\t" for tab delimited</span><br />
<span style="color:brown;">txn &lt;- read.transactions(file="Transactions_sample.csv", rm.duplicates=FALSE, format="single", sep=",", cols=c(1,2))</span></p>
<p><span style="color:green;"># Run the apriori algorithm with a minimum support of 0.5 and a minimum confidence of 0.9</span><br />
<span style="color:brown;">basket_rules &lt;- apriori(txn, parameter = list(support = 0.5, confidence = 0.9, target = "rules"))</span></p>
<p><span style="color:green;"># Check the generated rules using inspect</span><br />
<span style="color:brown;">inspect(basket_rules)</span></p>
<p><span style="color:green;"># If a large number of rules is generated, specific rules can be read by index</span><br />
<span style="color:brown;">inspect(basket_rules[1])</span></p>
<p><span style="color:blue;"> </span></p>
<p><span style="color:green;">#############################################################################<br />
##############  SUPPLEMENTARY  INFO  ########################################<br />
#############################################################################<br />
#To visualize the item frequency in txn file</span><br />
<span style="color:brown;">itemFrequencyPlot(txn);</span></p>
<p><span style="color:green;">#To see how the transaction file is read into txn variable.</span><br />
<span style="color:brown;">inspect(txn);</span></p>
<p><strong><span style="text-decoration:underline;"><span style="color:black;">Output:<br />
</span></span></strong><br />
<span style="color:blue;"><br />
parameter specification:<br />
confidence minval smax arem  aval originalSupport support minlen maxlen target<br />
0.9    0.1    1 none FALSE            TRUE     0.5      1      5  rules<br />
ext<br />
FALSE<br />
algorithmic control:<br />
filter tree heap memopt load sort verbose<br />
0.1 TRUE TRUE  FALSE TRUE    2    TRUE</span><span style="color:blue;"> </span></p>
<p><span style="color:blue;">apriori &#8211; find association rules with the apriori algorithm<br />
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt<br />
set item appearances &#8230;[0 item(s)] done [0.00s].<br />
set transactions &#8230;[6 item(s), 7 transaction(s)] done [0.00s].<br />
sorting and recoding items &#8230; [2 item(s)] done [0.00s].<br />
creating transaction tree &#8230; done [0.00s].<br />
checking subsets of size 1 2 done [0.00s].<br />
writing &#8230; [1 rule(s)] done [0.00s].<br />
creating S4 object  &#8230; done [0.00s].</span></p>
<p><span style="color:black;"><br />
As the output shows, one rule was generated at a minimum support of 50% and a minimum confidence of 90%. The generated rule can be checked with the inspect(basket_rules) command:<br />
<span style="color:blue;">lhs                              rhs            support               confidence     lift<br />
1 {Choclates} =&gt; {Pencil}  0.5714286          1                        1.166667</span></span></p>
<p><span style="color:black;"><span style="color:black;">The above rule means &#8220;if chocolates are bought, pencils are likely to be bought too&#8221;. The support of 0.57 indicates that 57% of the transactions in the data contain both chocolates and pencils. The confidence of 1 indicates that every transaction involving chocolates also involved pencils; the 0.9 passed to apriori is only the minimum confidence a rule needs in order to be reported. The lift of 1.17 means pencils appear about 1.17 times as often in chocolate baskets as in baskets overall. Hence support indicates how often a rule applies in the data, and confidence indicates how often it holds when it applies.</span></span></p>
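<p>To see where these three numbers come from, the metrics can be recomputed by hand from the sample file. The sketch below uses plain Python rather than arules, purely as a cross-check; the helper names load_baskets and rule_metrics are my own, and the rows are the ones from the sample transaction file in this post.</p>

```python
from collections import defaultdict

# Rows of the sample file in "single" format: transaction id, item.
ROWS = """1001,Choclates
1001,Pencil
1001,Marker
1002,Pencil
1002,Choclates
1003,Pencil
1003,Coke
1003,Eraser
1004,Pencil
1004,Choclates
1004,Cookies
1005,Marker
1006,Pencil
1006,Marker
1007,Pencil
1007,Choclates"""


def load_baskets(text):
    """Group items by transaction id, one set of items per transaction."""
    baskets = defaultdict(set)
    for line in text.strip().splitlines():
        txn_id, item = line.split(",")
        baskets[txn_id].add(item)
    return baskets


def rule_metrics(baskets, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    n = len(baskets)
    n_lhs = sum(1 for items in baskets.values() if lhs <= items)
    n_rhs = sum(1 for items in baskets.values() if rhs <= items)
    n_both = sum(1 for items in baskets.values() if (lhs | rhs) <= items)
    support = n_both / n            # share of baskets containing lhs and rhs
    confidence = n_both / n_lhs     # share of lhs baskets that also contain rhs
    lift = confidence / (n_rhs / n) # confidence relative to rhs's base rate
    return support, confidence, lift


baskets = load_baskets(ROWS)
support, confidence, lift = rule_metrics(baskets, {"Choclates"}, {"Pencil"})
print(round(support, 7), confidence, round(lift, 6))  # prints: 0.5714286 1.0 1.166667
```

<p>Four of the seven baskets contain both items (support 4/7), all four chocolate baskets contain a pencil (confidence 4/4), and pencils appear in six of seven baskets overall (lift 1 / (6/7)), which matches the inspect output.</p>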
<p>Also we can see the distribution of items within transactions using image(txn) and  itemFrequencyPlot(txn).</p>
<p>Transactions_sample.csv<br />
===========<br />
1001,Choclates<br />
1001,Pencil<br />
1001,Marker<br />
1002,Pencil<br />
1002,Choclates<br />
1003,Pencil<br />
1003,Coke<br />
1003,Eraser<br />
1004,Pencil<br />
1004,Choclates<br />
1004,Cookies<br />
1005,Marker<br />
1006,Pencil<br />
1006,Marker<br />
1007,Pencil<br />
1007,Choclates</p>
<div id="attachment_171" class="wp-caption alignleft" style="width: 310px"><a href="http://prdeepakbabu.files.wordpress.com/2010/11/freqplot.jpeg"><img class="size-medium wp-image-171" title="Item Frequency Plot" src="http://prdeepakbabu.files.wordpress.com/2010/11/freqplot.jpeg?w=300&#038;h=300" alt="" width="300" height="300" /></a><p class="wp-caption-text">Item Frequency Plot</p></div>
<div id="attachment_177" class="wp-caption alignleft" style="width: 310px"><a href="http://prdeepakbabu.files.wordpress.com/2010/11/imageplot.jpeg"><img class="size-medium wp-image-177" title="imageplot" src="http://prdeepakbabu.files.wordpress.com/2010/11/imageplot.jpeg?w=300&#038;h=300" alt="Image(txn) showing density " width="300" height="300" /></a><p class="wp-caption-text">Image(txn) showing density </p></div>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2010/11/13/market-basket-analysisassociation-rule-mining-using-r-package-arules/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2010/11/freqplot.jpeg?w=300" medium="image">
			<media:title type="html">Item Frequency Plot</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2010/11/imageplot.jpeg?w=300" medium="image">
			<media:title type="html">imageplot</media:title>
		</media:content>
	</item>
		<item>
		<title>Beyond BI &amp; Analytics</title>
		<link>http://prdeepakbabu.wordpress.com/2010/09/12/beyond-bi-analytics/</link>
		<comments>http://prdeepakbabu.wordpress.com/2010/09/12/beyond-bi-analytics/#comments</comments>
		<pubDate>Sat, 11 Sep 2010 19:05:35 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Bio-imitation]]></category>
		<category><![CDATA[Bio-inspiration]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[sensor]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=147</guid>
		<description><![CDATA[For the last 6 months, I have been closely following trends in information management. Below are a few of my observations. Data source explosion: Business problems are growing more complex by the day, so there is huge demand for analyzing data from a multitude of sources to help companies frame growth strategies. GPS data accumulated by telecom [...] ]]></description>
			<content:encoded><![CDATA[<p>For the last 6 months, I have been closely following trends in information management. Below are a few of my observations.</p>
<ul>
<li>Data source explosion: Business problems are growing more complex by the day, so there is huge demand for analyzing data from a multitude of sources to help companies frame growth strategies. GPS data accumulated by telecom companies offers insight into customers' current location and enables context-aware recommendations. In fact, some telecom companies have introduced location-based pricing. Sensor data helps identify security threats to secure networks. Social network data has opened up a channel for marketing services and products. Analysis of such closely knit data leads to behavioral &amp; contextual targeting. Traditional data analysis tools and algorithms perform poorly here because the data is huge and needs newer data structures for efficient analysis.</li>
<li>Databases that go beyond the relational model are gaining popularity: NoSQL databases and graph/tree/XML-based databases.</li>
<li>Open source tools continue to emerge (R, RapidMiner, Weka).</li>
<li>Growing need for massive dataset analysis.</li>
<li>Artificial Intelligence (AI) and NLP are gaining popularity among data analysts (in addition to ML techniques).</li>
<li>Multimedia analytics: There is a need to gather critical metrics such as customer footfall and to quantify customer satisfaction from facial expressions. All these applications demand high-end signal processing (both image &amp; video). There is a lot of scope for innovation in this area.</li>
<li>Privacy-preserving techniques for data analysis. These in turn encourage companies to outsource some critical data analysis to third parties.</li>
<li>Agile methodologies for analytics projects, to cope with rapidly changing customer/business needs.</li>
<li>Bio-inspiration/bio-imitation: Learning from nature and natural processes to develop analogous techniques that could solve real-world problems. Some classic examples are neural networks, inspired by the working of the human brain; path optimization, inspired by ant colonies; and the 280-degree field of view of the honey bee (vision).</li>
<li>More and more data is being made publicly available.</li>
<li>Real-time data integration, insight generation and business decisions.</li>
<li>Complex visualization techniques built on newer technologies such as Adobe Flex and MS Silverlight, which are known for producing Rich Internet Applications (RIA).</li>
</ul>
<p>I am sure these are just a few items and the list is far from exhaustive. Feel free to share your comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2010/09/12/beyond-bi-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>
	</item>
		<item>
		<title>Datamining Video Lectures &#8211; Best way to learn</title>
		<link>http://prdeepakbabu.wordpress.com/2010/03/03/datamining-video-lectures-best-way-to-learn/</link>
		<comments>http://prdeepakbabu.wordpress.com/2010/03/03/datamining-video-lectures-best-way-to-learn/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 15:39:38 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Lecture Videos]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[andrew ng]]></category>
		<category><![CDATA[autonomous car]]></category>
		<category><![CDATA[autonomous driving]]></category>
		<category><![CDATA[david mease]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[stanford]]></category>
		<category><![CDATA[stat202]]></category>
		<category><![CDATA[statistical aspects of datamining]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=121</guid>
		<description><![CDATA[Do you find analytics/data mining a difficult topic to understand and learn? To a certain extent that is true if you use books as the source. Friends, I found these two valuable, high-quality sources for learning topics related to data mining, and above all they are free. (i) From David [...] ]]></description>
			<content:encoded><![CDATA[<p>Do you find analytics/data mining a difficult topic to understand and learn? To a certain extent that is true if you use books as the source. Friends, I found these two valuable, high-quality sources for learning topics related to data mining, and above all they are free.</p>
<p><strong>(i) From David Mease, who teaches DM at Google</strong>: You can access approximately 11 hours of video (11 parts) on the semester topic &#8220;Statistical Aspects of Data Mining&#8221; at the link below. You can also get PDF versions of the lecture slides and assignments; try to solve them and master the material. I believe the author also runs a blog where problems on this topic are discussed. The best thing about this video tutorial is that David demonstrates the implementation of each technique using the open source data mining tool R.</p>
<p>Videos: <a href="http://video.google.com/videosearch?q=mease+stats+202&amp;sitesearch=#">http://video.google.com/videosearch?q=mease+stats+202&amp;sitesearch=#</a><br />
Lecture notes -pdf : <a href="http://www.stats202.com/original_index.html">http://www.stats202.com/original_index.html</a><br />
Course Home: <a href="http://www.stats202.com/">http://www.stats202.com/</a></p>
<p><strong>(ii) From Stanford University, where Andrew Ng teaches &#8220;Machine Learning&#8221;</strong>: This is another very useful video course. The semester course is covered in 20 parts, so roughly 20 hours of quality material. The best thing about Andrew is that he teaches the mathematics so well that you start visualizing the equations, which is a good way to learn maths. It is not just about maths: he also shows video demos of machine learning projects implemented by his students, such as autonomous car driving, autonomous flying, and converting a picture into a 3-D experience, so you never get bored during the lectures. I loved it a lot. Hope you enjoy it too.</p>
<p>videos: <a href="http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599&amp;search_query=stanford+%2B+machine+learning">http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599&amp;search_query=stanford+%2B+machine+learning</a><br />
Lecture Notes:  <a href="http://www.stanford.edu/class/cs229/materials.html">http://www.stanford.edu/class/cs229/materials.html</a><br />
Course Home: <a href="http://cs229.stanford.edu/">http://cs229.stanford.edu/</a></p>
<p>I am sure you will find more content than what I have mentioned here. Feel free to explore the course pages. I personally believe anything is learnt best by first learning its applications, which in the process gets you motivated, and the rest follows. I would like to thank Andrew Ng and David Mease for sharing their expertise. A good initiative by Stanford. Expecting more from top educational institutions.</p>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2010/03/03/datamining-video-lectures-best-way-to-learn/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>
	</item>
		<item>
		<title>Association Rule Mining</title>
		<link>http://prdeepakbabu.wordpress.com/2010/02/24/association-rule-mining/</link>
		<comments>http://prdeepakbabu.wordpress.com/2010/02/24/association-rule-mining/#comments</comments>
		<pubDate>Wed, 24 Feb 2010 16:41:03 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Association Rule Mining]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[apriori]]></category>
		<category><![CDATA[association analysis]]></category>
		<category><![CDATA[categorical]]></category>
		<category><![CDATA[confidence]]></category>
		<category><![CDATA[frequent itemset]]></category>
		<category><![CDATA[graph mining]]></category>
		<category><![CDATA[itemset]]></category>
		<category><![CDATA[market basket analysis]]></category>
		<category><![CDATA[sequential]]></category>
		<category><![CDATA[simpson paradox]]></category>
		<category><![CDATA[support]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=101</guid>
		<description><![CDATA[Association Rule Mining [ Implementation using R here] Association rule mining is one of the classical DM techniques. It is a very powerful technique for analysing / finding patterns in a data set. It is an unsupervised learning technique in the sense that we feed the association algorithm a data set of unlabeled transactions ( [...]]]></description>
			<content:encoded><![CDATA[<p><span style="text-decoration:underline;">Association Rule Mining </span><a href="http://prdeepakbabu.wordpress.com/2010/11/13/market-basket-analysisassociation-rule-mining-using-r-package-arules/" target="_blank">[ Implementation using R here]</a><span style="text-decoration:underline;"><br />
</span></p>
<p>Association rule mining is one of the classical DM techniques. It is a very powerful technique for analysing / finding patterns in a data set. It is an unsupervised learning technique in the sense that we feed the association algorithm a data set of transactions (called Experience E in the machine learning context) without any target labels, and it formulates hypotheses (H) in the form of rules. The input data to an association rule mining algorithm requires a particular format, which will be detailed shortly.<br />
Let me first introduce some of the application areas of this DM technique and the motivation for studying association analysis. The classic application of association rule mining is the analysis of the market basket data of a retail store. For example, retail stores like Wal-Mart, Reliance Fresh and Big Bazaar gather data about customer purchase behaviour, and they have complete details of the goods purchased as part of a single bill. This is called market basket data and its analysis is termed “market basket analysis”.<span id="more-101"></span> A frequently cited example is the finding that customers who buy diapers are also likely to buy beer. This is the kind of pattern discovered by association analysis. Other applications include, but are not limited to, scientific data analysis (earth science, to study ocean, land and atmospheric processes) and bioinformatics (genome sequence mining, etc.). It is also used in document analysis to determine the words that often occur together, and in weblog mining to find patterns in online behaviour and website navigation over time. There are numerous other applications of association analysis, limited only by human imagination and capability.<br />
Let’s start with association mining using market basket data as the example. An itemset is a group of items. A k-itemset is an itemset that contains exactly k items. A transaction (a purchase by a customer) includes one or more of the items on sale. The purchase of an item in a transaction is indicated by the value 1, while its absence is indicated by the value 0. Hence typical market basket data looks like the table below:</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top"></td>
<td valign="top">Book</td>
<td valign="top">Pen</td>
<td valign="top">Pencil</td>
<td valign="top">Eraser</td>
<td valign="top">Sharpener</td>
<td valign="top">Crayons</td>
<td valign="top">Maps</td>
<td valign="top">A4 sheets</td>
</tr>
<tr>
<td valign="top">T1</td>
<td valign="top">1</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">1</td>
</tr>
<tr>
<td valign="top">T2</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">0</td>
</tr>
<tr>
<td valign="top">T3</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">1</td>
<td valign="top">0</td>
<td valign="top">0</td>
</tr>
</tbody>
</table>
<p>Hence it means<br />
T1 – {Book,Pen,Eraser,Crayons,A4 sheets}<br />
T2 – {Pen,Eraser,Crayons}<br />
T3 – {Book,Eraser,Crayons}<br />
In the above example there are 8 distinct items in all; T1 is a 5-itemset, while T2 and T3 are 3-itemsets.</p>
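<p>As a small illustration, the 0/1 table above can be turned into the itemset form with a few lines of Python. This is only a sketch; the names ITEMS, ROWS and to_transactions are made up for this example and do not come from any library.</p>

```python
# Item columns and 0/1 rows copied from the table above.
ITEMS = ["Book", "Pen", "Pencil", "Eraser", "Sharpener", "Crayons", "Maps", "A4 sheets"]
ROWS = {
    "T1": [1, 1, 0, 1, 0, 1, 0, 1],
    "T2": [0, 1, 0, 1, 0, 1, 0, 0],
    "T3": [1, 0, 0, 1, 0, 1, 0, 0],
}


def to_transactions(rows, items):
    """Turn each 0/1 row into the set of items whose flag is 1."""
    return {
        tid: {item for item, flag in zip(items, flags) if flag == 1}
        for tid, flags in rows.items()
    }


transactions = to_transactions(ROWS, ITEMS)
for tid in sorted(transactions):
    # The size of each set is the k of that k-itemset.
    print(tid, len(transactions[tid]), sorted(transactions[tid]))
```

<p>Storing each transaction as a set keeps only the items that were actually bought, which is usually far more compact than the full 0/1 matrix when a store sells thousands of items.</p>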
<p>Looking at the above representation of market basket data, one may think that some additional information is missing, such as the quantity purchased and the amount involved in each transaction. Of course, association analysis can be extended to include such details.<br />
Applying an association rule mining algorithm results in the discovery of rules/patterns of the following form:<br />
{Pencil} -&gt; {Eraser}<br />
{Book,Maps} -&gt; {Pen}<br />
The first rule simply says “If a customer buys a pencil, he is likely to buy an eraser as well”. Note that this is a statement about co-occurrence in the data, not a strict logical implication or a causal claim. Once a pattern is discovered, it can be integrated into a decision support system to form strategies based on the rule. In the above case, the company may use this rule for cross-selling, i.e. placing pencils and erasers close to each other, which can increase sales and hence profit. Just imagine: if a number of such strong rules are discovered in a jewellery shop, it could result in tremendous value. So what does a “strong” rule mean?<br />
Now that we have defined what a rule is, we are faced with two important questions. Are all the rules discovered by the algorithm really useful / meaningful? How confident am I about a rule? To answer these questions, we use mathematical measures to quantify usefulness and confidence.<br />
The most common evaluation measures for a rule are support and confidence. There are other measures, namely lift, interest factor and correlation, which we will talk about a bit later. The support measure answers the first question, that of interestingness. It is expressed as a percentage and is the fraction of all transactions that contain every item in the rule (both the left-hand and the right-hand side). If it is, say, 4/100, it means just 4 out of 100 transactions contain all the items of the rule, so the rule is probably uninteresting and we may choose to ignore it. Hence the association rule mining algorithm sets a threshold/minimum value for the support (minsup) to eliminate uninteresting rules and retain the interesting ones. An example of an uninteresting rule could be {pen} -&gt; {eraser}: pen and eraser might be purchased together merely as a matter of chance, i.e. the rule has low support.<br />
Having answered the interestingness criterion, we are left with determining the confidence of the rule. For a rule X -&gt; Y, confidence is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X. The higher the value, the more reliable the rule. A strong rule is one that meets the minimum support and also has a high confidence value.</p>
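<p>To make the two measures concrete, here is a minimal Python sketch that computes them on the three stationery transactions from the table earlier in this post. The function names support and confidence are chosen for this illustration only.</p>

```python
# The three transactions T1..T3 from the stationery table in this post.
TRANSACTIONS = [
    {"Book", "Pen", "Eraser", "Crayons", "A4 sheets"},
    {"Pen", "Eraser", "Crayons"},
    {"Book", "Eraser", "Crayons"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y together) / support(X) for the rule X -> Y."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    # A rule whose left-hand side never occurs gets confidence 0 here.
    return both / base if base else 0.0


print(support({"Pen", "Eraser"}, TRANSACTIONS))       # 2 of 3 transactions
print(confidence({"Pen"}, {"Eraser"}, TRANSACTIONS))  # every Pen basket has an Eraser
print(confidence({"Book"}, {"Pen"}, TRANSACTIONS))    # 1 of the 2 Book baskets has a Pen
```

<p>With only three transactions the numbers are of course not meaningful; the point is just to show which counts go into each ratio.</p>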
<p>Let’s quickly jump into the details of the algorithm. Association rule mining is commonly carried out using the famous Apriori algorithm. We will also talk about variations of this algorithm that apply it to continuous data and hierarchical data. Before that, let’s formalize the definition of the association analysis problem:<br />
<em> “Given a set of transactions, the problem is to find all rules/patterns with support &gt;= minsup and confidence &gt;= minconf”</em></p>
<p>The Apriori Algorithm:<br />
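A brute force approach, which computes support and confidence for every possible rule, is very expensive. Hence the approach followed by the Apriori algorithm is to break up the requirement of computing support and confidence into two separate tasks. In the first step, frequent itemsets are generated, i.e. those itemsets which satisfy the minimum support criterion. Here the algorithm relies on the fact that every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be discarded without counting it. In the second and final step, rules are generated from the frequent itemsets by evaluating the confidence measure. Let’s visualize the approach diagrammatically as shown below:</p>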
<div id="attachment_105" class="wp-caption alignleft" style="width: 310px"><a href="http://prdeepakbabu.files.wordpress.com/2010/02/apriori.jpg"><img class="size-medium wp-image-105" title="Apriori Algorithm" src="http://prdeepakbabu.files.wordpress.com/2010/02/apriori.jpg?w=300&#038;h=134" alt="" width="300" height="134" /></a><p class="wp-caption-text">Apriori Algorithm</p></div>
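<p>The two steps can be sketched in plain Python as below. This is a teaching sketch under my own assumptions (the function names, the minsup of 0.6 and the minconf of 0.9 are arbitrary choices), not an optimized implementation and not the code of any particular package.</p>

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Step 1: level-wise search for itemsets with support >= minsup."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singles = {frozenset([i]) for t in transactions for i in t}
    level = {s: sup(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= minsup}
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: drop a candidate if any of its (k-1)-subsets is not frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c: sup(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= minsup}
    return frequent


def rules(frequent, minconf):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    found = []
    for itemset, sup_xy in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                conf = sup_xy / frequent[ante]
                if conf >= minconf:
                    found.append((ante, itemset - ante, sup_xy, conf))
    return found


transactions = [
    {"Book", "Pen", "Eraser", "Crayons", "A4 sheets"},
    {"Pen", "Eraser", "Crayons"},
    {"Book", "Eraser", "Crayons"},
]
frequent = frequent_itemsets(transactions, minsup=0.6)
strong = rules(frequent, minconf=0.9)
for ante, cons, s, c in strong:
    print(sorted(ante), "->", sorted(cons), "support", round(s, 2), "confidence", round(c, 2))
```

<p>The pruning inside the loop is where the saving over brute force comes from: a candidate such as {Book, Pen, Eraser} is never counted against the data because its subset {Book, Pen} is already known to be infrequent.</p>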
<p>Measures can be classified into two categories – subjective and objective. A subjective measure often involves heuristics and domain expertise to eliminate uninteresting rules, while objective measures are domain-independent. Support and confidence are good examples of objective measures. Objective measures can be either symmetric binary or asymmetric binary. The choice of measure depends on the type of application, and it must be chosen carefully to get quality results.<br />
Simpson’s paradox is a reminder that patterns can be misinterpreted: a hidden variable that is not part of the analysis can cause a relationship seen in the overall data to weaken, disappear or even reverse once the data is split by that variable.</p>
<p>The Apriori algorithm can be extended to solve various other problems by making small modifications to the data representation, data structures and algorithm.</p>
<ol>
<li>To handle categorical and continuous data. For example, gender is a categorical attribute and can be represented using two items, namely gender=‘M’ and gender=‘F’.</li>
<li>To handle the concept of hierarchy in itemsets. For example, if iPod and Smartphone are two specific items, then we can define a higher-level item called electronic goods as their parent item.</li>
<li>To handle sequential pattern mining. Example: weblog mining, genome sequence mining, customer purchase behaviour.</li>
<li>Graph and sub-graph mining. Example: weblogs to identify navigation patterns, chemical structure analysis.</li>
<li>To identify infrequent patterns, negatively correlated patterns, etc.</li>
</ol>
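<p>For the first extension in the list above, a record with categorical and continuous attributes can be converted into items before running the usual algorithm. The sketch below is one possible way to do it; the age attribute, its three ranges and the binarize function are hypothetical choices made only for this example.</p>

```python
def binarize(record, bins):
    """Map one record (a dict) to a set of 'attribute=value' items.

    `bins` maps a continuous attribute name to a list of (label, low, high)
    half-open ranges [low, high). Attributes not listed in `bins` are
    treated as categorical and become one item per value.
    """
    items = set()
    for attr, value in record.items():
        if attr in bins:
            for label, low, high in bins[attr]:
                if low <= value < high:
                    items.add(f"{attr}={label}")
                    break
        else:
            items.add(f"{attr}={value}")
    return items


# Hypothetical discretization of a continuous 'age' attribute.
AGE_BINS = {"age": [("under30", 0, 30), ("30to59", 30, 60), ("60plus", 60, 200)]}

customer = {"gender": "F", "age": 34}
print(sorted(binarize(customer, AGE_BINS)))
```

<p>The choice of ranges matters: ranges that are too wide hide patterns, while ranges that are too narrow leave each item with too little support.</p>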
<p>You can also download a copy of the above material here: <a href="http://prdeepakbabu.files.wordpress.com/2010/02/association-rule-mining.pdf">Association Rule Mining</a>. Please feel free to comment on the topic. You can also subscribe to this blog by clicking on the subscribe button on the right side of the page.</p>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2010/02/24/association-rule-mining/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>

		<media:content url="http://prdeepakbabu.files.wordpress.com/2010/02/apriori.jpg?w=300" medium="image">
			<media:title type="html">Apriori Algorithm</media:title>
		</media:content>
	</item>
		<item>
		<title>Future of Predictive analytics &#8211; Part II</title>
		<link>http://prdeepakbabu.wordpress.com/2010/01/17/future-of-predictive-analytics-part-ii/</link>
		<comments>http://prdeepakbabu.wordpress.com/2010/01/17/future-of-predictive-analytics-part-ii/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 17:48:33 +0000</pubDate>
		<dc:creator>prdeepakbabu</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[attrition modelling]]></category>
		<category><![CDATA[Avatar]]></category>
		<category><![CDATA[GPS data]]></category>
		<category><![CDATA[social datamining]]></category>
		<category><![CDATA[unstructured data]]></category>
		<category><![CDATA[uplift modelling]]></category>

		<guid isPermaLink="false">http://prdeepakbabu.wordpress.com/?p=95</guid>
		<description><![CDATA[Here is the continuation of the article I posted a few days back &#60;here&#62;. I am back with some interesting info on recent advancements in the area of analytics. Before going into the details, I want to share something basic &#8211; “The data/datum”. I met a friend of mine, working for a leading information management firm [...]]]></description>
			<content:encoded><![CDATA[<p>Here is the continuation of the article I posted a few days back &lt;<a href="http://prdeepakbabu.wordpress.com/2009/11/14/future-of-predictive-analytics/">here</a>&gt;. I am back with some interesting info on recent advancements in the area of analytics. Before going into the details, I want to share something basic &#8211; “The data/datum”. I met a friend of mine, who works for a leading information management firm and uses BO to prepare reports on customer response behaviour. The way he talked made it clear that he saw the data merely as numbers and strings. This is something I have seen with most people. I often ask them to look at the causal relationships among the various KPIs, because these can tell you more about your business. Anyway, here is the list of trends seen in analytics:</p>
<p>1. <strong>Uplift modelling</strong>: The true effectiveness of a marketing campaign isn&#8217;t response rate! It&#8217;s the incremental impact &#8211; that is, additional revenue directly attributable to the campaign that would not otherwise have been generated. Yet traditional targeting criteria are often designed to find clients that are interested in the product, but would have bought it whether or not they received a promotion. In such cases, the incremental impact is insignificant and the marketing dollars could have been spent elsewhere.</p>
<p>Net Lift Models are designed to maximize incremental impact by targeting the undecided clients that can be motivated by marketing. These &#8220;swing customers&#8221; are akin to the swing states of a presidential election; data miners could learn a lot from presidential campaigns.<br />
More Here: <a href="http://www.predictiveanalyticsworld.com/sanfrancisco/2010/agenda.php#day2-2">http://www.predictiveanalyticsworld.com/sanfrancisco/2010/agenda.php#day2-2</a></p>
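<p>The idea of incremental impact can be shown with simple arithmetic: compare the response rate of customers who received the promotion with that of a held-out control group who did not. The sketch below uses made-up counts and is only the difference of two rates, a much simpler thing than a full uplift model.</p>

```python
def response_rate(responders, total):
    """Share of customers in a group who bought the product."""
    return responders / total if total else 0.0


def uplift(treated_resp, treated_total, control_resp, control_total):
    """Incremental response: treated rate minus control rate."""
    return (response_rate(treated_resp, treated_total)
            - response_rate(control_resp, control_total))


# Hypothetical counts: (treated responders, treated total,
#                       control responders, control total)
segments = {
    "would buy anyway": (300, 1000, 290, 1000),
    "swing customers": (150, 1000, 50, 1000),
}

for name, counts in segments.items():
    print(name, "uplift:", round(uplift(*counts), 3))
```

<p>In this invented example the first segment has the higher response rate (30%) but almost no uplift, while the swing customers respond less often overall yet gain ten percentage points from the campaign, so that is where the marketing money has an effect.</p>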
<p>2. <strong>Social Data Mining</strong>: There are many networking sites, and there is a lot of data out there in the form of tweets, status messages, etc., all of which carry information, be it product-related comments, customer feedback, complaints or opportunities. Such data can provide valuable insights about the subject under study.<br />
In one of his blog posts, Eric Siegel talks about interesting findings from social data analysis:<br />
(i)  Health care researchers have identified that quitting smoking is contagious.<br />
(ii) The risk of obesity increases if you have an obese friend.<br />
Findings like these suggest that social connections can reveal more predictive data about customers.</p>
<p>3. <strong>Unstructured Data handling</strong>: IBM is working on a project called ‘Avatar’ to offer users a mechanism to deal with unstructured data. It is often estimated that nearly 80% of data is unstructured in nature. Traditional BI tools are known to work best with structured data only, but in practice most data is in the form of mails, documents, blogs, etc., which is unstructured. I hope unstructured data handling reaches commercial maturity.</p>
<p>4. <strong>Real time BI</strong>: Most mobile devices are now GPS enabled, thanks to the low price of the technology. This data about customer whereabouts can enable a lot of interesting applications. Based on the rate of change of GPS location, we can estimate the user’s speed of movement (and from this value decide whether the customer is walking or using a vehicle). This data can help in traffic congestion management and thereby help city authorities plan better. Analysis of GPS data might give insights for building systems which recommend routes based on current traffic conditions. These are not the only uses; the sky is the limit for imagining creative ways of using GPS data. However, this raises privacy concerns, as the data reveals confidential information about customer behaviour. Researchers are currently developing privacy-preserving data mining algorithms, but we still have a long way to go.</p>
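<p>As a rough sketch of the speed idea above, the distance between two GPS fixes can be computed with the haversine formula and divided by the elapsed time. The (latitude, longitude, seconds) layout of a fix and the 7 km/h walking threshold are assumptions made for this illustration, not values from any standard.</p>

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Average speed between two fixes, each a (lat, lon, unix_seconds) tuple."""
    dt = fix_b[2] - fix_a[2]
    if dt <= 0:
        raise ValueError("second fix must be later than the first")
    dist = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    return dist / dt * 3.6  # m/s to km/h


def travel_mode(kmh, walking_limit_kmh=7.0):
    """Crude two-way split; the threshold is an assumed value."""
    return "walking" if kmh <= walking_limit_kmh else "vehicle"


slow = speed_kmh((0.0, 0.0, 0), (0.0, 0.0005, 60))
fast = speed_kmh((0.0, 0.0, 0), (0.0, 0.01, 60))
print(round(slow, 1), travel_mode(slow))
print(round(fast, 1), travel_mode(fast))
```

<p>Real GPS fixes are noisy, so a practical system would smooth the speed over several fixes before deciding on the mode of travel.</p>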
<p>I was just wondering why companies don’t model employee attrition, as this may help in predicting the likelihood that an employee is planning a job change, so that preventive measures can be taken to retain him/her if the loss would be significant. In fact, I know companies which rate their employees during the appraisal cycle on a scale of 1 to 5, which in turn decides salary and promotion; this rating could be one of the strong predictors in an attrition model. I promise to bring you more info on this subject as and when I get something interesting to blog about.</p>
<p>If you find my blog interesting, please subscribe here by entering your mail id in the right side subscribe box. Please feel free to comment and share your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://prdeepakbabu.wordpress.com/2010/01/17/future-of-predictive-analytics-part-ii/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/069abd4e45db8cb332c15bf4020b64ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">prdeepakbabu</media:title>
		</media:content>
	</item>
	</channel>
</rss>
