Business Insights

Clustering is an unsupervised classification (learning) technique whose objective is to maximize the inter-cluster distance while minimizing the intra-cluster distance. By unsupervised, we mean that data is clustered (segmented) using all the available attributes, with no class information available. Supervised classification, on the other hand, uses class information.
As usual, before we jump into the ‘how’, let’s answer the ‘why’. Clustering is applied to a variety of problems, ranging from the study of biological systems to exploratory data analysis (as a pre-processing technique). Many predictive analytics algorithms use clustering as one of their components. Major brands use it in CRM to understand their customers better. Clustering is also used for outlier detection, such as identifying fraudulent transactions. There is a site that relies extensively on clustering algorithms: websites are segmented (clustered) on attributes such as domain category, number of users, traffic, content type, corporate or personal, blog, image blog, video blog, etc. For example, if you entered InMobi, you would get a list of companies in the same space, mainly its competitors: Mojiva, Millennial Media, AdMob, Quattro, Mobclix, etc. If you are looking for an image hosting site and want to know the alternatives, such a site is helpful.
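The post does not name a specific algorithm, but k-means is one widely used method that pursues the objective described above: it repeatedly assigns each point to its nearest centroid and moves each centroid to the mean of its points, which reduces the intra-cluster (within-cluster) squared distance. The sketch below is a minimal pure-Python illustration under simple assumptions (numeric attributes, Euclidean distance, random initialization from the data points); the sample data is made up.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then re-center."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid
        # (Euclidean distance via math.dist, Python 3.8+).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid as is
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Made-up 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group gets label 0 depends on the random starting centroids; what matters is that the three nearby points share one label and the three distant points share the other.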

We talk about similarity in terms of distance measures like

(i) Euclidean distance

(ii) Manhattan distance
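For two points x and y with n attributes, Euclidean distance is the square root of the sum of squared differences, sqrt(Σ (x_i − y_i)²), while Manhattan distance is the sum of absolute differences, Σ |x_i − y_i|. A minimal Python sketch of both, on two made-up points:

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of absolute differences along each attribute."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two made-up points, each described by two numeric attributes.
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

For any pair of points the Manhattan distance is never smaller than the Euclidean one, because it follows the attribute axes instead of the straight line between the points.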



This blog post is a comparison of and in terms of their similarities across two dimensions: analytic maturity and the use of data shared by their customers. As Thomas Davenport mentions in his book “Competing on Analytics”, is one of the few companies that was built on a foundation of data, the so-called “analytically mature” company. LinkedIn has joined the list, with a lot of new features available to its users.

As customers interact with the site, they generate data about which products or features they like. Companies like and LinkedIn clearly understand how to leverage this information to make each interaction between the customer and the site more valuable and relevant. The more data users are willing to share about their likes and dislikes, the better the site’s recommendations for them. Companies need to instil this confidence in customers’ minds so that users share their data willingly.


In this blog post, I talk about three scenarios where highly valuable insights were derived from simple analyses.

1. Customers who shopped online returned via stores

Randy Lea, VP of product and service marketing at Teradata, talks about one of their clients, who had tagged its e-commerce customers as best customers based on the web sales they generated and was reaching out to them with various promotions. However, on integrating their web data with enterprise data (store data), they found that most of these customers were buying items online in multiple units and returning them through stores.

For example, some customers bought 4-5 shirts of different colors, retained the one they liked most, and returned the rest at a store. Effectively, customers were buying through one channel (web) and returning through another (store). Hence the web customers they believed were their best were actually average shoppers and should not have been sent offers.

Source: Teradata (Video)
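The lesson of this scenario is that customer value has to be computed across channels. The sketch below uses made-up data and a hypothetical record layout (not the client’s or Teradata’s actual schema): customer A buys five $40 shirts online and returns four of them in a store. Ranked by gross web sales, A looks like the best customer; netting out store returns changes the ranking.

```python
from collections import defaultdict

# Hypothetical records: (customer_id, amount in dollars).
# Web sales come from the e-commerce system, returns from the store system.
web_sales = [("A", 200.0), ("B", 120.0), ("C", 90.0)]  # A: five $40 shirts
store_returns = [("A", 160.0)]                         # A: four shirts returned in store

def net_value(sales, returns):
    """Integrate both channels: total sales per customer minus total returns."""
    totals = defaultdict(float)
    for customer, amount in sales:
        totals[customer] += amount
    for customer, amount in returns:
        totals[customer] -= amount
    return dict(totals)

def rank(values):
    """Customer ids ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)

gross = net_value(web_sales, [])           # web-only view
net = net_value(web_sales, store_returns)  # web + store view
print(rank(gross))  # ['A', 'B', 'C']
print(rank(net))    # ['B', 'C', 'A']
```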

2. In the United States, if you live more than two miles from a pharmacy store, you probably don’t shop there!

In the book “Data-Driven Marketing”, Mark Jeffery describes how Walgreens optimized its marketing spend using simple geospatial visualization. The picture on the right shows three stores of the Walgreens pharmacy chain on a map. Walgreens is a $59 billion annual revenue pharmacy company with 6,850 stores throughout the United States.

Source: “Data-Driven Marketing” by Mark Jeffery

Geospatial visualization of Walgreens stores

This geospatial picture shows dots that represent customers and where they live, coded by shape according to which of the three Walgreens stores they shop at. The “diamond” customers shop at Store 1; the “square” customers, at Store 2; and the “star” customers, at Store 3. This pharmacy retail chain predominantly markets using flyers in newspapers. They pay for the marketing by zip code, shown, for example, by the dashed line in the picture. Mike Feldner, the marketing manager who first created these pictures, noticed something interesting: the circle on the picture is two miles in radius, and after looking at many pictures throughout the United States, he noticed that there are no dots (customers) for a store more than two miles from the store. He concluded that in the United States, if you live more than two miles from a pharmacy store, you probably don’t shop there. At that time, Walgreens treated each U.S. locale equally, allocating equal dollar amounts for newspaper advertising in each zip code across the United States. But the data show that if there is no store within two miles of the zip code, customers do not shop at the store. Based on these data, Walgreens ultimately stopped spending advertising dollars in all zip codes without a store within two miles of the zip code. As you might guess, the impact on sales revenue was exactly zero. The impact on marketing, however, was a cost saving of more than $5 million, against a total cost of approximately $200,000 for collecting the data and creating the plots. This multimillion-dollar saving in marketing did not require a lot of money, and the analysis was done on a personal computer (PC). This is yet another example of a simple approach making a big impact.

Source: “Data-Driven Marketing” by Mark Jeffery
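The two-mile rule is easy to apply once stores and zip codes have coordinates. The sketch below is an illustration under stated assumptions, not Walgreens’ actual method: it approximates each zip code by a single centroid point, measures distance with the haversine great-circle formula, and uses hypothetical coordinates and zip labels.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def zips_to_advertise(zip_centroids, stores, radius_miles=2.0):
    """Keep zip codes whose centroid lies within radius_miles of at least one store."""
    return [
        zip_code
        for zip_code, (lat, lon) in zip_centroids.items()
        if any(haversine_miles(lat, lon, s_lat, s_lon) <= radius_miles for s_lat, s_lon in stores)
    ]

# Hypothetical store and zip-code centroid coordinates (latitude, longitude).
stores = [(41.8781, -87.6298)]
zip_centroids = {
    "zip_A": (41.8800, -87.6300),  # about 0.1 miles from the store
    "zip_B": (41.9500, -87.6298),  # about 5 miles north
    "zip_C": (41.8781, -87.6000),  # about 1.5 miles east
}
print(zips_to_advertise(zip_centroids, stores))  # ['zip_A', 'zip_C']
```

Advertising spend would then go only to the returned zip codes; a real analysis would measure from the zip code area itself rather than a single centroid.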

3. We won because we understood the science of incentivizing people to cooperate

Late last year the Pentagon’s mad-scientist research wing, Darpa, announced the Network Challenge, a $40,000 prize for the first group to find and report the locations of ten red weather balloons that the agency would set aloft one day in secret locations around the country. Most of the thousands of groups that signed up quickly realized that crowdsourcing was the way to find the 8-foot spheres. So, naturally, they offered bounties to balloon hunters. But Alex Pentland’s crew at MIT’s Human Dynamics Lab, part of the MIT Media Lab, took their crowd control a step further. “It was trivial for us to slap together the balloon thing,” says the 58-year-old Pentland. That’s because other groups’ tactics were based on guesswork, he argues. His were based on lessons learned through data-mining research. “We won because we understood the science of incentivizing people to cooperate.”

Read the entire article here: Mining Human Behavior at MIT

