Next generation BI

Just another WordPress.com weblog

Future of Predictive analytics – Part II January 17, 2010

Here is the continuation of the article I posted a few days back <here>. I am back with some interesting information on recent advancements in analytics. Before going into the details, I want to share something basic: the data itself. I met a friend who works for a leading information management firm and uses BO to prepare reports on customer response behaviour. From the way he talked, it was clear that he saw the data merely as numbers and strings. I have seen this with most people. I often ask them to look at the causal relationships among the various KPIs, because these can tell you more about your business. Here is the list of trends seen in analytics:

1. Uplift modelling: The true effectiveness of a marketing campaign isn’t response rate! It’s the incremental impact – that is, additional revenue directly attributable to the campaign that would not otherwise have been generated. Yet traditional targeting criteria are often designed to find clients that are interested in the product, but would have bought it whether or not they received a promotion. In such cases, the incremental impact is insignificant and the marketing dollars could have been spent elsewhere.

Net Lift Models are designed to maximize incremental impact by targeting the undecided clients that can be motivated by marketing. These “swing customers” are akin to the swing states of a presidential election; data miners could learn a lot from presidential campaigns.
More here: http://www.predictiveanalyticsworld.com/sanfrancisco/2010/agenda.php#day2-2
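
As a rough illustration of the incremental-impact idea above, here is a minimal Python sketch. It assumes each customer segment was split into a mailed group and a held-out control group; the function name, segment names and all figures are made up for illustration and do not come from the conference material.

```python
def incremental_response(treated_buyers, treated_total, control_buyers, control_total):
    """Return (treated_rate, control_rate, uplift) for one customer segment."""
    treated_rate = treated_buyers / treated_total
    control_rate = control_buyers / control_total
    # Uplift is the extra response attributable to the campaign itself.
    return treated_rate, control_rate, treated_rate - control_rate


# Hypothetical counts per segment:
# (buyers among mailed, mailed, buyers among not mailed, not mailed)
segments = {
    "loyal": (300, 1000, 290, 1000),  # high response, but they buy anyway
    "swing": (150, 1000, 50, 1000),   # lower response, but the campaign moves them
}

uplift_by_segment = {
    name: incremental_response(*counts)[2] for name, counts in segments.items()
}

# Target the segment with the largest uplift, not the largest response rate.
best_segment = max(uplift_by_segment, key=uplift_by_segment.get)
print(uplift_by_segment, best_segment)
```

In this made-up data the "loyal" segment has the higher response rate, yet almost all of it would have happened without the mailing, so the "swing" segment is the better target.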

2. Social Data Mining: Networking sites hold a lot of data in the form of tweets, status messages and so on, all of which carry information: product-related remarks, customer feedback, complaints, opportunities, etc. Such data can provide valuable insights about the subject under study.
In one of his blogs, Eric Siegel talks about interesting findings from social data analysis:
(i) The health care industry has identified that quitting smoking is contagious.
(ii) The risk of obesity increases if you have an obese friend.
These findings suggest that social connections can reveal more predictive information about customers.

3. Unstructured Data handling: IBM is working on a project called ‘Avatar’ to offer users a mechanism for dealing with unstructured data. Nearly 80% of data is said to be unstructured. Traditional BI tools are known to work best with structured data only, but in practice most data is in the form of mails, documents, blogs, etc., which are unstructured. I hope unstructured data handling reaches commercial maturity.

4. Real time BI: Most mobile users now have GPS-enabled devices, thanks to their low price. This information about a customer's whereabouts opens up a lot of interesting applications. From the rate of change of GPS location we can work out the user's speed of movement (and, based on this value, decide whether the customer is walking or using a vehicle). This data can help in traffic congestion management, thereby helping city authorities plan better. Analysis of GPS data might also give insights for building systems that recommend routes based on current traffic conditions. These are not the only uses; the sky is the limit for creative ways of using GPS data. However, this raises privacy concerns, as the data reveals confidential information about customer behaviour. Researchers are developing privacy-preserving data mining algorithms, but we still have a long way to go.
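
The speed estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a GPS fix is taken to be a (latitude, longitude, unix-seconds) tuple, distance is computed with the haversine formula, and the 7 km/h walking threshold is an arbitrary value chosen for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Average speed between two fixes, each a (lat, lon, unix_seconds) tuple."""
    elapsed_s = fix_b[2] - fix_a[2]
    if elapsed_s <= 0:
        raise ValueError("fixes must be in increasing time order")
    distance_m = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    return distance_m / elapsed_s * 3.6  # m/s -> km/h


def travel_mode(kmh, walking_limit_kmh=7.0):
    """Crude classification; the threshold is an assumption, not a standard."""
    return "walking" if kmh <= walking_limit_kmh else "vehicle"


# Two hypothetical fixes one minute apart, roughly 100 m north of each other.
slow = speed_kmh((12.9716, 77.5946, 0), (12.9725, 77.5946, 60))
# Same start, roughly 1 km north after one minute.
fast = speed_kmh((12.9716, 77.5946, 0), (12.9806, 77.5946, 60))
print(travel_mode(slow), travel_mode(fast))
```

A real system would also need to smooth noisy fixes and handle stationary users, which this sketch ignores.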

I was wondering why companies don’t model employee attrition. Such a model may help in predicting how likely an employee is to be planning a job change, so that preventive measures can be taken to retain him/her if the loss would be significant. In fact, I know companies that rate their employees on a scale of 1 to 5 during the appraisal cycle, which in turn decides salary and promotion; this rating is likely to be one of the strong predictors in attrition modelling. I promise to bring you more on this subject as and when I find something interesting to blog about.

If you find my blog interesting, please subscribe here by entering your mail id in the right side subscribe box. Please feel free to comment and share your thoughts.

 

Informatica interview Questions – All in one December 28, 2009

Filed under: Data Integration Tool, ETL — prdeepakbabu @ 3:26 pm

Informatica recently unveiled its latest release, Informatica 9, with some major enhancements over its earlier versions; it is believed it will revolutionize the data integration & ETL market. I know a lot of my friends are hunting for Informatica jobs. I have tried to compile all the Informatica interview questions available on geekinterview.com into a single PDF document. I hope this saves a lot of time, as navigating from one question page to another often consumes a lot of it. Please note that the content of the document is reproduced “as-is” from geekinterview.com.
  Feel free to share your ideas/concerns. More about me here

  • 398 Questions
  • 363 Pages
  • 1.5 MB
  • PDF download link: Download Here
    If you find my blog interesting, subscribe to the blog.

     

    Visualization Techniques December 10, 2009

    Filed under: Analytics, Visualization — prdeepakbabu @ 4:29 pm

    Visualization is considered one of the most valuable tools in data mining. Visual analysis helps us understand data with minimum effort. Using graphs/3D plots is a more effective way of understanding massive sets of data. Of course, we have other ways of exploring data, for example statistical functions such as the mean, which gives the average value; the standard deviation/variance, which describes the spread of the data; and correlation, which describes the relationship between attributes. Some of the classical visualization techniques include

    (i) Histograms (ii) Scatter plots (iii) Pie charts

    A frequency histogram displays the distribution of values for an attribute by dividing the possible values into bins and showing the number of objects/records falling in each bin. A scatter plot is a great way to visualize paired numeric attributes. For example, given the two attributes height and weight, a scatter plot can visually represent the correlation between them. It may indicate facts like “As the height increases, weight also increases” or “As the height increases, weight decreases”. Data mining techniques use scatter plots to identify redundant attributes, which can be dropped from the analysis.
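
The numbers behind a histogram and a scatter plot can be computed directly, as the following minimal Python sketch shows. The height and weight values are invented for illustration, and the helper names are my own; the correlation is the ordinary Pearson coefficient, computed by hand with the standard library.

```python
from collections import Counter
from statistics import mean, pstdev


def histogram(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return covariance / (pstdev(xs) * pstdev(ys))


# Hypothetical paired measurements.
heights_cm = [150, 155, 160, 165, 170, 175, 180]
weights_kg = [50, 54, 59, 63, 68, 72, 77]

height_bins = histogram(heights_cm, 10)  # bin start -> number of records
r = pearson(heights_cm, weights_kg)      # close to +1: weight rises with height
print(height_bins, round(r, 3))
```

A coefficient close to +1 or -1 between two attributes is the numeric counterpart of the tight diagonal band seen in a scatter plot, and it is one signal that an attribute may be redundant.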
    Newer visualization techniques are evolving with growing business needs and the need to minimize the effort of decision making. Scientists use visual analysis to explore previously unknown patterns in their research/simulation data. Visualization has therefore gained wide acceptance across many fields. Two visualization techniques that I really admire are geo-spatial visualization and word clouds.

    • Geo Spatial Data: Consider data on the average energy consumption per person for various regions of the world. In this scenario it is more meaningful to visualize the energy data against the geography to get some quick facts about the data. The figure below shows the visualization. Bigger circles and darker shades indicate high energy consumption; conversely, smaller circles with lighter shades indicate relatively low energy consumption.

      Geo-spatial data analysis: energy consumption data

    • Word Cloud Analysis: I am sure you have seen this visualization on sites like torrentz, pdf-geni, rapidsearch, etc. A word cloud indicates the frequency of word usage through the font size and colour of the text: the bigger the font and the darker the shade, the more frequently the word appears in the given data set. The figure below shows a word cloud analysis of Lincoln’s speech. As seen from the visual, Lincoln most frequently used words like people, government, constitution, etc., as these appear in relatively bigger fonts. The figure on the right shows the word cloud analysis for this blog. There are online tools that construct a word cloud for your blog/document, for example http://tagcrowd.com/

    Word Cloud Visualization

    Word cloud analysis of this blog
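
The counting step behind a word cloud is simple enough to sketch in Python. This is an illustration only: the stop-word list is a tiny hand-picked assumption, the font-size scaling is one arbitrary choice among many, and the sample text is a well-known line from Lincoln's Gettysburg Address, which is not necessarily the speech shown in the figure.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "that", "for", "by", "shall", "not", "from"}


def word_frequencies(text, top_n=5):
    """Return the top_n (word, count) pairs, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


def font_sizes(frequencies, min_pt=10, max_pt=40):
    """Give the most frequent word max_pt and scale the others by their share of its count."""
    top = max(count for _, count in frequencies)
    return {word: min_pt + (max_pt - min_pt) * count / top for word, count in frequencies}


sample = "government of the people, by the people, for the people, shall not perish from the earth"
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
print(freqs)
print(sizes)
```

A rendering tool then only has to draw each word at its computed size, optionally mapping the same count to a colour shade.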

    A classical example is the case where a visualization helped identify the source of a cholera outbreak in London. When the cholera-affected houses were plotted on a map, it was evident that people living near a particular public water pump had developed cholera, while those farther away were less likely to. The water from the pump turned out to be contaminated. Hence a major problem was solved using visualization.
    In a recent article in a leading magazine, I learnt that expert systems (AI) can visually represent the probable cause of a health problem by highlighting the affected organ(s) in three dimensions, with the ability to drill down to microscopic levels (based on medical test data). All these examples show visualization to be an effective tool.
    Please add your suggestions (if any) in the comments section. If you find this blog interesting, please subscribe to it.

     

    amazon.com – An analytics perspective November 23, 2009

    Filed under: Analytics — prdeepakbabu @ 5:27 pm

    In today’s post I will be talking about amazon.com, a leading e-commerce company. Amazon.com has huge amounts of data about its customer base, products and customer purchase behaviour. To boost sales, amazon.com runs heavy analytics on the terabytes of data it has collected. If you search for a book, let's say “harry potter”, you end up getting the following sections:

    • Frequently bought together
    • Customers who bought this also bought the following books
    • What do customers ultimately buy after viewing this product?

    All of the above are examples of a recommendation system. A recommender system attempts to present the item(s) of interest to a particular user, thereby helping to make strategic marketing decisions. Basically, the user data is profiled and grouped into clusters, such as high/low revenue generating customers or users interested in music, movies, science, etc. A user can then be presented with context-based content. This customization puts the customer at ease.
    So do not be surprised if a 10-year-old boy who logs in to amazon.com to purchase a book is shown video game ads in the sidebar, while a 50-year-old man who logs in is shown an ad for a walking stick. This increases the probability of the customer making more purchases.
    A recommender system helps the customer make better decisions. It also helps companies optimize their marketing costs. For example, email costs for promoting products can be reduced by mailing relevant product information to the right consumers, instead of mailing all users and thereby spamming them.
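
To make the "customers who bought this also bought" idea concrete, here is a minimal co-purchase counting sketch in Python. It is not Amazon's actual algorithm, which is not described here; the basket data and function names are hypothetical, and simple pair counting stands in for whatever a production recommender would use.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def also_bought(item, pair_counts, top_n=3):
    """Items most often bought together with `item`, most frequent first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase history, one list per order.
baskets = [
    ["harry potter 1", "harry potter 2", "bookmark"],
    ["harry potter 1", "harry potter 2"],
    ["harry potter 1", "narnia"],
    ["narnia", "bookmark"],
]

pair_counts = co_purchase_counts(baskets)
suggestions = also_bought("harry potter 1", pair_counts)
print(suggestions)
```

The same counts could drive the targeted mailing mentioned above: mail a product only to customers whose past purchases co-occur with it.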

     

    Future of Predictive Analytics November 14, 2009

    Filed under: Analytics — prdeepakbabu @ 3:37 am
    Recently I attended a conference on “Future of Predictive Analytics” here in Bangalore. Here is a summary of the topics covered. Data mining is an automated process of discovering hidden patterns/trends in data using statistical and mathematical techniques. “Data mining” is the academic's term, while “predictive analytics” is the equivalent term used by business analysts/business professionals. The motivations for predictive analytics are reduced storage costs, the easy availability of data capture techniques and the growing complexity of business. Here are some of the interesting applications of data mining:
    • Sentiment Analysis: It is sometimes referred to as “Opinion Mining”. There is a lot of unstructured data/information available in the form of blogs, social networking sites, emails & documents. For example, feedback about a newly released product may be available as tweets on Twitter or as comments on a blog. Extracting this unstructured information and running NLP algorithms on it can give the company valuable insight into “how well is the product accepted among users?” and “what are the positives/negatives seen by the users about my product?”. This is a very challenging problem: NLP is a heavily researched topic, yet progress has been limited because language semantics are highly dynamic.
    • Audio/Video mining: The voice logs of a call center/service desk generate a huge amount of data. Manually listening to each of them and inferring conclusions is practically impossible. Hence automated means are necessary to programmatically infer the underlying feedback provided by customers. The challenges here are the variations in human voice, pronunciation and accent. For example, some individuals pronounce sci-fi as ski-fi.

      High resolution camera in a restaurant to record customer's facial expressions

      Recently I read a newspaper article about a restaurant in the US using high resolution cameras to record customers' facial expressions. Analyzing this video data can yield valuable feedback about the food being served, which can be used to make strategic decisions.

    • Visualization: Visualization is another technique of analyzing spatial data.

      Visualization of US election campaigning costs

      US elections campaign cost data could be visualized broken down by cities and zip codes. Google Maps provides APIs to integrate the geo-spatial data. As the complexity of data increases, newer visualization techniques evolve.

    • Real time BI: Walmart, the biggest retail chain in the US, uses real time decision making to promote sales of products across various parts of the world. It is known to have state-of-the-art trickle feed systems and data warehouse.
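
Returning to the sentiment analysis application in the list above, the simplest possible approach is a word-list scorer, sketched below in Python. The word lists and sample comments are invented for illustration, and this is a deliberately naive baseline rather than a real NLP pipeline.

```python
import re

# Tiny hand-made lexicons; purely illustrative.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken"}


def sentiment_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map the score to a coarse positive / negative / neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical product comments.
comments = [
    "Love the new phone, great camera",
    "Battery is poor and the screen is broken",
    "It arrived on Tuesday",
]
labels = [label(c) for c in comments]
print(labels)
```

A scorer like this cannot cope with negation ("not good"), sarcasm or slang, which is exactly why the problem is described above as challenging.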

    My next post will cover many more industry applications of data mining and its concepts. Feel free to comment about other applications of analytics that you have come across.