Tuesday, 11 October 2011

Architecture for Data Mining

To be applied to best effect, these advanced techniques must be fully integrated with a data warehouse as well as with flexible, interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on.
Figure: Integrated Data Mining Architecture

The ideal starting point is a data warehouse containing a combination of internal data tracking all customer contact coupled with external market data about competitor activity. Background information on potential customers also provides an excellent basis for prospecting. This warehouse can be implemented in a variety of relational database systems: Sybase, Oracle, Redbrick, and so on, and should be optimized for flexible and fast data access.
An OLAP (On-Line Analytical Processing) server enables a more sophisticated end-user business model to be applied when navigating the data warehouse. The multidimensional structures allow the user to analyze the data as they want to view their business – summarizing by product line, region, and other key perspectives of their business. The Data Mining Server must be integrated with the data warehouse and the OLAP server to embed ROI-focused business analysis directly into this infrastructure. An advanced, process-centric metadata template defines the data mining objectives for specific business issues like campaign management, prospecting, and promotion optimization. Integration with the data warehouse enables operational decisions to be directly implemented and tracked. As the warehouse grows with new decisions and results, the organization can continually mine the best practices and apply them to future decisions.
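The multidimensional summarizing described above can be sketched in a few lines of Python. This is only an illustration of the roll-up idea, not how any particular OLAP server works; the fact rows, the dimension names and the `roll_up` helper are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product_line, region, quarter, sales).
# In a real warehouse these would come from a fact table.
SALES = [
    ("Footwear", "East", "Q1", 120.0),
    ("Footwear", "West", "Q1", 80.0),
    ("Apparel", "East", "Q1", 200.0),
    ("Apparel", "East", "Q2", 150.0),
    ("Footwear", "East", "Q2", 100.0),
]
DIMENSIONS = ("product_line", "region", "quarter")


def roll_up(rows, by):
    """Sum the sales measure over every dimension not listed in `by`."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the sales measure
    return dict(totals)


# Summarize by product line, then look one level deeper by region.
by_product = roll_up(SALES, ["product_line"])
by_product_region = roll_up(SALES, ["product_line", "region"])

print(by_product)
print(by_product_region)
```

Going from `by_product` to `by_product_region` is the same move as an analyst opening up a product line to see how each region contributed to it.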
This design represents a fundamental shift from conventional decision support systems. Rather than simply delivering data to the end user through query and reporting software, the Advanced Analysis Server applies users’ business models directly to the warehouse and returns a proactive analysis of the most relevant information. These results enhance the metadata in the OLAP Server by providing a dynamic metadata layer that represents a distilled view of the data. Reporting, visualization, and other analysis tools can then be applied to plan future actions and confirm the impact of those plans.

What can data mining do?

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact of these factors on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.
WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level, and use that information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.
The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.
By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.
Some typical problems are categorized in the book Data Mining with SQL Server 2008:
·        Recommendation generation: After a customer chooses one or more products, data mining suggests another product.
·        Anomaly detection: In the financial industry, fraud detection commonly means looking for the one transaction or customer among thousands that might be fraudulent. Data mining can find a single unusual observation even among millions.
·        Churn analysis: Churn refers to losing a repeat customer or client; knowing the early indicators that someone is ready to switch can be important.
·        Risk management: Credit ratings are often based on multivariate formulas which help predict levels of risk.
·        Customer segmentation: Grouping customers or clients together, even by their own self-determined characteristics, can allow large organizations to manage marketing campaigns or even just organize their service professionals around similar groupings.
·        Targeted ads: Marketers use data mining to deliver customized ads online, and organizations more generally want to tailor their communications to what they already know about their customers or clients.
·        Forecasting: Time-series analysis takes data from the past and provides a look into the future, even when there are seasonal increases or declines.
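To make the anomaly detection item above more concrete, here is a minimal sketch in Python. It uses a modified z-score based on the median and the median absolute deviation, which is one simple, commonly used heuristic and not the method of the book mentioned above. The transaction amounts, the `flag_anomalies` helper and the 3.5 cut-off are assumptions chosen for illustration.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Unlike a mean-based z-score, it is not distorted
    by the very outlier we are trying to find.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values are identical; this simple rule gives up.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical card transaction amounts for one customer.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 500]
suspicious = flag_anomalies(amounts)
print(suspicious)  # prints [500]
```

A production fraud system would look at many variables at once, but the principle is the same: describe what is normal, then surface the observations that sit far away from it.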
What technological infrastructure is required?
Today, data mining applications are available on systems of all sizes, for mainframe, client/server, and PC platforms. System prices range from several thousand dollars for the smallest applications up to $1 million a terabyte for the largest. Enterprise-wide applications generally range in size from 10 gigabytes to over 11 terabytes. NCR has the capacity to deliver applications exceeding 100 terabytes. There are two critical technological drivers:
  • Size of the database: the more data being processed and maintained, the more powerful the system required.
  • Query complexity: the more complex the queries and the greater the number of queries being processed, the more powerful the system required.
Relational database storage and management technology is adequate for many data mining applications smaller than 50 gigabytes. However, this infrastructure needs to be significantly enhanced to support larger applications. Some vendors have added extensive indexing capabilities to improve query performance. Others use new hardware architectures such as Massively Parallel Processors (MPP) to achieve order-of-magnitude improvements in query time. For example, MPP systems from NCR link hundreds of high-speed Pentium processors to achieve performance levels exceeding those of the largest supercomputers.


Introduction to Operations Research

History of Operations Research (OR)
      The name OR probably came from a programme undertaken by Great Britain during World War II, “research in military operations”.
      In 1885, Frederick W. Taylor gave importance to the application of scientific analysis to methods of production.
      OR was first applied to the military defence of Great Britain.
      Because of the success of OR in military operations, it quickly spread to all phases of industry and government.
      By 1951, OR had taken its place as a distinct science in the United States.
      After that, OR came to be considered the “art of winning war without actually fighting it”.

Operations Research in India
      In India, an OR society was founded and became a member of the International Federation of Operational Research Societies (IFORS) in 1959.
      The journal OPSEARCH was published for the first time in 1963.
      Some of the Indian organisations using OR techniques are:
      Indian Airlines
      Defence Organisations
      TATA
      TELCO

Definition of Operations Research
      Operations Research is a scientific approach to problem solving for executive management.
      It can also be defined as follows: OR is an aid for executives in making their decisions, providing them with the quantitative information they need, based on the scientific method of analysis.
      Put more simply, “Operations research is the art of giving bad answers to problems which otherwise have worse answers”.
Features of Operations Research
1.      System orientation: OR studies the situation or problem as a whole. Any action or activity in one part of the organization has some effect on the other parts, and the optimum result for one part of a system may not be the optimum for another. To evaluate any decision, therefore, one must identify all possible interactions and determine their impact on the organization as a whole.
2.      Inter-disciplinary team approach: OR is inter-disciplinary in nature and is performed by a team of scientists drawn from different scientific and engineering disciplines. No single individual can have a thorough knowledge of all aspects of the undertaking, because managerial problems have economic, physical, psychological, biological, sociological and engineering aspects.
3.      Scientific approach: OR uses scientific methods to solve problems. Most scientific studies, in chemistry, physics, biology and so on, can be carried out in laboratories without much interference from the outside world. OR problems generally cannot, since they have to be studied within the working organization itself.
4.      Decision making: OR increases the effectiveness of management decisions. Management spends most of its time making decisions, and OR is a decision science that helps it make better ones. The major premise of OR is therefore decision making, whatever the situation involved, treated as a systematic process.
5.      Use of computers: Operations research often requires a computer to solve a complex mathematical model or to perform the large number of computations involved.
6.      Objectives: Operations research always attempts to find the best, optimal solution to the problem. For this purpose the objectives of the organization are defined and analysed. These objectives are then used as the basis for comparing the alternative courses of action.
7.      Quantitative solution: Operations research provides managers with a quantitative basis for decision making. OR attempts to provide a systematic and rational approach to the quantitative solution of various managerial problems.
8.      Human factors: In deriving a quantitative solution we do not consider human factors, which undoubtedly play a great role in the problems. A study of OR is therefore incomplete without a study of human factors.

Types of Operations Research Models :
A model is a representation of reality. Most of our thinking about operations research in business takes place in the context of models. A model is a general term denoting any idealized representation or abstraction of a real-life system or situation. The objective of a model is not to identify all aspects of the situation but to identify the significant factors and their inter-relationships. A model helps in decision making because it provides a simplified description, in a logical structure, of the complexities and uncertainties of the problem in hand.
A broad classification of operations research models is given below:
(a)   Physical models: These models include all forms of diagrams, drawings, graphs and charts, most of which are designed to deal with specific types of problems. By presenting significant factors and inter-relationships in pictorial form, physical models indicate a problem in a manner that facilitates analysis. For example, a bar chart can be used effectively as a summary presentation of a company’s monthly production forecast.
(i)   Iconic models: An icon is an image or likeness of the object or system it represents. An iconic model, the least abstract kind, is a physical replica of a system, usually on a smaller scale than the original. The range of management problem areas where these models can be used effectively is extremely narrow; it consists largely of engineering-oriented fields, for example models used to train pilots and members of the crew.
(ii)   Analog models: Analog models are closely associated with iconic models. However, they are not replicas of problem situations. Rather, they are small physical systems that have similar characteristics to, and work like, the object or system they represent, e.g. children’s toy models of railways, roads etc.
(b)   Mathematical or symbolic models: Symbolic models employ a set of mathematical symbols to represent the decision variables of the system under study. These variables are related to one another by mathematical equations; in other words, equations are the mathematical models commonly used in operations research. A simple demand curve in economics is a symbolic model predicting buyers’ behavior at different price levels. Similarly, a profit and loss statement and a budget for next year are both symbolic models. A profit and loss account fits on one sheet of paper and summarizes the results of a year, but does not recreate every action which took place during the year.
(c)    By nature of the environment:
(i)   Deterministic models: In this model everything is defined and the results are certain. Certainty is the state of nature assumed in these models. For any given set of input variables the output variables are always the same. In EOQ (economic order quantity) models, for example, one can easily determine the economic lot size, and one can apply sensitivity analysis, where a change in one variable causes a definite change in the outcome.
(ii)   Probabilistic models: In cases of risk and uncertainty, the input and output variables take the form of probability distributions. Such models are semi-closed models and reveal the probability of occurrence of an event. They reflect the complexity of the real world and the uncertainty prevailing in it.
(d)    By the extent of generality:
(i)  General models: A general model is one which does not apply to one situation only but has general applications. For example, the linear programming model is known as a general model since it can be used for product mix, production scheduling, marketing, transportation and assignment problems.
(ii)  Specific models: A specific model is applicable under specific conditions only, e.g. a sales response curve or equation as a function of advertising is applicable in the marketing function alone.
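The EOQ model mentioned under deterministic models makes a compact worked example. The sketch below uses the classic economic order quantity formula, Q* = sqrt(2DS / H), where D is annual demand, S is the cost of placing one order and H is the cost of holding one unit for a year. The numbers are invented for illustration, and the function names are our own.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """Classic EOQ: the lot size that minimizes ordering plus holding cost."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


def annual_cost(lot_size, annual_demand, order_cost, holding_cost):
    """Yearly ordering cost plus yearly holding cost for a given lot size."""
    orders_per_year = annual_demand / lot_size
    average_stock = lot_size / 2
    return orders_per_year * order_cost + average_stock * holding_cost


# Hypothetical inputs: 1,200 units a year, 50 per order, 3 per unit per year.
q_star = economic_order_quantity(1200, 50, 3)
cost_at_q_star = annual_cost(q_star, 1200, 50, 3)

# Sensitivity analysis: the same model with demand multiplied by four.
q_star_high_demand = economic_order_quantity(4800, 50, 3)

print(q_star, cost_at_q_star, q_star_high_demand)
```

Because the model is deterministic, the same inputs always give the same lot size (200 units here), and the sensitivity run shows a definite, predictable response: quadrupling demand doubles the lot size to 400.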

Applications of Operations Research / Scope of Operations Research
In the field of industrial management, there is a chain of problems starting from the purchase of raw materials and ending with the dispatch of finished goods. Management is interested in having an overall view of how to optimize profits. In order to take decisions on a scientific basis, the operations research team has to consider the various alternative methods of producing the goods and the return in each case. An operations research study should also point out possible changes in the overall structure, such as installing a new machine or introducing more automation. OR has been successfully applied in industry in the fields of production, repair and maintenance, scheduling and sequencing, the planning, scheduling and control of projects, and scores of other associated areas.
The operations research approach is equally applicable to big and small organizations. For example, whenever a departmental store faces a decision such as employing additional sales staff or purchasing an additional van, operations research techniques can be applied to minimize cost and maximize benefit for each such decision.

1.      Finance, budgeting and investments: Cash flow analysis, long-range capital requirements, investment portfolios, dividend policies etc.
2.      Purchasing, procurement and exploration: (a) Determining the quantity and timing of purchases of raw materials, machinery etc. (b) Rules for buying and supplies under varying prices. (c) Bidding policies. (d) Equipment replacement policies. (e) Strategies for exploration and exploitation of new material sources.
3.      Production management:
(i)   Project planning:
(a) Location and size of warehouses, distribution centers, retail outlets etc.
(b) Distribution policy.
(ii)   Manufacturing and facility planning:
(a) Production scheduling and sequencing.
(b) Project scheduling and allocation of resources.
(c) Selection and location of factories and warehouses, and their sizes.
(d) Determining the optimal production mix.
(e) Maintenance policies and preventive maintenance.
(f) Maintenance crew sizes.
(g) Scheduling and sequencing the production run by proper allocation of machines.
4.      Marketing management:
(a) Product selection, timing, competitive actions.
(b) Advertising strategy and choice of advertising media.
(c) Number of salesmen, frequency of calling on accounts etc.
(d) Effectiveness of market research.
(e) Size of the stock needed to meet future demand.

5.      Personnel management:
(a) Recruitment policies and assignment of jobs.
(b) Selection of suitable personnel with due consideration for age and skills.
(c) Establishing equitable bonus systems.
6.      Research and development:
(a) Determination of areas of concentration of research and development.
(b) Reliability and evaluation of alternative designs.
(c) Control of development projects.
(d) Co-ordination of multiple research projects.
(e) Determination of time and cost requirements.

Methodology of Operations Research : Given that OR represents an integrated framework to help make decisions, it is important to have a clear understanding of this framework so that it can be applied to a generic problem. The approach or methodology of O.R. comprises the following seven sequential steps.
(a)   Orientation: The first step in the O.R. approach is referred to as problem orientation. The primary objective of this step is to get a clear picture of the relevant issues. This phase also involves a study of documents and literature relevant to the problem.
(b)   Problem definition: This is the second step of the O.R. process, and the most important and most difficult one. The objective here is to arrive at a clear definition of the problem in terms of its scope and the results desired.
(c)    Data collection: In the third phase of the O.R. process, data is collected with the objective of translating the problem defined in the second phase into a model that can then be objectively analysed. Data typically comes from two sources. Data collection can have an important effect on the previous steps. Today there is ready access to data that was previously very hard to obtain, but even though the data is all present somewhere and in some form, extracting useful information from these sources is often very difficult.
(d)   Model formulation: In this fourth phase of the O.R. process, modeling is the process of defining the characteristics of all operations. A model may be defined formally as a selective abstraction of reality which helps in understanding the working of the original system.
(e)   Solution results: After an appropriate model has been formulated and the data collected, the next stage is to find the solution to the model and then interpret it. In this stage, we arrive at the values of the variables and of the objective function.
(f)     Analysis and interpretation of results: This stage requires determining whether the model can adequately and reliably predict the behavior of the real system. It involves testing the structural assumptions of the model. Analysis and interpretation of the model's performance gives no assurance that the system will continue to perform in the future as it did in the past.
(g)   Implementation and monitoring: This is the last step, implementing the solution in the organization. Implementation is often the more difficult part, because obtaining a solution does not ensure that it will automatically be implemented. The decision may affect various segments of the organization, so the resulting changes need to be monitored and suitable measures taken.


Techniques of Operations Research :

1.      Linear programming: Linear programming techniques solve the product mix and distribution problems of enterprises. LPP techniques are used to allocate scarce resources in an optimum manner in problems such as scheduling and product mix.
2.      Inventory control models: Inventory may be defined as a useful idle resource which has economic value, such as raw materials, spare parts and finished products.
3.      Sequencing: Models have been developed to find a sequence for processing jobs so that the total elapsed time for all the jobs is a minimum.
4.      Assignment problems: These deal with allocating various resources or items to various activities on a one-to-one basis in such a way that the time or cost involved is minimized, or sales or profit is maximized. For example, a manager may like to know which job should be assigned to which person so that all jobs can be completed in the shortest possible time.
5.      Transportation problems: The transportation problem deals with the transportation of a product from a number of sources with limited supplies to a number of destinations with specified demands. The main objective is to schedule shipments from sources to destinations in such a way as to minimize the total transportation cost.
6.      Decision theory and game theory: Decision theory is primarily concerned with decision making under conditions of risk and uncertainty.
7.      Dynamic programming: This is used when we are faced with a multi-stage problem whose stages are inter-related and almost similar in nature, differing only in time and space, such as choosing a suitable product mix or inventory planning and control.
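As an illustration of the assignment problem in the list above, the sketch below tries every one-to-one assignment of three people to three jobs and keeps the cheapest. Brute force is used only because it is easy to follow; it grows factorially with the number of jobs, and real problems would use a dedicated method such as the Hungarian algorithm. The time matrix is invented for the example.

```python
from itertools import permutations

# Hypothetical hours needed: HOURS[person][job].
HOURS = [
    [9, 2, 7],  # person 0
    [6, 4, 3],  # person 1
    [5, 8, 1],  # person 2
]


def best_assignment(hours):
    """Return (total_hours, jobs) where jobs[p] is the job given to person p.

    Every permutation of the jobs is one possible one-to-one assignment,
    so the minimum over all permutations is the optimal assignment.
    """
    people = range(len(hours))
    best_total, best_jobs = None, None
    for jobs in permutations(people):
        total = sum(hours[p][jobs[p]] for p in people)
        if best_total is None or total < best_total:
            best_total, best_jobs = total, jobs
    return best_total, best_jobs


total_hours, jobs = best_assignment(HOURS)
for person, job in enumerate(jobs):
    print(f"person {person} -> job {job}")
print("total hours:", total_hours)  # prints 9 for this matrix
```

For this matrix the cheapest plan gives person 0 job 1, person 1 job 0 and person 2 job 2, even though person 1 is not doing the job they are individually fastest at. That is the system-wide view that OR emphasizes.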


Limitations  of Operations Research :

The previous sections have brought out only the positive side of O.R. There is also a need to point out its limitations.
1.      Magnitude of computations: O.R. tries to find the optimal solution taking all the factors into account. In modern society these factors are enormous in number, and expressing them in quantities and establishing the relationships among them requires complicated calculations which can only be handled by machine.
2.      Non-quantifiable factors: O.R. provides a solution only when all elements related to a problem can be quantified, and not all relevant variables lend themselves to quantification.
3.      Gap between manager and operations researcher: O.R., being a specialist’s job, requires a mathematician or statistician, who may not be familiar with the business problem, while the manager may not follow the workings of O.R.
4.      Money and time costs: When the basic data are subject to frequent changes, incorporating them into the O.R. models is a costly affair. Moreover, a fairly good solution available now may be more desirable than a perfect O.R. solution available only later.
5.      Implementation: Implementation of decisions is a delicate task. It must take into account the complexities of human relations and behavior.
6.      Selection of technique: Operations research techniques are very useful, but they cannot be used indiscriminately. The choice of technique depends upon the nature of the problem.
7.      Not a substitute for management: Operations research only provides the tools and cannot be a substitute for management. It only examines the results of alternative courses of action; the final decision is made by management within its authority and judgement.