Stock Market Movement

The prediction of stock market movement has been an issue of interest for centuries. Despite years of study and the latest technology, it seems that no method has been discovered that consistently works. Previous attempts at solving this problem have employed a wide of array tools and strategies. Among these, many utilized some type of neural network in an attempt to learn from historical data and apply those findings to predict current data [12] [20] [33]. Other approaches use technical analysis to locate patterns in data that suggest certain movement trends [4].

These patterns in the data are constructed by conforming to a series of rules based on the stock price and volume of trading over a series of days. Technical analysis is based on the assumption that previous market data reflects all of the relevant information about a stock and therefore predictions can be made from this. Still another method, known as fundamental analysis, looks at a business' financial statements, competitors, and business strategies in addition to evaluating historical data in an effort to make a prediction.

The goal of fundamental analysis is to determine the value of a stock based on the previously mentioned factors and to act on the assumption that the actual stock price will eventually reflect the determined value. As a result, fundamental analysis usually works best over longer periods of time, whereas technical analysis is more suitable for short term trading [28]. Neural networks offer the ability to process large amounts of data quickly and retain information learned from that data to be used again in the future. Technical analysis provides a slightly more concrete method of evaluation at the cost of locating and implementing the algorithms yourself. Fundamental analysis seems optimal as it utilizes the largest information set and has many well known practitioners such as Warren Buffett and Peter Lynch, but is more difficult to implement via an Artificial Intelligence system mainly because such information is not often publicly available. Furthermore, assessment techniques for fundamental analysis are far more complicated and difficult to automate. Lastly, gains realized by fundamental analysis usually take years to mature fully [28], so comparative research studies would not be feasible.

Stock price forecasting is an important task for investment/financial decision making challenge. It receives considerable attention from both researches and practitioners. Stock market is highly volatile, complex and dynamic area so stock/price forecasting is a considerable challenging issue. Several approaches have been used for forecasting stock price such as traditional and fundamental methods. But these previous methods have limits and not completely capable to provide accuracy. Data mining and computational intelligence techniques for resolving the problems of stock / future price forecasting have become rapidly growing alternative methods for achieving considerable degree of accuracy.

The difficulty with technical analysis is that a complete pattern is required to make an accurate prediction on the stock movement [28]. Ideally, such a prediction should be made before the pattern was completed to enable prediction. For this task implement a Time-Delay Artificial Neural Network, called Midas. Like Metis, Midas takes input data including open, close, high and low prices per day for a particular stock ticker over a period of time, along with the corresponding trading volume for each day. This data is sufficient for implementing technical analysis through Metis and is therefore used as the input data for Midas as well.

Once data is preprocessed by Metis, locating all relevant technical indicators, it is passed to Midas as training data. This data is useful for training because we use the technical indicators to highlight areas in the data where Midas should be able to make a prediction. The goal of this process is to train Midas such that it can make these predictions before the technical indicator patterns are completed. The predictions made by Midas represent a predicted direction of movement at a given time interval. As such, it is very easy to check Midas' accuracy by simply running it on older data so we can see whether its predictions were correct.


Past approaches to this problem first applied an artificial neural network directly to historical stock data using linear time series modeling [8]. This produced results which over fitted the training data and therefore rendered them unusable in real trading. In addition to omitting any preprocessing, the neural networks used only contained two layers, an input and an output layer. These linear techniques are now known to be provably insufficient for any nonlinear phenomenon including stock price movement.

However, a multi-layer feed forward neural network (also known as a Multilayer Perceptron or MLP) which contains a hidden layer with a non-linear transfer function should be able to model any deterministic phenomenon given a correct and sufficient training set, enough neurons, and enough training time (Masters, 1993). Many modern attempts to predict stock movement use MLPs.

One such attempt is by Chenowith [4] they relied on a single technical indicator called the average direction index (ADX), which identifies and quantifies trends by averaging the fraction of today's range above or below the previous day's range. The ADX is obtained through a feature selection component and used as input into two separate neural networks (Up and Down) whose results were then polled and applied to a rule base to predict the final market movement. The system traded on the S&P 500 and obtained a rate of return of 15.99% over a 5 year period with 54 trades versus a rate of return of 11.05% over a 5 year period using a buy and hold strategy. Although the result of this approach looks excellent on paper, the rule base followed by the neural net was to buy and hold if the output was within a large uncertainty threshold, implying that the actual predictive capability was limited. Without knowing the exact predictive accuracy, it is difficult to quantitatively compare the system, which inevitably includes rules for trading which may be the actual cause of the monetary gain achieved by the system rather than predictive accuracy.

Variants on the MLP theme are common, such as using a variety of swdifferent markets for training data [20] and using statistical analysis to assist in training [33]. Other approaches include using fuzzy neural networks, which is based on fuzzification, a technique that allows members of a set to have degrees of membership instead of binary membership as in classical set theory. The introduction of ambiguity allows data to be divided into several different technical indicators if necessary, hopefully producing more lenient and more accurate neural networks. However, results indicate that fuzzy neural networks have prediction accuracy virtually equivalent to classical neural networks [25].

Mahfoud and Mani [24] chose to use genetic algorithms to predict stock prices. Genetic algorithms are inspired by evolutionary biology and the concept of survival of the fittest. A large population of possible algorithmic representations for stock prediction is first created. Then, each member is executed and evaluated, keeping the algorithms which generate the best results and mixing their properties amongst other high scoring algorithms to obtain a new generation of algorithms in a Darwinian fashion. The process is repeated until the error has been reduced to an acceptable level. The system created by Mahfoud and Mani allows their genetic algorithms to abstain from making predictions while their neural networks make a prediction at virtually every time interval, requiring a higher degree of predictive accuracy in the neural networks to beat the genetic algorithms.


Data mining a field at the intersection of computer science and statistics is the process that attempts to discover patterns in large data sets. It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find where the value resides.

Although data mining is still in its infancy, companies in a wide range of industries - including retail, finance, health care, manufacturing transportation, and aerospace - are already using data mining tools and techniques to take advantage of historical data. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed.

For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Specific uses of data mining include:

Market segmentation - Identify the common characteristics of customers who buy the same products from your company.
Customer churn - Predict which customers are likely to leave your company and go to a competitor.
Fraud detection - Identify which transactions are most likely to be fraudulent.
Direct marketing - Identify which prospects should be included in a mailing list to obtain the highest response rate.
Interactive marketing - Predict what each individual accessing a Web site is most likely interested in seeing.
Market basket analysis - Understand what products or services are commonly purchased together; e.g., beer and diapers.
Trend analysis - Reveal the difference between typical customers this month and last.

Data mining technology can generate new business opportunities by:

Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in a large database. Questions that traditionally required extensive hands-on analysis can now be directly answered from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.


The stock market is a complex, no stationary, chaotic and non-linear dynamic system. Forecasting stock market, currency exchange rate, bank bankruptcies, understanding and managing financial risk, trading futures, credit rating, loan management, bank customer profiling, and money laundering analyses are core financial
tasks for data mining [10]. Some of these tasks such as bank customer profiling have many similarities with data mining for customer profiling in other fields.

Stock market forecasting includes uncovering market trends, planning investment strategies, identifying the best time to purchase the stocks and what stocks to purchase. Financial institutions produce huge data sets that build a foundation for approaching these enormously complex and dynamic problems with data mining tools. Potential significant benefits of solving these problems motivated extensive research for years.

Application of Decision Tree in Stock Markets

Decision trees are excellent tools for making financial or number based decisions where a lot of complex information needs to be taken into account. They provide an effective structure in which alternative decisions and the implications of taking those decisions can be laid down and evaluated. They also help you to form an accurate, balanced picture of the risks and rewards that can result from a particular choice. In this section presents some of the application of decision trees in stock markets.

In a stock market, how to find right stocks and right timing to buy has been of great interest to investors. To achieve this objective, Muh-Cherng et al. present a stock trading method by combining the filter rule and the decision tree technique [27]. The filter rule, having been widely used by investors, is used to generate candidate trading points. These points are subsequently clustered and screened by the application of a decision tree algorithm. Compared to previous literature that applied such a combination technique, this research is distinct in incorporating the future information into the criteria for clustering the trading points. Taiwan and NASDAQ stock markets are used to justify the proposed method.

Listed companies' financial distress prediction is important to both listed companies and investors. Jie and Hui (2008) present a data mining method combining attribute-oriented induction, information gain, and decision tree, which is suitable for preprocessing financial data and constructing decision tree model for financial distress prediction [16]. On the basis of financial ratios attributes and one class attribute, adopting entropy-based discretization method, a data mining model for listed companies' financial distress prediction is designed. The empirical experiment with 35 financial ratios and 135 pairs of listed companies as initial samples got satisfying result, which testifies to the feasibility and validity of the proposed data mining method for listed companies' financial distress [16].

Accurately, forecasting stock prices has been extensively studied. Jar-Long and Shu-Hui (2006) provided a proposal to use a two-layer bias decision tree with technical indicators to create a decision rule that makes buy or not buy recommendations in the stock market [14]. A novel method designed for using two-layer bias decision tree to improve purchasing accuracy. Comparison with random purchases, the results indicate the system presented here not only has excellent out-of sample forecasting performance, but also delivers a significant improvement in investment returns for all listed companies. Additionally, the proposed system has few parameter requirements, stable learning, and fast learning speed. Increasingly, the system presented here has high accuracy given large amounts of varied test data, with testing periods that experienced structural change including both bull and bear markets. Based on all of the above, they believe the proposed bias decision model is very flexible, modular and easily understandable [14].

Chi-Lin Lu and Ta-Cheng Chen have employed decision tree-based mining techniques to explore the classification rules of information transparency levels of the listed firms in Taiwan's stock market [5]. The main purpose of their study is to explore the hidden knowledge of information disclosure status among the listed companies in Taiwan's stock market. Moreover, the multi-learner model constructed with decision tree algorithm has been applied. The numerical results have shown that the classification accuracy has been improved by using multi-leaner model in terms of less Type I and Type II errors. In particular, the extracted rules from the data mining approach can be developed
as a computer model for the prediction or classification of good poor information disclosure potential. By using the decision tree-based rule mining approach, the significant factors with the corresponding equality inequality and threshold values were decided simultaneously, so as to generate the decision rules.
Unlike many mining approaches applying neural networks related approaches in the literature, the decision tree approach is able to provide the explicit classification rules. Moreover, a multi-learner model constructed by boosting ensemble approach with decision tree algorithm has been used to enhance the accuracy rate in this work. Based on the extracted rules, a prediction model has then been built to discriminate good information disclosure data from the poor information disclosure data with great precision.

Moreover, the results of the experiment have shown that the classification model obtained by the multi-learner method has higher accuracy than those by a single decision tree model. Also, the multi-learner model has less Type I and Type II errors. It indicates that the multi-learner model is appropriate to elicit and represent experts' decision rules, and thus it has provided effective decision supports for judging the information disclosure problems in Taiwan's stock market. By using the rule based

decision models, investors and the public can accurately evaluate the corporate governance status in time to earn more profits from their investment. It has a great meaning to the investors, because only prompt information can help investors in correct investment decisions [16].

Application of Neural Network in Stock Markets

This section provides some of the application of neural network in stock markets. It is nowadays a common notion that vast amounts of capital are traded through the stock markets all around the world. National economies are strongly linked and heavily influenced by the performance of their stock markets. Moreover, recently the markets have
become a more accessible investment tool, not only for strategic investors but for common people as well.

Consequently they are not only related to macroeconomic parameters, but they influence everyday life in a more direct way. Therefore they constitute a mechanism which has important and direct social impacts. The characteristic that all stock markets have in common is the uncertainty, which is related to their short and long term future state. This feature is undesirable for the investor but it is also unavoidable whenever the stock market is selected as the investment tool. The best that one can do is to try to reduce this uncertainty. Stock market prediction is one of the instruments in this process.

The main advantage of neural networks is that they can approximate any nonlinear function to an arbitrary degree of accuracy with a suitable number of hidden units [10]. The development of powerful communication and trading facilities has enlarged the scope of selection for investors. David Enke and Suraphan Thawornwong introduced an information gain technique used in machine learning for data mining to evaluate the predictive relationships of numerous financial and economic variables [6]. Neural network models for level estimation and classification are then examined for their ability to provide an effective forecast of future values. A cross validation technique is also employed to improve the generalization ability of several models. The results show that the trading strategies guided by the classification models generate higher risk-adjusted profits than the buy-and-hold strategy, as well as those guided by the level-estimation based forecasts of the neural network and linear regression models [6]. Defu et al. (2007) dealt with the application of a well known neural network technique, multilayer back propagation (BP) neural network, in financial data mining. A modified neural network forecasting model is presented, and an intelligent mining system is developed. The system can forecast the buying and selling signs according to the prediction of future trends to stock market, and provide decision-making for stock investors.

The simulation result of seven years to Shanghai composite index shows that the return achieved by this mining system is about three times as large as that achieved by the buy-and-hold strategy, so it is advantageous to apply neural networks to forecast financial time series, so that the different investors could benefit from it [7]. Accurate volatility forecasting is the core task in the risk management in which various portfolios' pricing, hedging, and option strategies are exercised.

Tae (2007) proposes hybrid models with neural network and time series models for forecasting the volatility of stock price index in two view points: deviation and direction [35]. It demonstrates the utility of the hybrid model for volatility forecasting. This model demonstrates the utility of the neural network forecasting combined with time series analysis for the financial goods [35].

Application of Clustering in Stock Markets

As part of a stock market analysis and prediction system consisting of an expert system and clustering of stock prices, data is needed. Stock markets are recently triggering a growing interest in the physicists' community. Basaltoa et al. (2005) apply a pair wise clustering approach to the analysis of the Dow Jones index companies, in order to identify similar temporal behavior of the traded stock prices [2]. The objective of this attention is to understand the underlying dynamics which rules the companies' stock prices. In particular, it would be useful to find, inside a given stock market index, groups of companies sharing a similar temporal behavior. To this purpose, a clustering approach to the problem may represent a good strategy. To this end, the chaotic map clustering algorithm is used, where a map is associated to each company and the correlation coefficients of the financial time series to the coupling strengths between maps. The simulation of a chaotic map dynamics gives rise to a natural partition of the data, as companies belonging to the same industrial branch are often grouped together. The identification of clusters of companies of a given stock market index can be exploited in the portfolio optimization strategies [2].

Graph representation of the stock market data and interpretation of the properties of this graph gives a new insight into the internal structure of the stock market. Vladimar et al. (2006) study different characteristics of the market graph and their evolution over time and came to several interesting conclusions based on their analysis [38]. It turns out that the power-law structure of the market graph is quite stable over the considered time intervals; therefore one can say that the concept of self-organized networks, which was mentioned above, is applicable in finance, and in this sense the stock market can be considered as a 'self-organized' system. Another important result is the fact that the edge density of the market graph, as well as the maximum clique size, steadily increases during the last several years, which supports the well-known idea about the globalization of economy which has been widely discussed recently. They also indicate the natural way of dividing the set of financial instruments into groups of similar objects (clustering) by computing a clique partition of the market graph. This methodology can be extended by considering quasi-cliques in the partition, which may reduce the number of obtained clusters [38].

Stock prices tend to cluster at round numbers, a phenomenon observed in many markets. Using tick-by tick transaction data, Wataru (2006) studies price clustering on the Tokyo stock exchange, which is a computerized limit order market. As for the intraday pattern, the degree of price clustering is greatest at the market opening. Then, it decreases during the first half hour and reaches a stable level. It does not increase again near the market closing. There is no clear difference in clustering between call auctions and continuous auctions [40].

S??onia et al. (2008) have investigated the long memory and volatility clustering for the S&P 500, NASDAQ 100 and Stoxx 50 indexes in order to compare the US and European Markets. Additionally, they have compared the results from conditionally heteroscedastic models with those from the entropy measures. In the latter, they have examined Shannon entropy, Renyi entropy and Tsallis entropy. The results have corroborated the previous evidence of nonlinear dynamics in the time series considered [32].

The main goal of their study is to compare two different perspectives: the so-called traditional approach in which the authors have considered the GARCH (1,1), IGARCH (1,1) and FIGARCH (1,d,1) specifications and the econophysics approach based on the concept of entropy. For their purpose three variants of this notion were chosen: the Shannon, Renyi and Tsallis measures. The results from both perspectives have shown nonlinear dynamics in the volatility of S&P 500, NASDAQ 100 and Stoxx 50 indexes and must be understood in complementarily. They have considered that the concept of entropy can be of great help when analyzing stock market returns since it can capture the uncertainty and disorder of the time series without imposing any constraints on the theoretical probability distribution. By contrast, the ARCH/GARCH type models assume that all variables are independent and identically distributed (I.I.D.) [32].

Application of Association Rules in Stock Markets

As stated in Agrawal et al. (1993) discovering association rules is an important data mining problem, and there has been considerable research on using association rules in the field of data mining problems. The association ns' rules algorithm is used mainly to determine the relationships between items or features that occur synchronously in the database. For instance, if people who buy item X also buy item Y, there is a relationship between item X and item Y, and this information is useful for decision makers. Therefore, the main purpose of implementing the association rules algorithm is to find synchronous relationships by analyzing the random data and to use these relationships as a reference during decision making [1].

One of the most important problems in modern finance is finding efficient ways to summarize and visualize the stock market data to give individuals or institutions useful information about the market behavior for investment decisions. The enormous amount of valuable data generated by the stock market has attracted researchers to explore this problem domain using different methodologies.

Shu-Hsien et al. (2008) investigated stock market investment issues on Taiwan stock market using a two stage data mining approach. The first stage apriori algorithm is a methodology of association rules, which is implemented to mine knowledge and illustrate knowledge patterns and rules in order to propose stock category association and possible stock category investment collections. Then the K-means algorithm is a methodology of cluster analysis implemented to explore the stock cluster in order to mine stock category clusters for investment information. By doing so, they propose several possible Taiwan stock market portfolio alternatives under different circumstances [31].

Application of Factor Analysis in Stock Markets

Factor analysis is particularly useful in situations where a large number of variables are believed to be determined by a relatively few common causes of variation. Also, it should be particularly useful for analyzing financial markets because if financial markets are efficient, nominal returns will be affected by default and market risk and by expected inflation and inflation uncertainty. Michael Flad et al. provided an empirical analysis of the common factor driving the intraday movements of the DAX and the DJIA
during overlapping trading hours. Based on a minute-by-minute dataset spanning from March to December, 2003, they estimate a bivariate common factor model for the two indices. By explicitly modeling the two stock indices, they implicitly assume that news on economic fundamentals is aggregated in both equity market and that, therefore, both stock

indices are linked by a common trend of cumulated random information arrivals. They compute various measures of information leadership and find that the DJIA is the predominant source of price relevant information flowing into the transatlantic system of stock indices. Moreover, our impulse response analyses show that both stock markets adjust very quickly (in less than 5 min) to shocks emanating from either the U.S. or the German side, but that DJIA-innovations have a major long-run effect on the German stock market, whereas DAX-shocks are of minor importance. This observation is further strengthened by our permanent-transitory decomposition. Hence, their analysis implies that international economic news is first incorporated in the U.S. and then transferred to the German stock [26].

A large and growing body of empirical work is devoted to estimating the relation between risk and return in the U.S. stock market. Although theory typically predicts a positive relation, empirical findings are mixed and often suggest a negative relation. An important limitation of existing empirical work, however, pertains to the relatively small amount of conditioning information used to estimate the conditional mean and conditional volatility of excess stock market returns. In turn, the use of such sparse information sets in the construction of fitted moments that can translate into an omitted information bias in the estimated risk'return relation.

Sydney and Serena (2007) considered one approach to this omitted-information problem by employing a methodology for incorporating a large amount of conditioning information in their estimates of the conditional mean and conditional volatility of excess stock market returns. Recent research on dynamic factor models finds that the information in a large number of economic time series can be effectively summarized by a relatively small number of estimated factors, affording the opportunity to exploit a rich base of information more likely to span the information sets of financial market participants than in previous analyses. In doing so, their study contributes to the empirical literature by evaluating both the potential role of omitted information in the estimated risk'return relation as well as the robustness of previous results to conditioning on richer information sets [34].

Application of Time Series in Stock Market

It is obvious that forecasting activities play an important role in our daily life. The traditional statistical approaches for time series can predict problems arising from new trends, but fail to forecast the data with linguistic facts. Furthermore, the traditional time
series requires more historical data along with some assumptions like normality postulates. In recent years, many researchers have used fuzzy time series to handle forecasting problems. A number of researchers presented fuzzy time series forecasting models in the last 15 years [36].

Lee et al. (2006) presented two factor high-order fuzzy time series for forecasting daily temperature in Taipei and TAIFEX [22]. Jilani and Burney (2007a, b) and Jilani et al. (2007) presented new fuzzy metrics for high-order multivariate fuzzy time series forecasting for car road accident casualties in Belgium [22, 19, 18]. Yu proposed an appropriate approach to define the length of intervals during the formulation of fuzzy relationships and applied on TAIEX and enrollments forecasting problems [43]. Huarng and Yu (2005) proposed a Type-2 fuzzy time series model and applied to TAIEX forecasting problem. But their method requires extra observations to form type-2 fuzzy relations for each observation and thus require larger number of data than the conventional type-I methods [11].


GAs is commonly used as optimizers that adjust parameters to minimize or maximize some feedback measure, which can then be used independently or in the construction of an ANN. In the financial markets, genetic algorithms are most commonly used to find the best combination values of parameters in a trading rule, and they can be built into ANN models designed to pick stocks and identify trades.

In a genetic algorithm, a population of strings (called chromosomes or the genotype of the genome), which encode candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem, is evolved toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly generated individuals and happens in generations. In each generation, the fitness of every individual in the population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and modified (recombined and possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.

Genetic algorithms are created mathematically using vectors, which are quantities that have direction and magnitude. Parameters for each trading rule are represented with a one-dimensional vector that can be thought of as a chromosome in genetic terms. Meanwhile, the values used in each parameter can be thought of as genes, which are then modified using natural selection.


A decision tree is a classifier expressed as a recursive partition of the instance space. The decision tree consists of nodes that form a rooted tree, meaning it is a directed tree with a node called 'root' that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node. All other nodes are called leaves (also known as terminal or decision nodes). In a decision tree, each internal node splits the instance space into two or more sub-spaces according to a certain discrete function of the input attributes values. In the simplest and most frequent case, each test considers a single attribute, such that the instance space is partitioned according to the attribute's value. In the case of numeric attributes, the condition refers to a range.


The thesis consists of six chapters.

Chapter 1 (Introduction) overviews some of the distinguishing features of data mining, the introduction and related work.
Chapter 2 (Theoretical Background) includes theoretical background that addresses about decision tree, genetic algorithm, its types and standards etc
Chapter 3 (Literature Survey) includes literature survey of project like various techniques and development related with topic.
Chapter 4 (Problem Statement & Proposed Methodology) contain problem statement and proposed framework along with algorithm.
Chapter 5 (Simulation & Result Analysis) contains simulation and result analysis with cooperation of result.
Chapter 6 (Conclusions) in this chapter we conclude our result.




Genetic Algorithms are a family of computational models inspired by evolution. These algorithms encode a potential solution to a specific problem on a simple chromosome-like data structure and apply recombination operators to these structures as to preserve critical information. Genetic algorithms are often viewed as function optimizer, although the ranges of problems to which genetic algorithms have been applied are quite broad. An implementation of genetic algorithm begins with a population of (typically random) chromosomes. One then evaluates these structures and allocated reproductive opportunities in such a way that these chromosomes which represent a better solution to the target problem are given more chances to `reproduce' than those chromosomes which are poorer solutions. The 'goodness' of a solution is typically defined with respect to the current population.

2.1.1 Background

Many human inventions were inspired by nature. An artificial neural network is one example. Another example is Genetic Algorithms (GA). GAs search by simulating evolution, starting from an initial set of solutions or hypotheses, and generating successive "generations" of solutions. This particular branch of AI was inspired by the way living things evolved into more successful organisms in nature.

A chromosome is a long, complicated thread of DNA (deoxyribonucleic acid). Hereditary factors that determine particular traits of an individual are strung along the length of these chromosomes, like beads on a necklace. Each trait is coded by some combination of DNA (there are four bases, A (Adenine), C (Cytosine), T (Thymine) and G (Guanine). Like an alphabet in a language, meaningful combinations of the bases produce specific instructions to the cell. Changes occur during reproduction. The chromosomes from the parents exchange randomly by a process called crossover. Therefore, the offspring exhibit some traits of the father and some traits of the mother. A rarer process called mutation also changes some traits. Sometimes an error may occur during copying of chromosomes (mitosis).
The parent cell may have -A-C-G-C-T- but an accident may occur and changes the new cell to -A-C-T-C-T-. Much like a typist copying a book, sometimes a few mistakes are made. Usually this results in a nonsensical word and the cell does not survive. But over millions of years, sometimes the accidental mistake produces a more beautiful phrase for the book, thus producing a better species.

2.1.2 Genetic Algorithm Vocabulary

Common terms that are related with the genetic algorithm terminology are following:

Table 2.1 Terminology of Genetic Algorithms

Genetic Algorithms Explanation
Chromosome(String, individual) Solution (coding)
Genes (bits) Part of solution
Locus Position of gene
Alleles Values of gene
Phenotype Decoded solution
Genotype Encoded solution


2.2.1 Concepts

Genetic Algorithms are search algorithms that are based on concepts of natural selection and natural genetics. Genetic algorithm was developed to simulate some of the processes observed in natural evolution, a process that operates on chromosomes (organic devices for encoding the structure of living being). The genetic algorithm differs from other search methods in that it searches among a population of points, and works with a coding of parameter set, rather than the parameter values themselves. It also uses objective function information without any gradient information. The transition scheme of the genetic algorithm is probabilistic, whereas traditional methods use gradient information. Because of these features of genetic algorithm, they are used as general purpose optimization algorithm. They also provide means to search irregular space and hence areapplied to a variety of function optimization, parameter estimation and machine learning applications.

2.2.2 Basic Principle

The working principle of a canonical GA is illustrated below (algorithm). The major steps involved are the generation of a population of solutions, finding the objective function and fitness function and the application of genetic operators. These aspects are described briefly below. They are described in detail in the following subsection.

/*Algorithm GA */
Formulate initial population
Randomly initialize population
Evaluate objective function
Find fitness function
Apply genetic operators
Until stopping criteria

Fig 2.1 The basic GA operations: One generation is broken down into a selection phase and recombination phase. Strings are assigned into adjacent slots during selection.

An important characteristic of genetic algorithm is the coding of variables that describes the problem. The most common coding method is to transform the variables to a binary string or vector; GAs performs best when solution vectors are binary. If the problem has more than one variable, a multi-variable coding is constructed by concatenating as many single variables coding as the number of variables in the problem.

Genetic Algorithm processes a number of solutions simultaneously. Hence, in the first step a population having P individuals is generated by pseudo random generators whose individuals represent a feasible solution. This is a representation of solution vector in a solution space and is called initial solution. This ensures the search to be robust and unbiased, as it starts from wide range of points in the solution space.

In the next step, individual members of the population are evaluated to find the objective function value. In this step, the exterior penalty function method is utilized to transform a constrained optimization problem to an unconstrained one. This is exclusively problem specific. In the third step, the objective function is mapped into a fitness function that computes a fitness value for each member of the population. This is followed by the application of GA operators.

2.2.3 Working Principle

To illustrate the working principles of GAs, an unconstrained optimization problem is considered. Let us consider following maximization problem,

' Maximize f(x),x'_i^1'x_i'x_i^u ,i=1,2,''.N (2.1)

Where, ' x'_i^1 and x_i^u are the lower and upper bound the variable xi can take. Although a maximization problem is considered here, a maximization problem can also be handled using GAs. The working of GAs is completed by performing the following tasks.

2.2.4 Coding

In order to use GAs to solve the above problem (equation 1), variables xi's are first coded in some string structures. It is important to mention here that the coding of the variables is not absolutely necessary. There exist some studies where GAs are directly used on the variables themselves, but here these exceptions are ignored and the working principles of a simple genetic algorithm is discussed

Fig 2.2 Coding in GA

Binary-coded strings having 1's and 0's are mostly used. The length of the string is usually determined according to the desired solution accuracy. For example, if four bits are used to code each variable in a two-variable optimization problem, the strings (0000 0000) and (1111 1111) would represent the points ''(x'_1^l,x_2^l)'^T '(x_( 1 ,)^u x_2^u)'^T
respectively, because the sub-strings (0000) and (1111) have the minimum and the maximum decoded values. Any other eight bit string can be found to represent a point in the search space according to a fixed mapping rule. Usually, the following linear mapping rule is used:
' x'_(i )=(x_i^l+ x_i^u- x_i^l)/(2^3 -1) '_(j=0)^'''?_(j ) 2^j (2.2) '

In the above equation, the variable xi is coded with substring si of length '. The decoded value of a binary sub-string si is calculated as '_(j=0)^'''?_(j ) 2^j where 'S '_i ?? (0,1) ' And thestring s is represented as ('' S '_(??-1) S'_(??-2)'.S_2 S_1 S_0). It is worthwhile to mention here that with four bits to code each variable, there are only 24 or 16 distinct sub-strings possible because each bit-position can take a value either 0 to 1. The accuracy that can be obtained with a four bit coding is only approximately 1=16th of the search space. But as this string length is increased by 1, the obtainable accuracy increases exponentially to 1=32th of the search space. It is not necessary to code all variables in equal sub-string lengths. The length of a sub-string representing a variable depends on the desired accuracy in that variable. The length of the sub-string varies with the desired precision of the results, the longer the string length, the more the accuracy. The relationship between string length ' and precision ''is

' (x_i^u-x_i^l )10'^?? '(2^??-1) (2.3)

Once the coding of the variables is complete, the corresponding point x = (x1; x2; :::; xN)T can be found. (Eq. 2). Thereafter, the function value at the point x can also be calculated by substituting x in the given objective function f(x)

2.2.5 Fitness Function

GAs mimics the survival-of-the-fittest principle of nature to make a search process. Therefore, GAs are naturally suitable for solving maximization problems. Maximization problems are usually transformed into maximization problem by suitable transformation. In general, a fitness function F(i) is first derived from the objective
Function and used in successive genetic operations. Fitness in biological sense is a quality value which is a measure of the reproductive efficiency of chromosomes. In genetic algorithm, fitness is used to allocate reproductive traits to the individuals in the population and thus act as some measure of goodness to be maximized. This means that individuals with higher fitness value will have higher probability of being selected as candidates for further examination. Certain genetic operators require that the fitness function be non-negative, although certain operators need not have this requirement. For maximization problems, the fitness function can be considered to be the same as the objective function or F(i) = O(i). For minimization problems, to generate non-negative values in all the cases and to reflect the relative fitness of individual string, it is necessary to map the underlying natural objective function to fitness function form. A number of such transformations are possible. Two commonly adopted fitness mappings are presented below.

F (x)= 1/(1+f (x))

This transformation does not alter the location of the minimum, but converts a minimization problem to an equivalent maximization problem. An alternate function to transform the objective function to get the fitness value F(i) as below.

F (i)=V- O(i)P/('_(i=1)^P''O (i)')

Where, O(i) is the objective function value of ith individual, P is the population size and V is a large value to ensure non-negative fitness values. The value of V adopted in this work is the maximum value of the second term of equation 5 so that the fitness value corresponding to maximum value of the objective function is zero. This transformation also does not alter the location of the solution, but converts a minimization problem to an equivalent maximization problem. The fitness function value of a string is known as the string fitness.

2.2.6 GA operators

The operation of GAs begins with a population of random strings representing design or decision variables. The population is then operated by three main operators reproduction, crossover and mutation to create a new population of points. GAs can be viewed as trying to maximize the fitness function, by evaluating several solution vectors. The purpose of these operators is to create new solution vectors by selection, combination or alteration of the current solution vectors that have shown to be good temporary solutions. The new population is further evaluated and tested till termination. If the termination criterion is not met, the population is iteratively operated by the above three operators and evaluated. This procedure is continued until the termination criterion is met. One cycle of these operations and the subsequent evaluation procedure is known as a generation in GAs terminology.

2.2.7 Reproduction

Reproduction (or selection) is an operator that makes more copies of better strings in a new population. Reproduction is usually the first operator applied on a population. Reproduction selects good strings in a population and forms a mating pool. This is one of the reasons for the reproduction operation to be sometimes known as the selection operator. Thus, in reproduction operation the process of natural selection causes those individuals that encode successful structures to produce copies more frequently. To sustain the generation of a new population, the reproduction of the individuals in the current population is necessary. For better individuals, these should be from the fittest individuals of the previous population. There exist a number of reproduction operators in GA literature, but the essential idea in all of them is that the above average strings are picked from the current population and their multiple copies are inserted in the mating pool in a probabilistic manner.

Roulette-Wheel Selection: The commonly-used reproduction operator is the proportionate reproduction operator where a string is selected for the mating pool with a probability proportional to its fitness. Thus, the ith string in the population is selected with a probability proportional to Fi. Since the population size is usually kept fixed in a simple GA, the sum of the probability of each string being selected for the mating pools must be one. Therefore, the probability for selecting the ith string is

p_i =V- F_i/('_(i=1)^n'' F'_i )

Where n is the population size. One way to implement this selection scheme is to imagine a roulette-wheel with its circumference marked for each string proportionate to the string's fitness. The roulette-wheel is spun n times. Each time selecting an instance of

the string chosen by the roulette-wheel pointer. Since the circumference of the wheel is marked according to a string's fitness, this roulette-wheel mechanism is expected to make Fi'Fcopies of the ith string in the mating pool. The average fitness of the population is calculated as ??( F) = '_(i=1)^n'F_i (2.7)

Table 2.2 Fitness Value of Individual

Population Fitness
1 25.0
2 5.0
3 40.0
4 10.0
5 20.0

Fig. 2.3 A roulette-wheel marked for five individuals according to their fitness values. Third individual has a higher probability of selection than any other

The Fig. 2.3 shows a roulette-wheel for each individual having different fitness values. Since the third individual has higher fitness value than any other, it is expected that the roulette-wheel selection will choose the third individual more than any other individual. This roulette-wheel selection scheme can be simulated easily. Using the fitness value Fi of all strings, the probability os selecting s string pi can be calculated. Thereafter, the cumulative probability Pi of each string being copied can be calculated by adding the individual probabilities from the top of the list. Thus, the bottom-most string in the population should have a cumulative probability values from Pi-1 to P. The first string represents the cumulative values from zero to P1. Thus, the cumulative probability of any sting lies between 0 to 1. In order to choose n strings, n random numbers between zeros to one are created at random. Thus, String that represents the chosen random number in the

cumulative probability range (calculated from the fitness values) for the string is chosen to the mating pol. This way the string with a higher fitness value will represent a larger range in the cumulative probability values and therefore has a higher of being copied into the mating pool. On the other hand, a string with a smaller fitness value represents a smaller range in cumulative probability values and has a smaller probability of being copied into the mating pool.

Stochastic remainder selection: A better selection scheme is also presented here the basic idea of this selection is to remove or copy the strings depending on the values of reproduction counts. This is achieved by computing the reproduction count associated with each string. Reproduction count is computed based on the fitness value by stochastic remainder selection without replacement as it is superior to other schemes. Hence, this scheme is recommended. First the probability of selection ps is calculated as ps = F(i)/ ' F(i). The expected number of individuals of each string is calculated as ei = ps ?? P, where P is the population size. The fractional parts of ei are treated as probabilities with which individuals are selected for reproduction. One by one Bernoulli trials (i.e. weighted coin tosses) are performed using the fractional part of ei. For example, a string with ei = 1:5 will get a single count surely and another with a probability of 0:5. This is continued till all the candidates in the population are examined. Reproduction is done based on this computed reproduction count. Individuals with 0 counts are eliminated from the population. Other individuals with non-zero counts get multiple copies in population equal to the value of their counts. The size of the population is kept constant and this completes the reproduction operation. Different selection schemes vary in principle by assigning different number of copies to better strings in the population but in all selection schemes the essential idea is that more copies are allocated to the strings with higher fitness values.

2.2.8 Crossover

A crossover operator is used to recombine two strings to get a better string. In crossover operation, recombination process creates different individuals in the successive generations by combining material from two individuals of the previous generation. In reproduction, good strings in a population are probabilistic-ally assigned a larger number of copies and a mating pool is formed. It is important to note that no new strings are formed in the reproduction phase. In the crossover operator, new strings are created by exchanging information among strings of the mating pool.

The two strings participating in the crossover operation are known as parent strings and the resulting strings are known as children strings. It is intuitive from this construction that good sub-strings from parent strings can be combined to form a better child string, if an appropriate site is chosen. With a random site, the children strings produced may or may not have a combination of good sub-strings from parent strings, depending on whether or not the crossing site falls in the appropriate place. But this is not a matter of serious concern, because if good strings are created by crossover, there will be more copies of them in the next mating pool generated by crossover. It is clear from this discussion that the effect of cross over may be detrimental or beneficial. Thus, in order to preserve some of the good strings that are already present in the mating pool, all strings in the mating pool are not used in crossover. When a crossover probability,

defined here as pc is used, only 100pc per cent strings in the population are used in the crossover operation and 100(1 ' pc) per cent of the population remains as they are in the current population. A crossover operator is mainly responsible for the search of new strings even though mutation operator is also used for this purpose sparingly.

2.2.9 Mutation

Mutation adds new information in a random way to the genetic search process and ultimately helps to avoid getting trapped at local optima. It is an operator that introduces diversity in the population whenever the population tends to become homogeneous due to repeated use of reproduction and crossover operators. Mutation may cause the chromosomes of individuals to be different from those of their parent individuals.

Mutation in a way is the process of randomly disturbing genetic information. They operate at the bit level; when the bits are being copied from the current string to the new string, there is probability that each bit may become mutated. This probability is usually a quite small value, called as mutation probability pm. A coin toss mechanism is employed; if random number between zero and one is less than the mutation probability, then the bit is inverted, so that zero becomes one and one becomes zero. This helps in introducing a bit of diversity to the population by scattering the occasional points. This random scattering would result in better optima, or even modify a part of genetic code that will be beneficial in later operations. On the other hand, it might produce a weak individual that will never be selected for further operations.

The need for mutation is to create a point in the neighborhood of the current point, thereby achieving a local search around the current solution. The mutation is also used to maintain diversity in the population. For example, the following population having four eight bit strings may be considered:


It can be noticed that all four strings have a 0 in the left most bit position. If the true optimum solution requires 1 in that position, then neither reproduction nor crossover operator described above will be able to create 1 in that position. The inclusion of mutation introduces probability pm of turning 0 into 1.

These three operators are simple and straightforward. The reproduction operator selects good strings and the crossover operator recombines good sub-strings from good strings together, hopefully, to create a better sub-string. The mutation operator alters a string locally expecting a better string. Even though none of these claims are guaranteed and/or tested while creating a string, it is expected that if bad strings are created they will be eliminated by the reproduction operator in the next generation and if good strings are created, they will be increasingly emphasized. Further insight into these operators, different ways of implementations and some mathematical foundations of genetic algorithms can be obtained from GA literature.

Application of these operators on the current population creates a new population. This new population is used to generate subsequent populations and so on, yielding solutions that are closer to the optimum solution. The values of the objective function of the individuals of the new population are again determined by decoding the strings. These values express the fitness of the solutions of the new generations. This completes one cycle of genetic algorithm called a generation. In each generation if the solution is improved, it is stored as the best solution. This is repeated till convergence.


The way of the decisions encoding used by us consists of extraction from the chromosome the information about the sequence of the blocks allocation. In this case the final decision represents some blocks queue, according to which one the allocation algorithm packs the blocks into containers.

The algorithm in series takes the blocks from the queue and places them in to the first container. At the overflow of the container there is the transition to the following free container. Thus the queue of the blocks processing is taken from the chromosome implicitly. The decoding process (Fig 2.4) looks as follows. Before the start decoding we shall create the initial queue of the blocks with the length n where the each element points to the appropriate type of the block [1, n] and is equal to the index in queue (1,2, ', n). Let the chromosome is represented as the pairs set (i, j) each of which defines the rearrangement in initial queue the i-element on the j element. Having executed the all rearrangements submitted in the chromosome, we shall receive the resulting queue of the blocks allocation

Fig. 2.4 An Example of Decoding Of the Chromosome

The necessary for the transformation of the initial queue to any possible quantity of the rearrangements is easily determined from the following reasons. The elementary algorithm of the data ordering consists of the bringing of the any sequence of the elements to growing (or decreasing) sequence. For this purpose the elementary algorithm consistently looks through all elements, starting with the first, and makes rearrangement of the current element with the minimal of raw element (which indexes above than index of the current element). Thus, the each concrete ordering can be considered as the set of the rearrangements, whose quantity is equal to the general number of the serializable elements. In our case the chromosome decoding is the inverse procedure to the ordering: the initial growing sequence will be transformed to one of the possible sequences that can be achieved by the performance of the rearrangements received at the ordering in the return order since the last. Thus, the sufficient number of the rearrangements for the transformation of the initial queue to any of the possible queue is equal to number of the elements in the queue.


Nearly everyone can gain benefits from Genetic Algorithms, once he can encode solutions of a given problem to chromosomes in GA, and compare the relative performance (fitness) of solutions.

An effective GA representation and meaningful fitness evaluation are the keys of the success in GA applications. The appeal of GAs comes from their simplicity and elegance as robust search algorithms as well as from their power to discover good solutions rapidly for difficult high-dimensional problems. GAs are useful and efficient when

Fig. 2.5 Transition in GA

The search space is large, complex or poorly understood
Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space
No mathematical analysis is available
Traditional search methods fail

The advantage of the GA approach is the ease with which it can handle arbitrary kinds of constraints and objectives; all such things can be handled as weighted components of the fitness function, making it easy to adapt the GA scheduler to the particular requirements of a very wide range of possible overall objectives.

GAs have been used for problem-solving and for modeling. GAs are applied to many scientific, engineering problems, in business and entertainment, including:

Optimization: GAs have been used in a wide variety of optimization tasks, including numerical optimization, and combinatorial optimization problems such as traveling salesman problem (TSP), circuit design, job shop scheduling and video & sound quality optimization.
Automatic Programming: GAs have been used to evolve computer programs for specific tasks, and to design other computational structures, for example, cellular automata and sorting networks.
Machine and robot learning: GAs have been used for many machine- learning applications, including classification and prediction, and protein structure prediction. GAs have also been used to design neural networks, to evolve rules for learning classifier systems or symbolic production systems, and to design and control robots.
Economic models: GAs have been used to model processes of innovation, the development of bidding strategies, and the emergence of economic markets.
Immune system models: GAs have been used to model various aspects of the natural immune system, including somatic mutation during an individual's lifetime and the discovery of multi-gene families during evolutionary time.
Ecological models: GAs have been used to model ecological phenomena such as biological arms races, host-parasite co-evolutions, symbiosis and resource ow in ecologies.
Population genetics models: GAs have been used to study questions in population genetics, such as "under what conditions will a gene for recombination be evolutionarily viable?" Interactions between evolution and learning: GAs have been used to study how individual learning and species evolution affect one another.
Models of social systems: GAs have been used to study evolutionary aspects of social systems, such as the evolution of cooperation [Chughtai 1995], the evolution of communication, and trail-following behavior in ants.


Each leaf is assigned to one class representing the most appropriate target value. Alternatively, the leaf may hold a probability vector indicating the probability of the target attribute having a certain value. Instances are classified by navigating them from the root of the tree down to a leaf, according to the outcome of the tests along the path. Figure 2.6 describes a decision tree that reasons whether or not a potential customer will respond to a direct mailing. Internal nodes are represented as circles, whereas leaves are denoted as triangles. Note that this decision tree incorporates both nominal and numeric attributes. Given this classifier, the analyst can predict the response of a potential customer (by sorting it down the tree), and understand the behavioral characteristics of the entire potential customers population regarding direct mailing. Each node is labeled with the attribute it tests, and its branches are labeled with its corresponding values.

In case of numeric attributes, decision trees can be geometrically interpreted as a collection of hyperplanes, each orthogonal to one of the axes. Naturally, decision-makers prefer less complex decision trees, since they may be considered more comprehensible. Furthermore, the tree complexity has a crucial effect on its accuracy. The tree complexity is explicitly controlled by the stopping criteria used and the pruning method employed. Usually the tree complexity is measured by one of the following metrics: the total number of nodes, total number of leaves, tree depth and number of attributes used. Decision tree induction is closely related to rule induction. Each path from the root of a decision tree to one of its leaves can be transformed into a rule simply by conjoining the tests along the path to form the antecedent part, and taking the leaf's class prediction as the class value. For example, one of the paths in Figure 2.6 can be transformed into the rule: 'If customer age is is less than or equal to or equal to 30, and the gender of the customer is 'Male' ' then the customer will respond to the mail'. The resulting rule set can then be simplified to improve its comprehensibility to a human user, and possibly its accuracy.

Fig. 2.6 Decision Tree Presenting Response to Direct Mailing.


Decision tree inducers are algorithms that automatically construct a decision tree from a given dataset. Typically the goal is to find the optimal decision tree by minimizing the generalization error. However, other target functions can be also defined, for instance, minimizing the number of nodes or minimizing the average depth.

Induction of an optimal decision tree from a given data is considered to be a hard task. It has been shown that finding a minimal decision tree consistent with the training set is NP'hard. Moreover, it has been shown that constructing a minimal binary tree with respect to the expected number of tests required for classifying an unseen instance is NP'complete. Even finding the minimal equivalent decision tree for a given decision tree [44] or building the optimal decision tree from decision tables is known to be NP'hard.

There are various top'down decision trees inducers such as ID3, C4.5 CART. Some consist of two conceptual phases: growing and pruning (C4.5 and CART). Other inducers perform only the growing phase. Fig. 2.7 presents a typical algorithmic framework for top'down inducing of a decision tree using growing and pruning. Note that these algorithms are greedy by nature and construct the decision tree in a top'down, recursive manner (also known as 'divide and conquer'). In each iteration the algorithm considers the partition of the training set using the outcome of a discrete function of the input attributes. The selection of the most appropriate function is made according to some splitting measures. After the selection of an appropriate split, each node further subdivides the training set into smaller subsets, until no split gains sufficient splitting measure or a stopping criteria is satisfied.

TreeGrowing (S,A,y)
S - Training Set
A - Input Feature Set
y - Target Feature
Create a new tree T with a single root node.
IF One of the Stopping Criteria is fulfilled THEN

Mark the root node in T as a leaf with the most
common value of y in S as a label.
Find a discrete function f(A) of the input
attributes values such that splitting S
according to f(A)'s outcomes (v1,...,vn) gains
the best splitting metric.
IF best splitting metric > treshold THEN
Label t with f(A)
FOR each outcome vi of f(A):
Set Subtreei= TreeGrowing (??f(A)=viS,A,y).
Connect the root node of tT to Subtreei with
an edge that is labelled as vi
Mark the root node in T as a leaf with the most
common value of y in S as a label.
TreePruning (S, T, y)
S - Training Set
y - Target Feature
T - The tree to be pruned
Select a node t in T such that pruning it
maximally improve some evaluation criteria
IF t=?? THEN T=pruned(T,t)
UNTIL t=??

Fig 2.7: Top-Down Algorithmic Framework for Decision Trees Induction.


Let's discuss few inducers that take a part in decision tree.

2.7.1 ID3

The ID3 algorithm is considered as a very simple decision tree algorithm. ID3 uses information gain as splitting criteria. The growing stops when all instances belong to a single value of target feature or when best information gain is not greater than zero. ID3 does not apply any pruning procedures nor does it handle numeric attributes or missing values.

2.7.2 C4.5

C4.5 is an evolution of ID3, It uses gain ratio as splitting criteria. The splitting ceases when the number of instances to be split is below a certain threshold. Error'based pruning is performed after the growing phase. C4.5 can handle numeric attributes. It can induce from a training set that incorporates missing values by using corrected gain ratio criteria as presented above.

2.7.3 CART

CART stands for Classification and Regression Trees. It is characterized by the fact that it constructs binary trees, namely each internal node has exactly two outgoing edges. The splits are selected using the twoing criteria and the obtained tree is pruned by cost'complexity Pruning. When provided, CART can consider misclassification costs in the tree induction. It also enables users to provide prior probability distribution.

An important feature of CART is its ability to generate regression trees. Regression trees are trees where their leaves predict a real number and not a class. In case of regression, CART looks for splits that minimize the prediction squared error (the least'squared deviation). The prediction in each leaf is based on the weighted mean for node.

2.7.4 CHAID

Starting from the early seventies, researchers in applied statistics developed procedures for generating decision trees, such as: AID, MAID, THAID and CHAID. CHAID (Chisquare'Automatic'Interaction'Detection) was originally designed to handle nominal attributes only. For each input attribute ai, CHAID finds the pair of values in Vi that is least significantly different with respect to the target attribute. The significant difference is measured by the p value obtained from a statistical test. The statistical test used depends on the type of target attribute. If the target attribute is continuous, an F test is used. If it is nominal, then a Pearson chi'squared test is used. If it is ordinal, then a likelihood'ratio test is used.

For each selected pair, CHAID checks if the p value obtained is greater than a certain merge threshold. If the answer is positive, it merges the values and searches for an additional potential pair to be merged. The process is repeated until no significant pairs are found.

The best input attribute to be used for splitting the current node is then selected, such that each child node is made of a group of homogeneous values of the selected attribute. Note that no split is performed if the adjusted p value of the best input attribute is not less than a certain split threshold. This procedure also stops when one of the following conditions is fulfilled:

Maximum tree depth is reached.
Minimum number of cases in node for being a parent is reached, so it cannot be split any further.
Minimum number of cases in node for being a child node is reached.

CHAID handles missing values by treating them all as a single valid category. CHAID does not perform pruning.

2.7.5 QUEST

The QUEST (Quick, Unbiased, Efficient, Statistical Tree) algorithm supports univariate and linear combination splits. For each split, the association between each input attribute and the target attribute is computed using the ANOVA F'test or Levene's test (for ordinal and continuous attributes) or Pearson's chi'square (for nominal attributes). If the target attribute is multinomial, two'means clustering is used to create two super classes. The attribute that obtains the highest association with the target attribute is selected for splitting. Quadratic Discriminant Analysis (QDA) is applied to find the optimal splitting point for the input attribute. QUEST has negligible bias and it yields binary decision trees. Ten'fold cross'validation is used to prune the trees.


There are many advantages of decision tree along with some disadvantages.

2.8.1 Advantages

Several advantages of the decision tree as a classification tool have been pointed out in the literature:

Decision trees are self'explanatory and when compacted they are also easy to follow. In other words if the decision tree has a reasonable number of leaves, it can be grasped by non'professional users. Furthermore decision trees can be converted to a set of rules. Thus, this representation is considered as comprehensible.
Decision trees can handle both nominal and numeric input attributes.
Decision tree representation is rich enough to represent any discrete value classifier.
Decision trees are capable of handling datasets that may have errors.
Decision trees are capable of handling datasets that may have missing values.
Decision trees are considered to be a nonparametric method. This means that decision trees have no assumptions about the space distribution and the classifier structure.

2.8.1 Disadvantages

Decision trees have such disadvantages as:

Most of the algorithms (like ID3 and C4.5) require that the target attribute will have only discrete values.
As decision trees use the 'divide and conquer' method, they tend to perform well if a few highly relevant attributes exist, but less so if many complex interactions are present. One of the reasons for this is that other classifiers can compactly describe a classifier that would be very challenging to represent using a decision tree. A simple illustration of this phenomenon is the replication problem of decision trees. Since most decision trees divide the instance space into mutually exclusive regions to represent a concept, in some cases the tree should contain several duplications of the same sub-tree in order to represent the classifier. For instance if the concept follows the following binary function: y = (A1 ' A2)' (A3 ' A4) then the minimal univariate decision tree that represents this function is illustrated in Fig. 2.8.

Fig. 2.8 Illustration of Decision Tree with Replication.

3 The greedy characteristic of decision trees leads to another disadvantage that should be pointed out. This is its over'sensitivity to the training set, to irrelevant attributes and to noise.


In the following sub-sections, we discuss some of the most popular extensions to the classical decision tree induction paradigm.

2.9.1 Oblivious Decision Trees

Oblivious decision trees are decision trees for which all nodes at the same level test the same feature. Despite its restriction, oblivious decision trees are found to be effective for feature selection. Some researchers have proposed forward feature selection procedure by constructing oblivious decision trees. Langley and Sage suggested backward selection using the same means. It has been shown that oblivious decision trees can be converted to a decision table. Recently have suggested a new algorithm for constructing oblivious decision trees, called IFN (Information Fuzzy Network) that is based on information theory.

Fig. 2.9 Illustration of Oblivious Decision Tree.

Fig. 2.9 illustrates a typical oblivious decision tree with four input features: glucose level (G), age (A), hypertension (H) and pregnant (P) and the Boolean target feature representing whether that patient suffers from diabetes. Each layer is uniquely associated with an input feature by representing the interaction of that feature and the input features of the previous layers. The number that appears in the terminal nodes indicates the number of instances that fit this path. For example, regarding patients whose glucose level is less than 107 and their age is greater than 50, 10 of them are positively diagnosed with
diabetes while 2 of them are not diagnosed with diabetes. The principal difference between the oblivious decision tree and a regular decision tree structure is the constant ordering of input attributes at every terminal node of the oblivious decision tree, the property which is necessary for minimizing the overall subset of input attributes (resulting in dimensionality reduction). The arcs that connect the terminal nodes and the nodes of the target layer are labeled with the number of records that fit this path. An oblivious decision tree is usually built by a greedy algorithm, which tries to maximize the mutual information measure in every layer. The recursive search for explaining attributes is terminated when there is no attribute that explains the target with statistical significance.

2.9.2 Fuzzy Decision Trees

In classical decision trees, an instance can be associated with only one branch of the tree. Fuzzy decision trees (FDT) may simultaneously assign more than one branch to the same instance with gradual certainty. FDTs preserve the symbolic structure of the tree and its comprehensibility. Nevertheless, FDT can represent concepts with graduated characteristics by producing real-valued outputs with gradual shifts Janikow presented a complete framework for building a fuzzy tree including several inference procedures based on conflict resolution in rule-based systems and efficient approximate reasoning methods. SDT approach combines tree-growing and pruning, to determine the structure of the soft decision tree, with refitting and backfitting, to improve its generalization capabilities. They empirically showed that soft decision trees are significantly more accurate than standard decision trees. Moreover, a global model variance study shows a much lower variance for soft decision trees than for standard trees as a direct cause of the improved accuracy.




One such attempt is by Chenowith [4] they relied on a single technical indicator called the average direction index (ADX), which identifies and quantifies trends by averaging the fraction of today's range above or below the previous day's range. The ADX is obtained through a feature selection component and used as input into two separate neural networks (Up and Down) whose results were then polled and applied to a rule base to predict the final market movement. Without knowing the exact predictive accuracy, it is difficult to quantitatively compare the system, which inevitably includes rules for trading which may be the actual cause of the monetary gain achieved by the system rather than predictive accuracy.

Past approaches to this problem first applied an artificial neural network directly to historical stock data using linear time series modeling [8]. This produced results which over fitted the training data and therefore rendered them unusable in real trading. In addition to omitting any preprocessing, the neural networks used only contained two layers, an input and an output layer. These linear techniques are now known to be provably insufficient for any nonlinear phenomenon including stock price movement.

Variants on the MLP theme are common, such as using a variety of different markets for training data [20] and using statistical analysis to assist in training [33]. Other approaches include using fuzzy neural networks, which is based on fuzzification, a technique that allows members of a set to have degrees of membership instead of binary membership as in classical set theory. The introduction of ambiguity allows data to be divided into several different technical indicators if necessary, hopefully producing more lenient and more accurate neural networks. However, results indicate that fuzzy neural networks have prediction accuracy virtually equivalent to classical neural networks [25].

Mahfoud and Mani [24] chose to use genetic algorithms to predict stock prices. Genetic algorithms are inspired by evolutionary biology and the concept of survival of the fittest. A large population of possible algorithmic representations for stock prediction is first created. Then, each member is executed and evaluated, keeping the algorithms which generate the best results and mixing their properties amongst other high scoring algorithms to obtain a new generation of algorithms in a Darwinian fashion. The process is repeated until the error has been reduced to an acceptable level.

Zhang Lianmei and Jiang Xingjun proposed Research on New Data Mining Method Based on Hybrid Genetic Algorithm [45]. They proposed combination classification method, multiple decision trees that adopt the method of probability measurement level output are parallel combined. Then genetic algorithm is used for the optimization of connection weight matrix in combination algorithm. Furthermore, two sets of simulation experiment data are used to test and evaluate the proposed combination classification method. Results of the experiments indicate that the proposed combination classification method has higher classification accuracy level than single decision tree. Moreover, it optimizes classification rules and sustains good interpretability for classification results [45].

Multiple policy-making tree parallel combination models as Fig. 3.1. In the upper formation of the combination model, different policy-making tree methods constitute multiple decision-making trees and then combine them. Each policy-making tree root produces classified rule respectively according to the training sample. The classified result is expressed as the probability distribution which marks with the goal class. In the lower level, each decision-making tree's classified result is regarded as the result of the combination algorithm. On the processing of the combination algorithm, it results in the classified result of the combination model as well as the classified rule [45].

Fig. 3.1 Multiple policy-making tree parallel combination pattern structure drawing.
The variable significance of Fig. 3.1 is:

Sk expresses k data-in sample, S is data sample number (k=1, 2, ', S)
I expresses the ith goal class, altogether has m goal class, (i =1, 2, ', m);
DTj expresses the jth policy-making tree, altogether has n policy-making tree, (j =1, 2, ', n)
Pij expresses regarding input data sample sk, jth policymaking tree to ith goal class probability size;
Oi expresses regarding input data sample sk, a ith goal class combination weighting probability size which the combination classification obtains;
Wij expresses the jth policy-making tree to the ith goal class connection power value, it represents the jth policymaking tree ith goal class important degree, the value sector for [0, 1]

Among which: The combination algorithm input festival points are m 'n, the output festival points are m, oi may (1) obtain by the formula: I =1, 2, ', m.

Mahdi Pakdaman Naeini et al [23] offered Stock Market Value Prediction Using Neural Networks. They have suggested a predictive model based on MLP neural network for predicting stock market changes in Tehran Stock Exchange Corporation (TSEC). Using this model, one can predict the next day stock value of a company only based on its stock trade history and without any information of the current market. Their experiments show that the prediction error of this model is around 1.5%.

First, traditional methods such as linear regression and logistic regression are model based while Neural Networks are self-adjusting methods based on training data, so they have the ability to solve the problem with a little knowledge about its model and without constraining the prediction model by adding any extra assumptions. Besides, neural networks can find the relationship between the input and output of the system even if this relationship might be very complicated because they are general function approximators. Consequently, neural networks are well applied to the problems in which extracting the relationships among data is really difficult but on the other hand there exists a large enough training data sets. It should be mentioned that, although sometimes the rules or patterns that they are looking for might not be easily found or the data

could be corrupted due to the process or measurement noise of the system, it is still believed that the inductive learning or data driven methods are the best way to deal with real world prediction problems [23].

Second, Neural Networks have generalization ability meaning that after training they can recognize the new patterns even if they haven't been in training set. Since in most of the pattern recognition problems predicting future events (unseen data) is based on previous data (training set), the application of neural networks would be very beneficial.
Third, neural networks have been claimed to be general function approximators. It is proved that an MLP neural network can approximate any complex continuous function that enables us to learn any complicated relationship between the input and the output of the system [23].

Genetically optimized fuzzy decision trees (G-DTs) have also been proposed in [42]. In G-DTs, the fuzzy decision trees are constructed in the setting of genetic optimization. The genetic algorithm optimizes the parameters of the fuzzy sets that are associated with the individual nodes, which act as fuzzy 'switches' by distributing a process flow. A shortcoming of univariate decision tree learners is that they do not learn intermediate concepts, and only select one of the input features in the branching decision at each intermediate tree node. Thus, an efficient algorithm for generating generalized decision forests has been proposed [13].

The Decision tree C4.5 [15] algorithm is widely used to construct decision trees, and has found varied applications in fields including human talent prediction [9], stock market prediction [27] and stock trading rule generation[40]. It is used in the present study to select the features required for trend prediction. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Given a tuple, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple. The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and therefore is appropriate for exploratory knowledge discovery. During tree construction, attribute selection measures are used to select the attribute that best partitions the tuples into distinct classes.

SVMs have been used in [21] and [30] for predicting financial time series, and in [37] and [41] to design stock trading/decision support systems. SVM [39] is a learning system based on statistical learning theory. SVM belongs to the class of supervised learning algorithms in which the learning machine is given a set of features (or inputs) with the associated labels (or output values). Each of these features can be looked upon as a dimension of a hyper-plane. SVMs construct a hyper-plane that separates the hyper-space into two classes (this can be extended to multi-class problems).While doing so, SVM algorithm tries to achieve maximum separation between the classes. Separating the classes with a large margin minimises the expected generalisation error. 'Minimum generalisation error', means that when a new set of features (that is data points with unknown class values) arrive for classification, the chance of making an error in the prediction (of the class to which it belongs) based on the learned classifier (hyper-plane) should be minimum. Intuitively, such a classifier is one, which achieves maximum separation margin between the classes. The above process of maximizing separation leads to two hyper-planes parallel to the separating plane, on either side of it.

These two can have one or more points on them. The planes are known as 'bounding planes' and the distance between them is called 'margin'. SVM 'learning' involves finding a hyper-plane, which maximizes the margin and minimizes the misclassification error. The points lying beyond the bounding planes are called support vectors. Vapnik [39] has shown that if the training features are separated without errors by an optimal hyper-plane, the expected error rate on a test sample is bounded by the ratio of the expectation of the support vectors to the number of training vectors. The smaller the size of the support vector set, more general the above result. Further, the generalization is independent of the dimension of the problem. In case such a hyper-plane is not possible, the next best is to minimize the number of misclassifications whilst maximizing the margin with respect to the correctly classified features.




The proposed Efficient Prediction of Close Value using Genetic algorithm based horizontal partition decision tree The proposed methods is implemented using genetic Algorithm which includes the concept of decision tree. The scheme provides Stock market price index prediction is regarded as a challenging task of the finance. Genetic algorithms (GAs) are problem solving methods (or heuristics) that mimic the process of natural evolution. Unlike artificial neural networks (ANNs), designed to function like neurons in the brain, these algorithms utilize the concepts of natural selection to determine the best solution for problem.

Algorithm 1 Genetic Algorithm

for all members of population
sum += fitness of this individual
end for
for all members of population
probability = sum of probabilities + (fitness / sum)
sum of probabilities += probability
end for
loop until new population is full
do this twice
number = Random between 0 and 1
for all members of population
if number > probability but less than next probability then
you have been selected
end for
create offspring
end loop

Algorithm 2 Horizontal Partitioned Based Decision Tree

Define P1, P2' Pn Parties.(Horizontally partitioned).
Each Party contains R set of attributes A1, A2, '., AR.
C the class attributes contains c class values C1, C2' Cc.
For party Pi where i = 1 to n do
If R is Empty Then
Return a leaf node with class value
Else If all transaction in T(Pi) have the same class Then
Return a leaf node with the class value
Calculate Expected Information classify the given sample for each party Pi individually.
Calculate Entropy for each attribute (A1, A2, '., AR) of each party Pi.
Calculate Information Gain for each attribute (A1, A2,'., AR) of each party Pi
End If.
End For

Calculate Total Information Gain for each attribute of all parties (TotalInformationGain( )).
ABestAttribute ' MaxInformationGain( )
Let V1, V2, '., Vm be the value of attributes. ABestAttribute partitioned P1, P2,'., Pn parties into m parties
P1(V1), P1(V2), '., P1(Vm)
P2 (V1), P2(V2), '., P2(Vm)
. .
. .
Pn(V1), Pn(V2), '., Pn(Vm)
Return the Tree whose Root is labelled ABestAttribute and has m edges labelled V1, V2, '., Vm. Such that for every i the edge Vi goes to the Tree
NPPID3(R ' ABestAttribute, C, (P1(Vi), P2(Vi), '., Pn(Vi)))

Algorithm 3 TotalInformationGain( ) - To compute the Total Information Gain for every attribute.

For j = 1 to R do {Attribute A1, A2,'., AR }
Total_Info_Gain(Aj) = 0
For i = 1 to n do {Parties P1, P2,'., Pn }
Total_Info_Gain(Aj) = Total_Info_Gain(Aj) + Info_Gain(Aij)
End For
End For

Algorithm 4: MaxInformationGain ( ) ' To compute the highest Information Gain for horizontally partitioned data

MaxInfoGain = -1
For j = 1 to R do {Attribute A1, A2,'., AR }
Gain = TotalInformationGain(Aj)
If MaxInfoGain < Gain then
MaxInfoGain = Gain
ABestAttribute = Aj
End If
Return (ABestAttribute )
End For



As shown in the below Fig. 5.1 is the Starting Window of the Software.

Fig. 5.1 Starting Window of Software

As shown in the below Fig. 5.2 Select Dataset from Stock_market.arff for Existing Work.

Fig. 5.2 Select Dataset for Existing Work from Stock_ Market.arff

As shown in the below Fig. 5.3 Result in Time (ms) and Error Rate for Existing Work.

Fig. 5.3 Existing Work Result in Time (Ms)

As shown in the below Fig. 5.4 All the Error Rate for Existing work ..

Fig. 5.4 Existing Work Error Rate Result

As shown in the below Fig. 5.5 select Dataset1 for starting Proposed work.


Fig. 5.5 Select Dataset1 for Starting Proposed Work

As shown in the below Fig. 5.6 Select the Stock_market1.arff Dataset which is the partition 1 of Full Dataset for Proposed Work.

Fig. 5.6 Select Dataset from Stock market for Proposed work

As shown in the below Fig. 5.7 Stock_market1.arff Dataset from Stock market Dataset for Proposed Work.

Fig. 5.7 Select Dataset1 from Stock-_Market1.arff

As shown in the below Fig. 5.8 select Dataset2 from Stock market which is stock_market2.arff for Proposed work.

Fig. 5.8 Select Dataset2 for Proposed work

As shown in the below Fig. 5.9 Two Partition of Full Dataset (Stock_market.arff) value for Proposed Work.

Fig. 5.9 Two Partition of Full Dataset

As shown in the below Fig. 5.10 Result in Time (ms) of Proposed Work.

Fig. 5.10 Result in Time (ms) of Proposed Work

As shown in the below Fig. 5.11 Comparison of Result in Time (ms) between Existing and Proposed Work.

Fig. 5.11 Result Comparison between Existing and Proposed Work

As shown in the below Fig. 5.12 Select Dataset Viewer for View Decision and Dataset Value in Tabular form.

Fig. 5.12 Select Dataset viewer

As shown in the below Fig. 5.13 Select View Tree to View the Decision Tree according to Selected Dataset.

Fig. 5.13 Select Dataset Viewer for View Decision Tree

As shown in the below Fig. 5.14 View the Decision Tree according to Selected Dataset.

Fig. 5.14 Decision Tree of selected Dataset

As shown in the below Fig. 5.15 Select Choose option from Dataset Viewer for View the Dataset Value in Tabular form.

Fig. 5.15 Starting window of Dataset Table

As shown in the below Fig. 5.16 select the open File Option for select the Dataset.

Fig. 5.16 Open Dataset File

As shown in the below Fig. 5.17 Select the Dataset file to view the value in Tabular form.

Fig. 5.17 Select Dataset from Stock market

As shown in the below Fig. 5.18 Stock_market.arff Dataset Value from First Page.

Fig. 5.18 First Page of Stock market Dataset Table

As shown in the below Fig. 5.19 Stock_market.arff Dataset Value from Last Page.

Fig. 5.19 Last Page of Stock market Dataset Table

As shown in the below Fig. 5.20 is the time complexity comparison between existing id3 and C4.5 based decision tree and Horizontal partition based decision tree.

Fig. 5.20 Time Comparison Between ID3 , C4.5 and HP

Table 5.1 Computational Time in Id3 , C4.5 and HP (ms)

C4.5 Time(Ms)

20 80 52 17
25 97 78 18
50 115 94 19
100 135 121 33
200 160 138 37

As shown in the below Fig 5.21 is the time complexity comparison between existing id3 based decision tree C4.5 decision tree and Horizontal partition based decision tree.

Fig 5.21 Time Comparison between ID3, C4.5 and HP

As shown in the below Table 5.2 is the Time comparisons of the proposed rate which is less as compared to the existing id3 and C4.5 decision tree.

Table 5.2 Computational Time in Id3, C4.5 and HP (ms) e in Id3, C4.5 and HP (ms)

Number_Of_Instances Id3_Time(Ms) C4.5 Time(Ms) Hp_Time(Ms)
10500 1076 429 270
21000 2044 692 590
42000 4773 1046 970
84000 12621 2854 2660
100000 15834 3428 3200

Table 5.3 Time comparison of ID3, C4.5 and HP

Number_of_data_values Size_of_data_set Time_Existing ID3 Time C4.5 Time_Horizontal_partioned_ID3
10500 347KB 1076ms 429ms 270ms
21000 693KB 2044ms 692ms 590ms
42000 1.35MB 4773ms 1046ms 970ms
84000 2.70MB 12621ms 2854ms 2660ms
100000 3.21MB 15834ms 3428ms 3200ms

As shown in the below is the Fig. 5.22 comparison Absolute Error Rate between existing id3 and C4.5 based decision tree and Horizontal partition based decision tree.

Fig. 5.22 Compare Absolute Error Rate between ID3, C4.5 and HP

As shown in the below Table 5.4 is the mean absolute error rate of the proposed rate which is less as compared to the existing id3 and C4.5 decision tree.

Table 5.4 Comparison of Mean Absolute Error of ID3, C4.5 and HP

Id3_Mean Absolute Error C4.5 Mean Absolute Error Hp_Mean Absolute Error
20 0.286 0.245 0.1167
25 0.28 0.279 0.276
50 0.31 0.3 0.29
100 0.35 0.32 0.298
200 0.38 0.35 0.31

As shown in below Fig.5.23 comparison Absolute Error Rate between existing id3 and C4.5 based decision tree and Horizontal partition based decision tree and was found

Fig. 5.23 Compare Absolute Error Rate between ID3, C4.5 and HP

As shown in the below Table 5.5 is the mean absolute error rate of the proposed rate which is less as compared to the existing id3 and C4.5 decision tree.

Table 5.5 Comparison of Mean Absolute Error of ID3, C4.5 and HP

Number_Of_Instances Id3_Time(Ms) C4.5 Time(Ms) Hp_Time(Ms)
10500 0.3052 0.2991 0.2987
21000 0.3051 0.2997 0.2986
42000 0.3051 0.2995 0.2986
84000 0.3051 0.3003 0.2988
100000 0.3055 0.3005 0.2998

As shown in the below Table 5.6 is the mean absolute error rate of the proposed rate which is less as compared to the existing id3 and C4.5 decision tree.

Table 5.6 Comparison of Mean Absolute Error of ID3, C4.5 and HP

Number_of_data_values Size_of_data_set Mean absolute error _id3 Mean absolute error C4.5 Mean absolute error _HP
10500 347KB 0.3052 0.2991 0.2987
21000 693KB 0.3051 0.2997 0.2986
42000 1.35MB 0.3051 0.2995 0.2986
84000 2.70MB 0.3051 0.3003 0.2988
100000 3.21MB 0.3055 0.3005 0.2998

As shown in the below Fig.5.24 comparison Root Mean Square Error between existing id3 and C4.5 based decision tree and Horizontal partition based decision tree.

Fig. 5.24 Compare Root Mean Square of ID3, C4.5 and HP

As shown in the Table 5.7 is Root Mean Square Error of the proposed rate which is less as compared to the existing id3 and C4.5 decision tree.

Table 5.7 Comparison of Root Mean Square Error of ID3, C4.5 and HP

Number_Of_Instances Id3_Time(Ms) C4.5 Time(Ms) Hp_Time(Ms)
10500 0.5525 0.5493 0.5482
21000 0.5524 0.5497 0.5485
42000 0.5524 0.5502 0.5489
84000 0.5524 0.5502 0.5497
100000 0.5527 0.5515 0.5503

As shown in the below Table 5.8 is Root Mean Square Error of the proposed rate which is less as compared to the existing id3 and C4.5 decision tree.

Table 5.8 Comparison of Root Mean Square Error of ID3, C4.5 and HP

Number_of_data_values Size_of_data_set Root mean squared error _id3 Root mean squared error C4.5 Root mean squared error _HP
10500 347KB 0.5525 0.5493 0.5482
21000 693KB 0.5524 0.5497 0.5485
42000 1.35MB 0.5524 0.5502 0.5489
84000 2.70MB 0.5524 0.5502 0.5497
100000 3.21MB 0.5527 0.5515 0.5503




The genetic algorithm can be used as an application to predict the close values in the stock market data and the performance factor of this algorithm is much better than ANN.

Here proposed algorithm is based on the integration of genetic algorithm and the horizontal partition based decision tree and compare the performance factor of the proposed algorithm as compared to the existing decision tree and the proposed algorithm performs better for the stock market close value.


The stock market prediction is based on horizontal partition decision tree but can be classified from number of instances, hence in the future grid based partition is used for the classification.

Also in the future work we can work on missing attributes means if any attribute value is missing in the dataset then the future prediction is impossible.


Can't work for missing attributes.
If the error rate is more the accuracy is less.
The classification ration is less.

[1] Agrawal R, Imilienski T, Swami A., 1993. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD international conference on management of data, pp. 207-216.

[2] Basaltoa N, Bellottib R, De Carlob F, Facchib P, Pascazio S 2005. Clustering stock market companies via chaotic map synchronization, [online] available at:

[3] Binoy B. Nair , V.P Mohandas and N.R Sakthivel ,2010 . A Genetic Algorithm Optimized Decision Tree ' SVM based Stock Market Trend Prediction System, 2010 International Journal on Computer Science and Engineering, Vol. 02, No. 09,2010,2981-2988

[4] Chenoweth, Tim., et al., 1996. Embedding Technical Analysis Into Neural Network Based Trading Systems , Applied Artificial Intelligence, Vol. 10, pp. 523-541.

[5] Chi-Lin L, Ta-Cheng C, 2009. A study of applying data mining approach to the information disclosure for Taiwan's stock market Investors, Expert Systems with Applications: an international journal, Vol. 36, issue 2, pp. 3536-3542.

[6] David E, Suraphan T, 2005. The use of data mining and neural networks for forecasting stock market returns , Expert Systems with Applications: an international journal, Vol. 39, issue 4, pp. 927-940.

[7] Defu Z, Qingshan J, Xin L., 2004. Application of Neural Networks in Financial Data Mining, Proceedings of world academy of science, Eng. Technol.

[8] Fischer Black and Myron Scholes 1973. The Pricing of Options and Corporate Liabilities , The Journal of Political Economy, Vol. 81, No. 3, pp. 637-654.

[9] H. Jantan , A.R. Hamdan and Z. A. Othman , 2010. Human talent prediction in HRM using C4.5 classification algorithm , International Journal on Computer Science and Engineering, Vol. 2, no. 8, pp.2526-2534.

[10] Hornik K. Stinchcombe M. and White H., 1989. Multilayer feed forward networks are universal approximators , Neural Networks, Vol. 2, issue 5, pp. 359 -366.

[11] Huarng K, Yu HK, 2005. A type 2 fuzzy time series model for stock index forecasting , Physica A: Statistical Mechanics and its Applications, Vol. 353, pp. 445-462.

[12] H. White, 1998. Economic prediction using neural networks: The case of IBM daily stock returns , Proceedings of the IEEE International Conference on Neural Networks, Vol. 2, pp. 451-458.

[13] H. Zhao, and A. P. Sinha, 2005. An efficient algorithm for generating generalized decision forests , IEEE Trans. Systems, Man, Cybertics, A, Systems and Humans, Vol. 35, no. 5, pp. 754-762.

[14] Jar-Long W, Shu-Hui C, 2006. Stock market trading rule discovery using two-layer bias decision tree , Expert Systems with Applications: an international journal, Vol. 30, issue 4, pp. 605 ' 611.
[15] J. Han, and M. Kamber, 2006. Data Mining: Concepts and Techniques ,2nd Ed., San Mateo, CA:Morgan Kaufmann.

[16] Jie S, Hui L., 2008. Data mining method for listed companies financial distress prediction, Knowledge-Based Systems, Vol. 21, issue 1, pp. 1-5.

[17] Jilani TA, Burney SMA, Ardil C., 2007. Multivariate high order fuzzy time series forecasting for car road accidents , International Journal of Computational Intelligence, Vol.4, issue 1.

[18] Jilani TA, Burney SMA, 2007. M-factor high order fuzzy time series forecasting for road accident data, Analysis and Design of Intelligent Systems Using Soft Computing Techniques, Vol. 41, pp 246-254.

[19] Jilani TA, Burney SMA, 2008. Multivariate stochastic fuzzy forecasting models , Expert Systems with Applications: An International Journal, Vol. 35 Issue 3, pp. 691-700.

[20] J. Roman, and A. Jameel, 1996. Back propagation and Recurrent Neural Networks in Financial Analysis of Multiple Stock Market Returns , Proceedings of the 29th Annual Hawaii International Conference on System Sciences, Vol. 2, pp. 454 ' 460.

[21] K-J. Kim, 2003. Financial time series forecasting using support vector machines , Neurocomputing, Vol. 55, pp. 307 ' 319.

[22] Lee LW, Wang LW, Chen SM, Leu YH, 2006. Handling forecasting problems based on two-factor high-order time series , IEEE Transactions on Fuzzy Systems, Vol. 14, issue 3, pp.468-477.

[23] Mahdi Pakdaman Naeini, Hamidreza Taremian and Homa Baradaran Hashemi, 2010. Stock Market Value Prediction Using Neural Networks , IEEE International Conference on Computer Information Systems and Industrial Management Applications (CISIM-2000), pp. 132 -136.

[24] Mahfoud, Sam and Mani, Ganesh 1996. Financial Forecasting Using Genetic Algorithms Applied Artificial Intelligence, Vol. 10, pp. 543- 565.

[25] Martin, Rast, 1998. Improving Fuzzy Neural Networks using Input Parameter Training , IEEE North American Conference of the Fuzzy Information Processing Society - NAFIPS, pp. 55 -58.

[26] Michael F, Robert CJ 2008. A common factor analysis for the US and the German stock markets during overlapping trading hours , Journal of International Financial Markets, Institutions and Money, Vol. 18, Issue 5, pp. 498'512.

[27] Muh-Cherng W, Sheng-Yu L, Chia-Hsin L An effective application of decision tree to stock trading, Expert Systems with Applications , 2006.

[28] Murphy, John J. 1999. Technical Analysis of the Financial Markets , New York: New York Institute of Finance.

[29] Nakhaeizadeh G, Steurer E, Bartmae K. Hand book of data mining and knowledge discovery , Oxford Univ. Press, Oxford, 2002.

[30] S-H. Hsu, JJ P.-A. Hsieh , T-C. Chih, and K.-C Hsu, 2009. A two-stage architecture for stock price forecasting by integrating self-organizing map and support vector regression , Expert Systems with Applications: an international journal, Vol.36, pp.7947'7951,

[31] Shu-Hsien L, Hsu-hui H, Hui-wen L., 2008. Mining stock category association and cluster on Taiwan stock market , Expert Systems with Applications: An International Journal, Vol. 35 Issue 1-2, pp. 19-29.

[32] S??onia RB, Rui M, Diana AM, 2008. Long memory and volatility clustering: Is the empirical evidence consistent across stock markets , [online] available at:

[33] S. Wu, and R. P. Lu, 1993. Combining artificial neural networks and statistics for stock market forecasting , Proceedings of the 1993 ACM conference on Computer science, pp. 257-264.

[34] Sydney CL, Serena N., 2007. The empirical risk'return relation: A factor analysis approach , Journal of Financial Economics, Vol. 83, pp. 171-222.

[35] Tae HR 2007. Forecasting the volatility of stock price index, Expert Systems with Applications: an international journal, Vol. 39, issue 4, pp. 916-922.

[36] Tahseen AJ, Syed MAB 2008. A refined fuzzy time series model for stock market forecasting , Physica A: Statistical Mechanics and its Applications, Vol. 387, Issue 12, pp. 2857'2862.

[37] T.V.Gestel, J.A.K. Suykens, D-E. Baestaens, A. Lambrechts, G. Lanckriet, B. Vandaele, B. De Moor and J. Vandewalle, 2001. Financial Time Series Prediction Using Least Squares Support Vector Machines within the Evidence Framework , IEEE Transactions on neural networks, Vol. 12, no. 4, pp. 809- 821.

[38] Vladimir B, Sergiy B, Panos MP, 2006. Mining market data: A network Approach, Computer Operation Research, Vol. 33, issue 11, pp. 3171 ' 3184.

[39] V.N.Vapnik, M. Jordan, S.L. Lauritzen, J.F. Lawless, 2000. Nature of Statistical Learning Theory , 2nd edition, Berlin: Springer.

[40] Wataru O 2006. An analysis of intraday patterns in price clustering on the Tokyo Stock Exchange, Journal of Banking & Finance, Vol. 30, Issue 3, Pages 1023'1039.

[41] W.Huang, Y. Nakamori and S-Y.Wang, 2005. Forecasting stock market movement direction with support vector machine , Computers & Operations Research, Vol.32, no. 10, pp. 2513-2522.

[42] W. Pedrycz, and Z. A. Sosnowski, 2005. Genetically optimized fuzzy decision trees, IEEE Trans. Systems, Man, Cybertics, B, Cybernetics , Vol. 35, no. 3, pp. 633-641.

[43] Yu HK, 2005. A refined fuzzy time-series model for forecasting , Physica A: Statistical Mechanics and its Applications, Vol. 346, Issues 3'4, pp. 657'681.

[44] Zantema, H., and Bodlaender H. L., 2000. Finding Small Equivalent Decision Trees is Hard , International Journal of Foundations of Computer Science, Vol. 11, issue 2, pp. 343-354.

[45] Zhang Lianmei and Jiang Xingjun, 2010. Research on New Data Mining Method Based on Hybrid Genetic Algorithm, 2010 International Symposium on Computer, Communication, Control and Automation, Vol. 1, pp. 462 -465.

Source: Essay UK -

Not what you're looking for?


About this resource

This Finance essay was submitted to us by a student in order to help you with your studies.


No ratings yet!

  • Order a custom essay
  • Download this page
  • Print this page
  • Search again

Word count:

This page has approximately words.



If you use part of this page in your own work, you need to provide a citation, as follows:

Essay UK, Stock Market Movement. Available from: <> [20-10-17].

More information:

If you are the original author of this content and no longer wish to have it published on our website then please click on the link below to request removal:

Essay and dissertation help