A time series is a sequence of values on the same scale, indexed by a time-like parameter. The basic data and parameters are functions. Time series take on a dazzling variety of shapes and forms; indeed, there are as many time series as there are functions of real numbers.
Time series forecasting is an important area of knowledge with many real-world applications. Accurate forecasting is an essential element of many management decisions. There are several methods and techniques for finding a model that produces accurate forecasts; the traditional techniques have their foundations in statistics.
Examples of real-world time series data, classified into four main categories (types):
| Series | Type | Domain | Description |
|---|---|---|---|
| passengers | Seasonal & Trended | Tourism | Monthly international airline passengers |
| paper | Seasonal & Trended | Sales | Monthly sales of French paper |
| deaths | Seasonal | Traffic | Monthly deaths & injuries on UK roads |
| maxtemp | Seasonal | Meteorology | Maximum temperature in Melbourne |
| chemical | Trended | Chemical | Chemical concentration readings |
| prices | Trended | Economy | Daily IBM common stock closing prices |
| lynx | Nonlinear | Ecology | Annual number of lynx |
| kobe | Nonlinear | Geology | Seismograph of the Kobe earthquake |
Univariate time series
This term refers to a time series that consists of single (scalar) observations recorded sequentially through time. Examples are the monthly unemployment rate, monthly CO2 concentrations, and the Southern Oscillation, which is used to predict El Niño effects.
Multivariate time series
Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:
 Consumption and income
 Stock prices and dividends
 Forward and spot exchange rates
 Interest rates, money growth, income, inflation
Stock and Watson state that macroeconometricians do four things with multivariate time series:
 Describe and summarize macroeconomic data
 Make macroeconomic forecasts
 Quantify what we do or do not know about the true structure of the macroeconomy
 Advise macroeconomic policymakers
The choice of these series is typically guided by both empirical experience and economic theory. For example, the theory of the term structure of interest rates suggests that the spread between long-term and short-term interest rates might be a useful predictor of future inflation.
Much research has gone into the development of ways of analysing multivariate time series (MTS) data in both the statistical and artificial intelligence communities. Statistical MTS modelling methods include the Vector AutoRegressive process, the Vector AutoRegressive Moving Average process, and other nonlinear and Bayesian approaches, while various AI methods have been developed for different purposes. These include dependence detection in MTS of categorical data, knowledge-based temporal abstraction, Bayesian clustering of similar MTSs, and forecasting.
Is multivariate better than univariate?
Multivariate methods are very important in economics and much less so in other applications of forecasting. In standard textbooks on time-series analysis, multivariate extensions are given only a marginal position. Empirical examples outside economics are rare. Exceptions are data sets with a predator-prey background, such as the notorious data on the populations of the Canadian lynx and the snowshoe hare. In contrast, the multivariate view is central in economics, where single variables are traditionally viewed in the context of their relationships to other variables. Unlike researchers in other disciplines, economists may even reject the idea of univariate time-series modeling on grounds of theoretical interdependence, which appears to be an exaggerated position. In forecasting, and even in economics, multivariate models are not necessarily better than univariate ones. While multivariate models are convenient for modeling interesting interdependencies and achieve a better (not worse) fit within a given sample, it is often found that univariate methods outperform multivariate methods out of sample. Among others, one may name as possible reasons:
 Multivariate models have more parameters than univariate ones. Every additional parameter is an unknown quantity and has to be estimated. This estimation brings in an additional source of error due to sampling variation.
 The number of potential candidates for multivariate models exceeds its univariate counterpart. Model selection is therefore more complex and lengthier and more susceptible to errors, which then affect prediction.
 It is difficult to generalize nonlinear procedures to the multivariate case. Generally, multivariate models must have a simpler structure than univariate ones, to overcome the additional complexity that is imposed by being multivariate. For example, while a researcher may use a nonlinear model for univariate data, she may refrain from using the multivariate counterpart or such a generalization may not have been developed. Then, multivariate models will miss the nonlinearities that are handled properly by the univariate models.
 Outliers can have a more serious effect on multivariate than on univariate forecasts. Moreover, it is easier to spot and control outliers in the univariate context.
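The first reason in the list above can be made concrete by counting lag coefficients. The following is a minimal sketch, assuming a plain AR(p) model for the univariate case and a vector autoregression VAR(p) with k variables for the multivariate case; the function names are hypothetical, and intercepts and error covariances are deliberately left out of the count.

```python
def ar_coefficient_count(p):
    """Lag coefficients in a univariate AR(p) model: one per lag."""
    return p


def var_coefficient_count(k, p):
    """Lag coefficients in a k-variable VAR(p) model.

    Each of the k equations contains k coefficients for every one of the
    p lags, so the total grows with the square of the number of series.
    """
    return k * k * p


# Same lag order, increasing number of jointly modelled series.
for k in (1, 2, 4, 8):
    print(k, "series:", var_coefficient_count(k, p=4), "lag coefficients")
```

With a fixed sample length, every additional coefficient has to be estimated from the same data, which is the source of the extra sampling error mentioned above.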
Models
1. Exponential Smoothing
Exponential smoothing is simply an adjustment technique that takes the previous period’s forecast and adjusts it up or down based on what actually occurred in that period. Whereas in single moving averages the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights as the observations get older.
In other words, recent observations are given relatively more weight in forecasting than the older observations. The smoothing factor is used to automatically calculate how much weight to apply to each previous period.
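The adjustment rule and the decaying weights described above can be sketched in a few lines. This is an illustrative implementation, not a reference one: the function names are hypothetical, and starting the recursion from the first observation is an assumption (other initialisations are common).

```python
def exponential_weights(alpha, n_terms):
    """Weight alpha * (1 - alpha)**j given to the observation j periods back."""
    return [alpha * (1.0 - alpha) ** j for j in range(n_terms)]


def simple_exponential_smoothing(values, alpha, initial_forecast=None):
    """Return (one-step-ahead forecasts, forecast for the next unseen period)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Assumption: with no initial forecast supplied, start from the first value.
    forecast = values[0] if initial_forecast is None else initial_forecast
    forecasts = []
    for observed in values:
        forecasts.append(forecast)  # forecast made before seeing `observed`
        # Adjust the previous forecast up or down by a fraction of its error.
        forecast = forecast + alpha * (observed - forecast)
    return forecasts, forecast


history = [10.0, 12.0, 11.0, 13.0]
fitted, next_forecast = simple_exponential_smoothing(history, alpha=0.5)
print(fitted, next_forecast)
print(exponential_weights(0.5, 4))
```

A smoothing factor close to 1 makes the forecast follow the latest observation closely, while a value close to 0 spreads the weight over a long history.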
There is also another exponential smoothing model called adaptive response rate single exponential smoothing which tracks the forecast performance and automatically adjusts to allow for shifting patterns.
Triple exponential smoothing is based on three equations (overall smoothing, trend smoothing and seasonal smoothing) and is called the “Holt-Winters” (HW) method after its inventors.
Table 1: The fifteen exponential smoothing methods

| Trend component | Seasonal: N (None) | Seasonal: A (Additive) | Seasonal: M (Multiplicative) |
|---|---|---|---|
| N (None) | N,N | N,A | N,M |
| A (Additive) | A,N | A,A | A,M |
| Ad (Additive damped) | Ad,N | Ad,A | Ad,M |
| M (Multiplicative) | M,N | M,A | M,M |
| Md (Multiplicative damped) | Md,N | Md,A | Md,M |
Some of these methods are better known under other names. For example, cell (N,N) describes the simple exponential smoothing (or SES) method, cell (A,N) describes Holt’s linear method, and cell (Ad,N) describes the damped trend method. The additive Holt-Winters’ method is given by cell (A,A) and the multiplicative Holt-Winters’ method is given by cell (A,M). The other cells correspond to less commonly used but analogous methods.
An exponential smoothing method is an algorithm for producing point forecasts only. The underlying stochastic state space model gives the same point forecasts, but also provides a framework for computing prediction intervals and other properties.
For each exponential smoothing method in Table 1, Hyndman et al. (2002) describe two possible innovations state space models, one corresponding to a model with additive errors and the other to a model with multiplicative errors. If the same parameter values are used, these two models give equivalent point forecasts, although different prediction intervals. Thus there are 30 potential models described in this classification.
A model is usually identified by a three-character string using the framework terminology of Hyndman et al. (2002) and Hyndman et al. (2008). The first letter denotes the error type (“A”, “M” or “Z”); the second letter denotes the trend type (“N”, “A”, “M” or “Z”); and the third letter denotes the season type (“N”, “A”, “M” or “Z”). In all cases, “N” = none, “A” = additive, “M” = multiplicative and “Z” = automatically selected. So, for example, “ANN” is simple exponential smoothing with additive errors, “MAM” is the multiplicative Holt-Winters’ method with multiplicative errors, and so on.
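The classification can be checked by enumeration. The sketch below is illustrative only: the constants and function names are hypothetical, and it treats the damped trends (Ad, Md) as separate trend types, as in the table of fifteen methods, even though the three-character codes described above have no letter for damping.

```python
from itertools import product

NAMES = {"N": "none", "A": "additive", "M": "multiplicative",
         "Z": "automatically selected"}
ERROR_TYPES = ("A", "M")
TREND_TYPES = ("N", "A", "Ad", "M", "Md")
SEASON_TYPES = ("N", "A", "M")


def enumerate_methods():
    """All (trend, season) combinations: the fifteen smoothing methods."""
    return list(product(TREND_TYPES, SEASON_TYPES))


def enumerate_models():
    """All (error, trend, season) combinations: the thirty state space models."""
    return list(product(ERROR_TYPES, TREND_TYPES, SEASON_TYPES))


def parse_model_code(code):
    """Translate a three-character code such as 'MAM' into component names."""
    if len(code) != 3:
        raise ValueError("expected exactly three characters")
    error, trend, season = code
    if error not in ("A", "M", "Z"):
        raise ValueError("error type must be A, M or Z")
    if trend not in NAMES or season not in NAMES:
        raise ValueError("trend and season must each be N, A, M or Z")
    return {"error": NAMES[error], "trend": NAMES[trend], "season": NAMES[season]}


print(len(enumerate_methods()), len(enumerate_models()))
print(parse_model_code("ANN"))
print(parse_model_code("MAM"))
```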
2. ARIMA
Early attempts to study time series, particularly in the 19th century, were generally characterized by the idea of a deterministic world. It was the major contribution of Yule (1927) which launched the notion of stochasticity in time series by postulating that every time series can be regarded as the realization of a stochastic process. Based on this simple idea, a number of time series methods have been developed since then. Workers such as Slutsky, Walker, Yaglom, and Yule first formulated the concept of autoregressive (AR) and moving average (MA) models. Wold’s decomposition theorem led to the formulation and solution of the linear forecasting problem of Kolmogorov (1941). Since then, a considerable body of literature has appeared in the area of time series, dealing with parameter estimation, identification, model checking, and forecasting; see, e.g., Newbold (1983) for an early survey.
The publication Time Series Analysis: Forecasting and Control by Box and Jenkins (1970) integrated the existing knowledge. Moreover, these authors developed a coherent, versatile three-stage iterative cycle for time series identification, estimation, and verification (rightly known as the Box–Jenkins approach). The book has had an enormous impact on the theory and practice of modern time series analysis and forecasting. With the advent of the computer, it popularized the use of autoregressive integrated moving average (ARIMA) models and their extensions in many areas of science.
The acronym ARIMA stands for “AutoRegressive Integrated Moving Average.” Lags of the differenced series appearing in the forecasting equation are called “autoregressive” terms, lags of the forecast errors are called “moving average” terms, and a time series which needs to be differenced to be made stationary is said to be an “integrated” version of a stationary series. The linear exponential smoothing models (i.e., exponentially weighted moving averages) are special cases of ARIMA models.
The input series for ARIMA needs to be stationary, that is, it should have a constant mean, variance, and autocorrelation through time. Therefore, the series usually first needs to be differenced until it is stationary (this also often requires log transforming the data to stabilize the variance). To determine the necessary level of differencing, you should examine the plot of the data and the autocorrelogram. Significant changes in level (strong upward or downward changes) usually require first-order nonseasonal (lag=1) differencing; strong changes of slope usually require second-order nonseasonal differencing. Seasonal patterns require the corresponding seasonal differencing. If the estimated autocorrelation coefficients decline slowly at longer lags, first-order differencing is usually needed. However, you should keep in mind that some time series may require little or no differencing, and that overdifferenced series produce less stable coefficient estimates.
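The differencing operations mentioned above are simple to write down. The following sketch uses a hypothetical helper `difference` and made-up toy series; it shows first-order, second-order and seasonal differencing, plus differencing after a log transform.

```python
import math


def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be at least 1 and shorter than the series")
    return [values[i] - values[i - lag] for i in range(lag, len(values))]


# A steady change in level is removed by first-order (lag=1) differencing.
trending = [3, 5, 7, 9, 11]
print(difference(trending))

# A changing slope needs second-order differencing (difference twice).
curved = [1, 4, 9, 16, 25]
print(difference(difference(curved)))

# A repeating pattern of period 3 is removed by seasonal (lag=3) differencing.
seasonal = [10, 20, 30, 12, 22, 32, 14, 24, 34]
print(difference(seasonal, lag=3))

# Taking logs first turns constant percentage growth into a constant difference.
growing = [100.0, 110.0, 121.0]
print(difference([math.log(v) for v in growing]))
```

Each round of differencing shortens the series by `lag` observations, which is one practical reason not to difference more than necessary.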
Indeed, forecasting discrete time series processes through univariate ARIMA models, transfer function (dynamic regression) models, and multivariate (vector) ARIMA models has generated quite a few studies. Often these studies were of an empirical nature, using one or more benchmark methods/models as a comparison. Without pretending to be complete, Table 2 gives a list of these studies.
| Model class | Dataset | Forecast horizon | Benchmark | Reference |
|---|---|---|---|---|
| Univariate ARIMA | Electricity load (min) | 1–30 min | Wiener filter | Di Caprio, Genesio, Pozzi, and Vicino (1983) |
| Univariate ARIMA | Quarterly automobile insurance paid claim costs | 8 quarters | Log-linear regression | Cummins and Griepentrog (1985) |
| Univariate ARIMA | Daily federal funds rate | 1 day | Random walk | Hein and Spudeck (1988) |
| Univariate ARIMA | Quarterly macroeconomic data | 1–8 quarters | Wharton model | Dhrymes and Peristiani (1988) |
| Univariate ARIMA | Monthly department store sales | 1 month | Simple exponential smoothing | Geurts and Kelly (1986, 1990), Pack (1990) |
| Univariate ARIMA | Monthly demand for telephone services | 3 years | Univariate state space | Grambsch and Stahel (1990) |
| Univariate ARIMA | Yearly population totals | 20–30 years | Demographic models | Pflaumer (1992) |
| Univariate ARIMA | Monthly tourism demand | 1–24 months | Univariate state space, multivariate state space | du Preez and Witt (2003) |
| Dynamic regression/transfer function | Monthly telecommunications traffic | 1 month | Univariate ARIMA | Layton, Defris, and Zehnwirth (1986) |
| Dynamic regression/transfer function | Weekly sales data | 2 years | n.a. | Leone (1987) |
| Dynamic regression/transfer function | Daily call volumes | 1 week | Holt–Winters | Bianchi, Jarrett, and Hanumara (1998) |
| Dynamic regression/transfer function | Monthly employment levels | 1–12 months | Univariate ARIMA | Weller (1989) |
| Dynamic regression/transfer function | Monthly and quarterly consumption of natural gas | 1 month/1 quarter | Univariate ARIMA | Liu and Lin (1991) |
| Dynamic regression/transfer function | Monthly electricity consumption | 1–3 years | Univariate ARIMA | Harris and Liu (1993) |
| VARIMA | Yearly municipal budget data | Yearly | Univariate ARIMA | Downs and Rocke (1983) |
| VARIMA | Monthly accounting data | 1 month | Regression, univariate ARIMA, transfer function | Hillmer, Larcker, and Schroeder (1983) |
| VARIMA | Quarterly macroeconomic data | 1–10 quarters | Judgmental methods, univariate ARIMA | Öller (1985) |
| VARIMA | Monthly truck sales | 1–13 months | Univariate ARIMA, | Heuts and Bronckers (1988) |
| VARIMA | Monthly hospital patient movements | 2 years | Univariate ARIMA, | Lin (1989) |
| VARIMA | Quarterly unemployment rate | 1–8 quarters | Transfer function | Edlund and Karlsson (1993) |
Comparisons with exponential smoothing
The exponential smoothing state space models are all nonstationary. Models with seasonality or nondamped trend (or both) have two unit roots; all other models—that is, nonseasonal models with either no trend or damped trend—have one unit root. It is possible to define a stationary model with similar characteristics to exponential smoothing, but this is not normally done. The philosophy of exponential smoothing is that the world is nonstationary. So if a stationary model is required, ARIMA models are better. One advantage of the exponential smoothing models is that they can be nonlinear. So time series that exhibit nonlinear characteristics including heteroscedasticity may be better modelled using exponential smoothing state space models.
For seasonal data, there are many more ARIMA models than the 30 possible models in the exponential smoothing class. The larger model space of ARIMA models actually harms forecasting performance because it introduces additional uncertainty. The smaller exponential smoothing class is sufficiently rich to capture the dynamics of almost all real business and economic time series.
There is a widespread myth that ARIMA models are more general than exponential smoothing. This is not true. The linear exponential smoothing models are all special cases of ARIMA models. However, the nonlinear exponential smoothing models have no equivalent ARIMA counterpart. On the other hand, there are many ARIMA models which have no exponential smoothing counterpart. Thus, the two model classes overlap and are complementary; each has its strengths and weaknesses.
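The overlap can be illustrated with the simplest case: simple exponential smoothing with smoothing factor alpha produces the same one-step forecasts as an ARIMA(0,1,1) model whose moving average coefficient is theta = alpha - 1. The sketch below checks this numerically. The function names are hypothetical, the sign convention y_t - y_{t-1} = e_t + theta * e_{t-1} is an assumption (some texts write the MA term with a minus sign), and both recursions are started from the same initial forecast.

```python
def ses_one_step(values, alpha, start):
    """One-step forecasts from simple exponential smoothing."""
    forecast, out = start, []
    for y in values:
        out.append(forecast)
        forecast = alpha * y + (1.0 - alpha) * forecast
    return out


def arima011_one_step(values, theta, start):
    """One-step forecasts from ARIMA(0,1,1) with MA coefficient theta.

    Model (assumed sign convention): y_t = y_{t-1} + e_t + theta * e_{t-1},
    so the forecast of y_{t+1} is y_t + theta * e_t.
    """
    forecast, out = start, []
    for y in values:
        out.append(forecast)
        error = y - forecast          # one-step forecast error e_t
        forecast = y + theta * error
    return out


series = [3.0, 5.0, 4.0, 6.0, 7.0, 6.5]
alpha = 0.3
a = ses_one_step(series, alpha, start=series[0])
b = arima011_one_step(series, theta=alpha - 1.0, start=series[0])
print(max(abs(x - y) for x, y in zip(a, b)))
```

Substituting e_t = y_t - forecast_t into the ARIMA recursion gives y_t + (alpha - 1)(y_t - forecast_t) = alpha * y_t + (1 - alpha) * forecast_t, which is the smoothing update.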
The algorithms are applicable to both seasonal and nonseasonal data.
State space and structural models and the Kalman filter
At the start of the 1980s, state space models were only beginning to be used by statisticians for forecasting time series, although the ideas had been present in the engineering literature since Kalman’s (1960) groundbreaking work. State space models provide a unifying framework in which any linear time series model can be written. The key forecasting contribution of Kalman (1960) was to give a recursive algorithm (known as the Kalman filter) for computing forecasts. Statisticians became interested in state space models when Schweppe (1965) showed that the Kalman filter provides an efficient algorithm for computing the one-step-ahead prediction errors and associated variances needed to produce the likelihood function. Shumway and Stoffer (1982) combined the EM algorithm with the Kalman filter to give a general approach to forecasting time series using state space models, including allowing for missing observations. A particular class of state space models, known as “dynamic linear models” (DLM), was introduced by Harrison and Stevens (1976), who also proposed a Bayesian approach to estimation. Fildes (1983) compared the forecasts obtained using Harrison and Stevens’ method with those from simpler methods such as exponential smoothing, and concluded that the additional complexity did not lead to improved forecasting performance. The modelling and estimation approach of Harrison and Stevens was further developed by West, Harrison, and Migon (1985) and West and Harrison (1989). Harvey (1984, 1989) extended the class of models and followed a non-Bayesian approach to estimation. He also renamed the models “structural models”, although in later papers he uses the term “unobserved component models”. Harvey (2006) provides a comprehensive review and introduction to this class of models, including continuous-time and non-Gaussian variations.
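The recursion is easiest to see in the simplest state space model. The sketch below assumes a local level model, in which the observation is a level plus noise and the level follows a random walk; the function and parameter names are hypothetical. It returns the one-step-ahead prediction errors, their variances and the Gaussian log-likelihood built from them, and it skips the update step when an observation is missing (given as `None`).

```python
import math


def local_level_filter(values, obs_var, level_var, initial_mean, initial_var):
    """Kalman filter for y_t = level_t + noise, level_{t+1} = level_t + shock.

    Returns (prediction errors, their variances, log-likelihood, final level).
    """
    mean, var = initial_mean, initial_var
    errors, variances = [], []
    log_likelihood = 0.0
    for y in values:
        if y is None:
            # Missing observation: nothing to learn, so no update step.
            errors.append(None)
            variances.append(None)
        else:
            error = y - mean                  # one-step-ahead prediction error
            error_var = var + obs_var         # variance of that error
            gain = var / error_var            # Kalman gain
            mean = mean + gain * error        # updated level estimate
            var = var * (1.0 - gain)          # updated level variance
            log_likelihood -= 0.5 * (math.log(2.0 * math.pi * error_var)
                                     + error ** 2 / error_var)
            errors.append(error)
            variances.append(error_var)
        var = var + level_var                 # predict: random-walk level
    return errors, variances, log_likelihood, mean


observations = [1.2, 0.9, None, 1.4, 1.1]
result = local_level_filter(observations, obs_var=1.0, level_var=0.1,
                            initial_mean=1.0, initial_var=1.0)
print(result)
```

Maximising the returned log-likelihood over `obs_var` and `level_var` is one way the variance parameters could be estimated, which is the use of the filter that attracted statisticians.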
These models bear many similarities with exponential smoothing methods, but have multiple sources of random error. In particular, the “basic structural model” (BSM) is similar to Holt–Winters’ method for seasonal data and includes level, trend and seasonal components. Ray (1989) discussed convergence rates for the linear growth structural model and showed that the initial states (usually chosen subjectively) have a non-negligible impact on forecasts. Harvey and Snyder (1990) proposed some continuous-time structural models for use in forecasting lead time demand for inventory control. Proietti (2000) discussed several variations on the BSM, compared their properties and evaluated the resulting forecasts. Non-Gaussian structural models have been the subject of a large number of papers, beginning with the power steady model of Smith (1979) with further development by West et al. (1985).
Another class of state space models, known as “balanced state space models”, has been used primarily for forecasting macroeconomic time series. Mittnik (1990) provided a survey of this class of models, and Vinod and Basu (1995) obtained forecasts of consumption, income, and interest rates using balanced state space models. These models have only one source of random error and subsume various other time series models including ARMAX models, ARMA models, and rational distributed lag models. A related class of state space models are the “single source of error” models that underlie exponential smoothing methods.
Nonlinear models
Time series prediction is a very challenging signal processing problem because, in real situations, the series is typically a function of a large number of factors, most of which are unknown or inaccessible at the time of prediction. Although such time series appear as very noisy, nonstationary and nonlinear signals, their history carries significant evidence that can be used to build a predictive model.
Although linearity is a useful assumption and a powerful tool in many areas, it became increasingly clear in the late 1970s and early 1980s that linear models are insufficient in many real applications. For example, sustained animal population size cycles (the well-known Canadian lynx data), sustained solar cycles (annual sunspot numbers), energy flow, and amplitude–frequency relations were found not to be suitable for linear models.
Functional-coefficient model
A functional coefficient AR (FCAR or FAR) model is an AR model in which the AR coefficients are allowed to vary as a measurable smooth function of another variable, such as a lagged value of the time series itself or an exogenous variable. The FCAR model includes TAR and STAR models as special cases, and is analogous to the generalized additive model of Hastie and Tibshirani (1991). Chen and Tsay (1993) proposed a modeling procedure using ideas from both parametric and nonparametric statistics. The approach assumes little prior information on model structure without suffering from the “curse of dimensionality”; see also Cai, Fan, and Yao (2000).
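To make the idea of state-dependent coefficients concrete, the sketch below computes a one-step prediction in which each AR coefficient is a function of a lagged value of the series. It is illustrative only: the function names and all numerical values are made up, and the coefficient functions are specified by hand rather than estimated from data as in the procedures cited above. A step function gives a threshold (TAR-type) model and a logistic transition gives a smooth transition (STAR-type) model.

```python
import math


def fcar_one_step(history, coefficient_functions, delay=1):
    """Predict the next value as sum_j f_j(u) * y_{t-j}, with u = y_{t-delay}."""
    u = history[-delay]
    return sum(f(u) * history[-j]
               for j, f in enumerate(coefficient_functions, start=1))


def threshold_coefficient(low, high, threshold=0.0):
    """TAR-type coefficient: jumps from `low` to `high` at the threshold."""
    return lambda u: low if u <= threshold else high


def smooth_transition_coefficient(low, high, gamma=5.0, centre=0.0):
    """STAR-type coefficient: moves smoothly from `low` to `high`."""
    def coefficient(u):
        weight = 1.0 / (1.0 + math.exp(-gamma * (u - centre)))
        return low + (high - low) * weight
    return coefficient


tar = [threshold_coefficient(0.6, -0.4)]
star = [smooth_transition_coefficient(0.6, -0.4)]
for last_value in (-1.0, 0.0, 2.0):
    print(last_value,
          fcar_one_step([last_value], tar),
          fcar_one_step([last_value], star))
```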
Harvill and Ray (2005) presented multi-step-ahead forecasting results using univariate and multivariate functional coefficient (V)FCAR models. These authors restricted their comparison to three forecasting methods: the naive plug-in predictor, the bootstrap predictor, and the multi-stage predictor. Both simulation and empirical results indicate that the bootstrap method appears to give slightly more accurate forecast results. A potentially useful area of future research is whether the forecasting power of VFCAR models can be enhanced by using exogenous variables.
Neural nets
An artificial neural network (ANN) can be useful for nonlinear processes that have an unknown functional relationship and as a result are difficult to fit. The main idea with ANNs is that inputs, or independent variables, get filtered through one or more hidden layers, each of which consists of hidden units, or nodes, before they reach the output variable. The intermediate output is related to the final output. One major application area of ANNs is forecasting; see Zhang, Patuwo, and Hu (1998) and Hippert, Pedreira, and Souza (2001) for good surveys of the literature. Numerous studies have documented the successes of ANNs in forecasting financial data. However, in two editorials in the International Journal of Forecasting, Chatfield (1993, 1995) questioned whether ANNs had been oversold as a miracle forecasting technique. This was followed by several papers documenting that naïve models such as the random walk can outperform ANNs.
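The filtering of inputs through a hidden layer can be sketched as a single forward pass. This is a minimal illustration, assuming one hidden layer with tanh units and a linear output; the function name and all weights are made up rather than trained, and the inputs stand for lagged values of a series.

```python
import math


def feedforward_predict(inputs, hidden_weights, hidden_biases,
                        output_weights, output_bias):
    """Forward pass through one tanh hidden layer to a single linear output."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Two lagged values as inputs, three hidden units, one output (made-up weights).
lagged_values = [0.8, 0.5]
hidden_weights = [[0.4, -0.2], [0.1, 0.3], [-0.5, 0.6]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [0.7, -0.3, 0.2]
print(feedforward_predict(lagged_values, hidden_weights, hidden_biases,
                          output_weights, output_bias=0.05))
```

In practice the weights would be chosen by minimising a forecast error measure on training data, which is where the risk of overfitting discussed in this section arises.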
A general problem with nonlinear models is the “curse of model complexity and model overparametrization”. If parsimony is considered to be really important, then it is interesting to compare the out-of-sample forecasting performance of linear versus nonlinear models, using a wide variety of different model selection criteria. This issue was considered in quite some depth by Swanson and White (1997). Their results suggested that a single hidden layer “feedforward” ANN model, which has been by far the most popular in time series econometrics, offers a useful and flexible alternative to fixed specification linear models, particularly at forecast horizons greater than one-step-ahead. However, in contrast to Swanson and White, Heravi, Osborn, and Birchenhall (2004) found that linear models produce more accurate forecasts of monthly seasonally unadjusted European industrial production series than ANN models. Times change, and it is fair to say that the risk of overparametrization and overfitting is now recognized by many authors; see, e.g., Hippert, Bunn, and Souza (2005) who use a large ANN (50 inputs, 15 hidden neurons, 24 outputs) to forecast daily electricity load profiles. Nevertheless, the question of whether or not an ANN is overparametrized still remains unanswered.
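The scale of the overparametrization concern can be gauged by counting weights. The sketch below assumes a fully connected network with a single hidden layer and, optionally, one bias term per hidden and output unit; whether the cited network used bias terms is not stated in the text, so both counts are shown. The function name is hypothetical.

```python
def feedforward_parameter_count(n_inputs, n_hidden, n_outputs, with_bias=True):
    """Number of free weights in a fully connected single-hidden-layer network."""
    extra = 1 if with_bias else 0
    input_to_hidden = (n_inputs + extra) * n_hidden
    hidden_to_output = (n_hidden + extra) * n_outputs
    return input_to_hidden + hidden_to_output


# Network size quoted above: 50 inputs, 15 hidden neurons, 24 outputs.
print(feedforward_parameter_count(50, 15, 24))
print(feedforward_parameter_count(50, 15, 24, with_bias=False))
```

Under these assumptions the network has over a thousand free parameters, far more than a typical linear time series model.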
Evolutionary Algorithm
An evolutionary algorithm (EA) uses a collection of heuristic rules to modify a population of trial solutions in such a way that each generation of trial values tends to be, on average, better than its predecessor. The measure for whether one trial solution is better than another is the trial solution’s fitness value. In statistical applications, the fitness is a function of the summary statistic being optimized (e.g., the log-likelihood).
The genetic algorithm (GA) is the most popular type of EA and is inspired by the basic principles of biological evolution and natural selection. It is a stochastic search algorithm that simulates the evolution of living organisms, in which the fittest individuals dominate the weaker ones, by mimicking biological mechanisms of evolution such as selection, crossover and mutation.
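The three mechanisms named above can be sketched for bit-string chromosomes. This is a minimal illustration, not any of the published algorithms: the function name and default settings are hypothetical, selection is by two-way tournament, crossover is single-point, and the best individual is copied unchanged into the next generation (elitism), which is a design choice made here so that the best fitness never decreases.

```python
import random


def genetic_algorithm(fitness, n_bits, population_size=30, generations=40,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximise `fitness` over bit strings; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(population_size)]
    history = []

    def select():
        # Tournament selection: the fitter of two random individuals wins.
        first, second = rng.sample(population, 2)
        return first if fitness(first) >= fitness(second) else second

    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        next_population = [elite[:]]  # elitism: keep the current best unchanged
        while len(next_population) < population_size:
            parent_a, parent_b = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)       # single-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Mutation: flip each bit independently with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population

    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy fitness: the number of ones in the chromosome.
best, history = genetic_algorithm(sum, n_bits=20)
print(best, history[0], history[-1])
```

In a forecasting setting, each bit could indicate whether a candidate explanatory variable is included in the model, with the fitness given by a measure of forecast accuracy.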
Stephen D. Sloan, Raymond W. Saw and James J. Sluss described a genetic algorithm for forecasting the long-term quarterly sales of a product in the telecommunications technology sector. It uses widely available economic indicators, such as disposable personal income and new housing starts, as independent variables. The authors used individual chromosomes to indicate the inclusion or exclusion of specific economic variables, as well as operational rules for combining the variables. Several features beyond those of the canonical GA were also incorporated, including evolution of individuals in distinct ecosystems with a specific level of intermarriage between ecosystems, the capability for a single gene in an individual’s chromosome to indicate a subroutine call to the complete chromosome of an individual from a previous generation, and hill-climbing applied to improve the fittest offspring produced by each generation.
In “A Genetic Algorithm for Conformational Analysis of DNA”, C. B. Lucasius, M. J. J. Blommers, L. M. C. Buydens and G. Kateman developed a genetic algorithm for determining the structure of a sample of DNA based on spectrometric data about the sample. They used an interesting “cascaded” evaluation technique that greatly enhances the efficiency of their evaluation function. The authors used bit strings to encode molecular structures. Their evaluation function measures the degree to which a decoded structure conforms to the data collected about the sample. The genetic algorithm evolves a description of molecular structure in agreement with the data collected.
Long-term energy consumption forecasting using a genetic algorithm was introduced by Korhan Karabulut, Ahmet Alkan and Ahmet S. Yilmaz. The most important part of electric utility resource planning is forecasting the future load demand in the service area. This is achieved by constructing models on the relevant information, such as climate and previous load demand data. They used a genetic algorithm to forecast long-term electrical power consumption in the area covered by a utility situated in southeast Turkey.
M. K. Deshmukh and C. Balakrishna Moorthy applied a genetic algorithm to a neural network model, namely a feed-forward neural network, for estimating the wind energy potential at a site. Their model predicts the power output of a wind energy conversion system (WECS). In this model, real-time values of wind speed and variable are taken as input, and the electric power generated by the WECS is computed as the output. The genetic algorithm is used to improve the output of the neural network. The results obtained with this model are compared with those obtained using the back-propagation algorithm. It is reported that the modified model leads to improved accuracy in the prediction of wind energy.
Kristin Bennett, Michael C. Ferris and Yannis E. Ioannidis described a database query optimization problem and the adaptation of a genetic algorithm to this problem. They presented a method for encoding arbitrary binary trees as chromosomes and described several crossover operations for such chromosomes. Preliminary computational comparisons with the best currently known method for query optimization indicate that their method is a promising approach. In particular, the output quality and the time needed to produce such solutions are comparable to, and in general better than, those of the current method.