Time Series Forecasting Models

A time series is a sequence of values on a common scale indexed by a time-like parameter; the basic data are thus functions of time. Time series take on a dazzling variety of shapes and forms; indeed, there are as many time series as there are functions of real numbers.

Time series forecasting is an important area of knowledge with many applications in the real world. Accurate forecasting is an essential element of many management decisions. There are several methods and techniques for finding a good model that produces accurate forecasts; the traditional techniques have their foundations in statistics.

Real-world time series data can be classified into four main categories (types), illustrated by the following example pairs:

    • Seasonal & trended: monthly international airline passengers; monthly sales of French paper
    • Seasonal: monthly deaths & injuries on UK roads; maximum temperature in Melbourne
    • Trended: chemical concentration readings; daily IBM common stock closing prices
    • Neither seasonal nor trended (cyclic or irregular): annual number of lynx; seismograph of the Kobe earthquake

Univariate time series
This term refers to a time series that consists of single (scalar) observations recorded sequentially through time, e.g. the monthly unemployment rate, monthly CO2 concentrations, or the Southern Oscillation index used to predict El Niño effects.

Multivariate time series
Multivariate time series analysis is used when one wants to model and explain the interactions and comovements among a group of time series variables:

    • Consumption and income
    • Stock prices and dividends
    • Forward and spot exchange rates
    • Interest rates, money growth, income, inflation

Stock and Watson state that macroeconometricians do four things with multivariate time series:

  1. Describe and summarize macroeconomic data
  2. Make macroeconomic forecasts
  3. Quantify what we do or do not know about the true structure of the macroeconomy
  4. Advise macroeconomic policymakers

The choice of these series is typically guided by both empirical experience and by economic theory, for example, the theory of the term structure of interest rates suggests that the spread between long and short term interest rates might be a useful predictor of future inflation.

Much research has gone into the development of ways of analysing multivariate time series (MTS) data in both the statistical and artificial intelligence communities. Statistical MTS modelling methods include the Vector Auto-Regressive process, the Vector Auto-Regressive Moving Average process, and other non-linear and Bayesian approaches, while various AI methods have been developed for different purposes. These include dependence detection in MTS of categorical data, knowledge-based temporal abstraction, Bayesian clustering of similar MTSs, and forecasting.

Is multivariate better than univariate?

Multivariate methods are very important in economics and much less so in other applications of forecasting. In standard textbooks on time-series analysis, multivariate extensions are given a marginal position only. Empirical examples outside economics are rare. Exceptions are data sets with a predator-prey background, such as the notorious data on the population of the Canadian lynx and the snowshoe hare. In contrast, the multivariate view is central in economics, where single variables are traditionally viewed in the context of relationships to other variables. Contrary to other disciplines, economists may even reject the idea of univariate time-series modeling on grounds of the theoretical interdependence, which appears to be an exaggerated position. In forecasting, and even in economics, multivariate models are not necessarily better than univariate ones. While multivariate models are convenient in modeling interesting interdependencies and achieve a better (not worse) fit within a given sample, it is often found that univariate methods outperform multivariate methods out of sample. Among others, one may name as possible reasons:

  • Multivariate models have more parameters than univariate ones. Every additional parameter is an unknown quantity and has to be estimated. This estimation brings in an additional source of error due to sampling variation.
  • The number of potential candidates for multivariate models exceeds its univariate counterpart. Model selection is therefore more complex, lengthier, and more susceptible to errors, which then affect prediction.
  • It is difficult to generalize nonlinear procedures to the multivariate case. Generally, multivariate models must have a simpler structure than univariate ones, to overcome the additional complexity that is imposed by being multivariate. For example, while a researcher may use a nonlinear model for univariate data, she may refrain from using the multivariate counterpart or such a generalization may not have been developed. Then, multivariate models will miss the nonlinearities that are handled properly by the univariate models.
  • Outliers can have a more serious effect on multivariate than on univariate forecasts. Moreover, it is easier to spot and control outliers in the univariate context.



1. Exponential Smoothing
Exponential smoothing is simply an adjustment technique which takes the previous period’s forecast and adjusts it up or down based on what actually occurred in that period. Whereas in single moving averages the past observations are weighted equally, exponential smoothing assigns exponentially decreasing weights as the observations get older.
In other words, recent observations are given relatively more weight in forecasting than the older observations. The smoothing factor is used to automatically calculate how much weight to apply to each previous period.
There is also another exponential smoothing model called adaptive response rate single exponential smoothing which tracks the forecast performance and automatically adjusts to allow for shifting patterns.
Triple exponential smoothing is based on three equations (overall smoothing, trend smoothing and seasonal smoothing) and is called the “Holt-Winters” (HW) method after its inventors.
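The adjustment idea can be sketched in a few lines of Python. The demand series and the smoothing factor alpha = 0.5 are illustrative values only:

```python
def simple_exp_smoothing(y, alpha):
    """One-step-ahead forecasts: forecasts[t] predicts y[t].
    The first forecast is initialized to the first observation."""
    forecasts = [y[0]]
    for t in range(1, len(y)):
        # adjust the previous forecast toward the latest observation
        forecasts.append(alpha * y[t - 1] + (1 - alpha) * forecasts[t - 1])
    return forecasts

demand = [100.0, 110.0, 105.0, 120.0, 115.0]
forecasts = simple_exp_smoothing(demand, alpha=0.5)
```

With alpha = 1 the method collapses to the naive forecast (each forecast equals the previous observation); with alpha near 0 the forecast barely reacts to new data.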

Table 1: The fifteen exponential smoothing methods

                              Seasonal Component
Trend Component               N (None)    A (Additive)    M (Multiplicative)
N (None)                      N,N         N,A             N,M
A (Additive)                  A,N         A,A             A,M
Ad (Additive damped)          Ad,N        Ad,A            Ad,M
M (Multiplicative)            M,N         M,A             M,M
Md (Multiplicative damped)    Md,N        Md,A            Md,M

Some of these methods are better known under other names. For example, cell (N,N) describes the simple exponential smoothing (or SES) method, cell (A,N) describes Holt’s linear method, and cell (Ad,N) describes the damped trend method. The additive Holt-Winters’ method is given by cell (A,A) and the multiplicative Holt-Winters’ method is given by cell (A,M). The other cells correspond to less commonly used but analogous methods.
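As an illustration of the (A,A) cell, here is a minimal additive Holt-Winters sketch. The initialization scheme (from the first two seasons) and the quarterly toy series are assumptions for this example, not the only possible choices:

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h=1):
    """Additive Holt-Winters (the (A,A) cell): level, trend and
    seasonal smoothing equations; h-step-ahead forecast for h <= m.
    States are initialized from the first two seasons of data."""
    mean1 = sum(y[:m]) / m             # mean of season 1
    mean2 = sum(y[m:2 * m]) / m        # mean of season 2
    b = (mean2 - mean1) / m            # initial per-period trend
    l = mean1 + (m - 1) / 2 * b        # level at the end of season 1
    s = [y[i] - (mean1 + (i - (m - 1) / 2) * b) for i in range(m)]
    for t in range(m, len(y)):
        l_prev, b_prev = l, b
        l = alpha * (y[t] - s[t % m]) + (1 - alpha) * (l_prev + b_prev)
        b = beta * (l - l_prev) + (1 - beta) * b_prev
        # s[t % m] still holds the seasonal index from one season ago
        s[t % m] = gamma * (y[t] - l_prev - b_prev) + (1 - gamma) * s[t % m]
    return l + h * b + s[(len(y) - 1 + h) % m]

# quarterly toy series: level 10, trend 0.5 per quarter, additive seasonality
season = [2.0, -1.0, 0.0, -1.0]
y = [10.0 + 0.5 * t + season[t % 4] for t in range(12)]
forecast = holt_winters_additive(y, m=4, alpha=0.5, beta=0.5, gamma=0.5)
```

Because the toy series follows the additive level-trend-seasonal structure exactly, the forecast reproduces the next value exactly, whatever the smoothing parameters.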

An exponential smoothing method is an algorithm for producing point forecasts only. The underlying stochastic state space model gives the same point forecasts, but also provides a framework for computing prediction intervals and other properties.

For each exponential smoothing method in Table 1, Hyndman et al. (2002) describe two possible innovations state space models, one corresponding to a model with additive errors and the other to a model with multiplicative errors. If the same parameter values are used, these two models give equivalent point forecasts, although different prediction intervals. Thus there are 30 potential models described in this classification.

A three-character string is usually used to identify a method, following the framework terminology of Hyndman et al. (2002) and Hyndman et al. (2008). The first letter denotes the error type (“A”, “M” or “Z”); the second letter denotes the trend type (“N”, “A”, “M” or “Z”); and the third letter denotes the season type (“N”, “A”, “M” or “Z”). In all cases, “N” = none, “A” = additive, “M” = multiplicative and “Z” = automatically selected. So, for example, “ANN” is simple exponential smoothing with additive errors, “MAM” is the multiplicative Holt-Winters’ method with multiplicative errors, and so on.

Early attempts to study time series, particularly in the 19th century, were generally characterized by the idea of a deterministic world. It was the major contribution of Yule (1927) which launched the notion of stochasticity in time series by postulating that every time series can be regarded as the realization of a stochastic process. Based on this simple idea, a number of time series methods have been developed since then. Workers such as Slutsky, Walker, Yaglom, and Yule first formulated the concept of autoregressive (AR) and moving average (MA) models. Wold’s decomposition theorem led to the formulation and solution of the linear forecasting problem of Kolmogorov (1941). Since then, a considerable body of literature has appeared in the area of time series, dealing with parameter estimation, identification, model checking, and forecasting; see, e.g., Newbold (1983) for an early survey.

The publication Time Series Analysis: Forecasting and Control by Box and Jenkins (1970) integrated the existing knowledge. Moreover, these authors developed a coherent, versatile three-stage iterative cycle for time series identification, estimation, and verification (rightly known as the Box–Jenkins approach). The book has had an enormous impact on the theory and practice of modern time series analysis and forecasting. With the advent of the computer, it popularized the use of autoregressive integrated moving average (ARIMA) models and their extensions in many areas of science.

The acronym ARIMA stands for “Auto-Regressive Integrated Moving Average.” Lags of the differenced series appearing in the forecasting equation are called “auto-regressive” terms, lags of the forecast errors are called “moving average” terms, and a time series which needs to be differenced to be made stationary is said to be an “integrated” version of a stationary series. Exponential smoothing models (i.e., exponential weighted moving averages) are all special cases of ARIMA models.

The input series for ARIMA needs to be stationary; that is, it should have a constant mean, variance, and autocorrelation through time. Therefore, the series usually first needs to be differenced until it is stationary (this also often requires log-transforming the data to stabilize the variance). To determine the necessary level of differencing, you should examine the plot of the data and the autocorrelogram. Significant changes in level (strong upward or downward movements) usually require first-order non-seasonal (lag = 1) differencing; strong changes of slope usually require second-order non-seasonal differencing. Seasonal patterns require corresponding seasonal differencing. If the estimated autocorrelation coefficients decline slowly at longer lags, first-order differencing is usually needed. However, keep in mind that some time series may require little or no differencing, and that over-differenced series produce less stable coefficient estimates.
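A small sketch of the differencing step, using illustrative synthetic series:

```python
import math

def difference(y, lag=1):
    """Difference a series: lag=1 removes a trend, lag=m removes a
    seasonal pattern of period m (e.g. lag=12 for monthly data)."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# A series with a linear trend: its first differences are constant,
# so the differenced series has a stationary mean.
trend = [2.0 * t + 5.0 for t in range(10)]
first_diff = difference(trend)

# Exponential growth: log-transform first to stabilize the variance,
# then difference; the result is a constant log-return.
growth = [100.0 * 1.05 ** t for t in range(10)]
log_returns = difference([math.log(v) for v in growth])
```

In the first case every difference equals 2.0; in the second every log-return equals log(1.05), up to floating-point rounding.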

Indeed, forecasting discrete time series processes through univariate ARIMA models, transfer function (dynamic regression) models, and multivariate (vector) ARIMA models has generated quite a few studies. Often these studies were of an empirical nature, using one or more benchmark methods/models as a comparison. Without pretending to be complete, Table 2 gives a list of these studies.


Table 2: Studies forecasting with ARIMA-type models, the benchmark methods used, and the forecast horizons
(series | forecast horizon | benchmark | reference)

Univariate ARIMA:
  Electricity load (min) | 1–30 min | Wiener filter | Di Caprio, Genesio, Pozzi, and Vicino (1983)
  Quarterly automobile insurance paid claim costs | 8 quarters | Log-linear regression | Cummins and Griepentrog (1985)
  Daily federal funds rate | 1 day | Random walk | Hein and Spudeck (1988)
  Quarterly macroeconomic data | 1–8 quarters | Wharton model | Dhrymes and Peristiani (1988)
  Monthly department store sales | 1 month | Simple exponential smoothing | Geurts and Kelly (1986, 1990), Pack (1990)
  Monthly demand for telephone services | 3 years | Univariate state space | Grambsch and Stahel (1990)
  Yearly population totals | 20–30 years | Demographic models | Pflaumer (1992)
  Monthly tourism demand | 1–24 months | Univariate state space, multivariate state space | du Preez and Witt (2003)

Dynamic regression/transfer function:
  Monthly telecommunications traffic | 1 month | Univariate ARIMA | Layton, Defris, and Zehnwirth (1986)
  Weekly sales data | 2 years | – | Leone (1987)
  Daily call volumes | 1 week | – | Bianchi, Jarrett, and Hanumara (1998)
  Monthly employment levels | 1–12 months | Univariate ARIMA | Weller (1989)
  Monthly and quarterly consumption of natural gas | 1 month/1 quarter | Univariate ARIMA | Liu and Lin (1991)
  Monthly electricity consumption | 1–3 years | Univariate ARIMA | Harris and Liu (1993)

Multivariate (vector) ARIMA:
  Yearly municipal budget data | – | Univariate ARIMA | Downs and Rocke (1983)
  Monthly accounting data | 1 month | Regression, univariate ARIMA, transfer function | Hillmer, Larcker, and Schroeder (1983)
  Quarterly macroeconomic data | 1–10 quarters | Judgmental methods, univariate | Öller (1985)
  Monthly truck sales | 1–13 months | Univariate ARIMA | Heuts and Bronckers (1988)
  Monthly hospital patient movements | 2 years | Univariate ARIMA | Lin (1989)
  Quarterly unemployment rate | 1–8 quarters | Transfer function | Edlund and Karlsson (1993)

Comparisons with exponential smoothing
The exponential smoothing state space models are all non-stationary. Models with seasonality or non-damped trend (or both) have two unit roots; all other models—that is, non-seasonal models with either no trend or damped trend—have one unit root. It is possible to define a stationary model with similar characteristics to exponential smoothing, but this is not normally done. The philosophy of exponential smoothing is that the world is non-stationary. So if a stationary model is required, ARIMA models are better. One advantage of the exponential smoothing models is that they can be non-linear. So time series that exhibit non-linear characteristics including heteroscedasticity may be better modelled using exponential smoothing state space models.

For seasonal data, there are many more ARIMA models than the 30 possible models in the exponential smoothing classification. The larger model space of ARIMA models can actually harm forecasting performance because it introduces additional uncertainty in model selection. The smaller exponential smoothing class is sufficiently rich to capture the dynamics of almost all real business and economic time series.

There is a widespread myth that ARIMA models are more general than exponential smoothing. This is not true. The two classes of models overlap. The linear exponential smoothing models are all special cases of ARIMA models. However, the non-linear exponential smoothing models have no equivalent ARIMA counterpart. On the other hand, there are many ARIMA models which have no exponential smoothing counterpart. Thus, the two model classes overlap and are complementary; each has its strengths and weaknesses.
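The overlap can be checked numerically for the best-known case: simple exponential smoothing with smoothing factor alpha yields the same one-step forecasts as an ARIMA(0,1,1) model with MA coefficient theta = alpha − 1, given the same starting forecast. The short series below is illustrative:

```python
def ses_forecasts(y, alpha, f0):
    """One-step forecasts from simple exponential smoothing."""
    f = [f0]
    for t in range(1, len(y)):
        f.append(alpha * y[t - 1] + (1 - alpha) * f[t - 1])
    return f

def arima011_forecasts(y, theta, f0):
    """One-step forecasts from ARIMA(0,1,1):
    y_t = y_{t-1} + e_t + theta * e_{t-1}, so the forecast of y_t is
    y_{t-1} + theta * e_{t-1}, where e_{t-1} is the last forecast error."""
    f = [f0]
    for t in range(1, len(y)):
        e = y[t - 1] - f[t - 1]
        f.append(y[t - 1] + theta * e)
    return f

y = [3.0, 5.0, 4.0, 6.0, 7.0, 5.5]
alpha = 0.4
ses = ses_forecasts(y, alpha, f0=y[0])
arima = arima011_forecasts(y, theta=alpha - 1.0, f0=y[0])
# the two forecast sequences agree up to floating-point rounding
```

Algebraically, y + (alpha − 1)(y − f) = alpha·y + (1 − alpha)·f, which is exactly the SES recursion.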

The algorithms are applicable to both seasonal and non-seasonal data.

State space and structural models and the Kalman filter
At the start of the 1980s, state space models were only beginning to be used by statisticians for forecasting time series, although the ideas had been present in the engineering literature since Kalman’s (1960) ground-breaking work. State space models provide a unifying framework in which any linear time series model can be written. The key forecasting contribution of Kalman (1960) was to give a recursive algorithm (known as the Kalman filter) for computing forecasts. Statisticians became interested in state space models when Schweppe (1965) showed that the Kalman filter provides an efficient algorithm for computing the one-step-ahead prediction errors and associated variances needed to produce the likelihood function. Shumway and Stoffer (1982) combined the EM algorithm with the Kalman filter to give a general approach to forecasting time series using state space models, including allowing for missing observations. A particular class of state space models, known as “dynamic linear models” (DLM), was introduced by Harrison and Stevens (1976), who also proposed a Bayesian approach to estimation. Fildes (1983) compared the forecasts obtained using the Harrison and Stevens method with those from simpler methods such as exponential smoothing, and concluded that the additional complexity did not lead to improved forecasting performance. The modelling and estimation approach of Harrison and Stevens was further developed by West, Harrison, and Migon (1985) and West and Harrison (1989). Harvey (1984, 1989) extended the class of models and followed a non-Bayesian approach to estimation. He also renamed the models “structural models”, although in later papers he uses the term “unobserved component models”. Harvey (2006) provides a comprehensive review and introduction to this class of models including continuous-time and non-Gaussian variations.

These models bear many similarities with exponential smoothing methods, but have multiple sources of random error. In particular, the “basic structural model” (BSM) is similar to Holt–Winters’ method for seasonal data and includes level, trend and seasonal components. Ray (1989) discussed convergence rates for the linear growth structural model and showed that the initial states (usually chosen subjectively) have a nonnegligible impact on forecasts. Harvey and Snyder (1990) proposed some continuous-time structural models for use in forecasting lead time demand for inventory control. Proietti (2000) discussed several variations on the BSM, compared their properties and evaluated the resulting forecasts. Non-Gaussian structural models have been the subject of a large number of papers, beginning with the power steady model of Smith (1979) with further development by West et al. (1985).

Another class of state space models, known as “balanced state space models”, has been used primarily for forecasting macroeconomic time series. Mittnik (1990) provided a survey of this class of models, and Vinod and Basu (1995) obtained forecasts of consumption, income, and interest rates using balanced state space models. These models have only one source of random error and subsume various other time series models including ARMAX models, ARMA models, and rational distributed lag models. A related class of state space models are the “single source of error” models that underlie exponential smoothing methods.
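The Kalman filter recursions are easiest to see for the simplest state space model, the local level (random walk plus noise) model. The noise variances and toy data below are illustrative:

```python
def kalman_local_level(y, var_eps, var_eta, m0=0.0, p0=1e6):
    """Kalman filter for the local level model
        y_t  = mu_t + eps_t,        eps_t ~ N(0, var_eps)
        mu_t = mu_{t-1} + eta_t,    eta_t ~ N(0, var_eta)
    Returns one-step-ahead forecasts of y_t and their variances."""
    m, p = m0, p0                      # filtered state mean and variance
    forecasts, variances = [], []
    for obs in y:
        # predict: the state is a random walk, so the mean carries over
        m_pred, p_pred = m, p + var_eta
        f_var = p_pred + var_eps       # one-step forecast variance
        forecasts.append(m_pred)
        variances.append(f_var)
        # update: blend prediction and observation via the Kalman gain
        k = p_pred / f_var
        m = m_pred + k * (obs - m_pred)
        p = (1.0 - k) * p_pred
    return forecasts, variances

f, v = kalman_local_level([1.0, 2.0, 3.0], var_eps=0.0, var_eta=1.0)
# with no measurement noise the gain is 1, so each forecast is simply
# the previous observation (a naive / random-walk forecast)
```

In steady state, the forecast recursion of this filter coincides with simple exponential smoothing, which is one way to see the connection between the state space framework and exponential smoothing methods.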

Nonlinear models

Time series prediction is a very challenging signal processing problem: in real situations the series is typically a function of a large number of factors, most of which are unknown or inaccessible at the time of prediction. Although such time series appear as very noisy, non-stationary and non-linear signals, their history carries significant evidence that can be used to build a predictive model.

Although linearity is a useful assumption and a powerful tool in many areas, it became increasingly clear in the late 1970s and early 1980s that linear models are insufficient in many real applications. For example, sustained animal population size cycles (the well-known Canadian lynx data), sustained solar cycles (annual sunspot numbers), energy flow, and amplitude–frequency relations were found not to be suitable for linear models.

Functional-coefficient model
A functional coefficient AR (FCAR or FAR) model is an AR model in which the AR coefficients are allowed to vary as a measurable smooth function of another variable, such as a lagged value of the time series itself or an exogenous variable. The FCAR model includes TAR and STAR models as special cases, and is analogous to the generalized additive model of Hastie and Tibshirani (1991). Chen and Tsay (1993) proposed a modeling procedure using ideas from both parametric and nonparametric statistics. The approach assumes little prior information on model structure without suffering from the “curse of dimensionality”; see also Cai, Fan, and Yao (2000).

Harvill and Ray (2005) presented multi-step-ahead forecasting results using univariate and multivariate functional coefficient (V)FCAR models. These authors restricted their comparison to three forecasting methods: the naive plug-in predictor, the bootstrap predictor, and the multi-stage predictor. Both simulation and empirical results indicate that the bootstrap method appears to give slightly more accurate forecast results. A potentially useful area of future research is whether the forecasting power of VFCAR models can be enhanced by using exogenous variables.
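A tiny sketch of a functional-coefficient AR(1) is the exponential AR (EXPAR) form, in which the AR coefficient shrinks smoothly as the lagged value grows. The coefficient function phi(u) = 0.9·exp(−u²) and the starting value are illustrative choices:

```python
import math

def expar_next(y_prev, a=0.9):
    """Exponential AR(1) (EXPAR), a functional-coefficient AR model:
    the AR coefficient phi(u) = a * exp(-u**2) varies smoothly with
    the lagged value, shrinking as |y_{t-1}| grows."""
    phi = a * math.exp(-y_prev ** 2)
    return phi * y_prev

# "naive plug-in" multi-step forecasting: feed each forecast back in
history = [1.2]
for _ in range(5):
    history.append(expar_next(history[-1]))
```

Since phi(u) stays below 0.9, the plug-in forecasts decay toward zero, while near zero the model behaves like an ordinary AR(1) with coefficient 0.9.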

Neural nets
An artificial neural network (ANN) can be useful for nonlinear processes that have an unknown functional relationship and are therefore difficult to fit. The main idea of ANNs is that the inputs, or predictor variables, are filtered through one or more hidden layers, each of which consists of hidden units (nodes), before they reach the output variable. The intermediate output is related to the final output. One major application area of ANNs is forecasting; see Zhang, Patuwo, and Hu (1998) and Hippert, Pedreira, and Souza (2001) for good surveys of the literature. Numerous studies have documented the successes of ANNs in forecasting financial data. However, in two editorials in this Journal, Chatfield (1993, 1995) questioned whether ANNs had been oversold as a miracle forecasting technique. This was followed by several papers documenting that naïve models such as the random walk can outperform ANNs.

A general problem with nonlinear models is the “curse of model complexity and model over-parametrization”. If parsimony is considered to be really important, then it is interesting to compare the out-of-sample forecasting performance of linear versus nonlinear models, using a wide variety of different model selection criteria. This issue was considered in quite some depth by Swanson and White (1997). Their results suggested that a single hidden layer “feed-forward” ANN model, which has been by far the most popular in time series econometrics, offers a useful and flexible alternative to fixed specification linear models, particularly at forecast horizons greater than one-step-ahead. However, in contrast to Swanson and White, Heravi, Osborn, and Birchenhall (2004) found that linear models produce more accurate forecasts of monthly seasonally unadjusted European industrial production series than ANN models. Times change, and it is fair to say that the risk of over-parametrization and overfitting is now recognized by many authors; see, e.g., Hippert, Bunn, and Souza (2005) who use a large ANN (50 inputs, 15 hidden neurons, 24 outputs) to forecast daily electricity load profiles. Nevertheless, the question of whether or not an ANN is over-parametrized still remains unanswered.
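A self-contained sketch of the single-hidden-layer feed-forward setup discussed above, trained by batch gradient descent to make one-step-ahead forecasts from lagged inputs. The sine-wave data, layer sizes, and learning rate are illustrative choices, not tuned values:

```python
import numpy as np

def fit_forecast_net(y, lags=3, hidden=4, epochs=300, lr=0.05, seed=0):
    """Single-hidden-layer feed-forward net for one-step-ahead
    forecasting: inputs are the last `lags` observations, one tanh
    hidden layer, a linear output unit, trained by batch gradient
    descent on mean squared error. Returns the training-loss history."""
    rng = np.random.default_rng(seed)
    X = np.array([y[t - lags:t] for t in range(lags, len(y))])
    target = np.array(y[lags:])
    W1 = rng.normal(0.0, 0.5, (lags, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                     # hidden activations
        pred = H @ W2 + b2                           # linear output
        err = pred - target
        losses.append(float(np.mean(err ** 2)))
        # backpropagation of the mean-squared-error gradient
        g_pred = 2.0 * err / len(target)
        gW2 = H.T @ g_pred
        gb2 = g_pred.sum()
        gH = np.outer(g_pred, W2) * (1.0 - H ** 2)   # tanh derivative
        gW1 = X.T @ gH
        gb1 = gH.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return losses

y = [float(np.sin(0.3 * t)) for t in range(80)]
losses = fit_forecast_net(y)
```

The training loss falls steadily on this smooth series; how well such a net generalizes out of sample is exactly the over-parametrization question raised above.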

Evolutionary Algorithm
An evolutionary algorithm (EA) uses a collection of heuristic rules to modify a population of trial solutions in such a way that each generation of trial values tends to be, on average, better than its predecessor. The measure for whether one trial solution is better than another is the trial solution’s fitness value. In statistical applications, the fitness is a function of the summary statistic being optimized (e.g., the log-likelihood).

The genetic algorithm (GA) is the most popular type of EA, inspired by the basic principles of biological evolution and natural selection. It is a stochastic search algorithm that simulates the evolution of living organisms, in which the fittest individuals dominate the weaker ones, by mimicking the biological mechanisms of evolution such as selection, crossover and mutation.
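As a concrete, if toy-sized, illustration of these ideas applied to forecasting, the sketch below uses a real-coded GA to choose the smoothing factor of simple exponential smoothing, with fitness defined as the negative sum of squared one-step errors. The population size, rates, and series are all illustrative:

```python
import random

def ses_sse(y, alpha):
    """Sum of squared one-step errors of simple exponential
    smoothing with smoothing factor alpha."""
    f, sse = y[0], 0.0
    for t in range(1, len(y)):
        sse += (y[t] - f) ** 2
        f = alpha * y[t] + (1 - alpha) * f
    return sse

def ga_optimize_alpha(y, pop_size=20, generations=60, seed=1):
    """Real-coded GA: tournament selection, blend crossover,
    Gaussian mutation, and elitism; fitness is the negative SSE."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.01, 0.99) for _ in range(pop_size)]

    def fitness(a):
        return -ses_sse(y, a)

    for _ in range(generations):
        new_pop = [max(pop, key=fitness)]                 # elitism
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)     # tournament
            p2 = max(rng.sample(pop, 3), key=fitness)
            w = rng.random()
            child = w * p1 + (1 - w) * p2                 # blend crossover
            if rng.random() < 0.2:                        # mutation
                child += rng.gauss(0.0, 0.1)
            new_pop.append(min(max(child, 0.01), 0.99))   # keep in (0, 1)
        pop = new_pop
    return max(pop, key=fitness)

series = [12.0, 14.0, 11.0, 15.0, 16.0, 13.0, 17.0, 18.0, 16.0, 19.0]
best_alpha = ga_optimize_alpha(series)
```

For a one-dimensional, smooth fitness surface like this, a grid search would do just as well; the GA machinery pays off when the search space is large, discrete, or rugged, as in the applications surveyed below.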

Stephen D. Sloan, Raymond W. Saw, and James J. Sluss described a genetic algorithm to forecast the long-term quarterly sales of a product in the telecommunication technology sector. It used widely available economic indicators, such as disposable personal income and new housing starts, as independent variables. The authors used individual chromosomes to indicate inclusion or exclusion of specific economic variables as well as operational rules for combining the variables. Their proposed method also incorporated several features beyond those of the canonical GA, including evolution of individuals in distinct ecosystems with a specified level of intermarriage between ecosystems, the ability for a single gene in an individual’s chromosome to indicate a subroutine call to the complete chromosome of an individual from a previous generation, and hill-climbing applied to improve the fittest offspring produced in each generation.

A Genetic Algorithm for Conformational Analysis of DNA, by C. B. Lucasius, M. J. J. Blommers, L. M. C. Buydens, and G. Kateman, describes the development of a genetic algorithm for determining the structure of a DNA sample from spectrometric data. An interesting “cascaded” evaluation technique greatly enhances the efficiency of their evaluation function. The authors used bit strings to encode molecular structures, and their evaluation function measured the degree to which a decoded structure conforms to the data collected about the sample. The genetic algorithm evolves a description of the molecular structure in agreement with the collected data.

Long-term energy consumption forecasting using a genetic algorithm was introduced by Korhan Karabulut, Ahmet Alkan, and Ahmet S. Yilmaz. The most important part of electric utility resource planning is forecasting the future load demand in the service area, which is achieved by constructing models on relevant information such as climate and previous load demand data. They used a genetic algorithm to forecast long-term electrical power consumption in the area covered by a utility situated in southeast Turkey.

M. K. Deshmukh and C. Balakrishna Moorthy introduced a genetic algorithm into a neural network model, namely a feed-forward neural network, for estimating the wind energy potential at a site. Their model predicts the power output of a wind energy conversion system (WECS): real-time values of wind speed and related variables are taken as input, and the electric power generated by the WECS is computed as the model output. The neural network combined with the genetic algorithm is proposed to improve the output. The results obtained with this model are compared with those obtained using the back-propagation algorithm, and the authors report that their modified model improves the accuracy of wind energy prediction.

Kristin Bennett, Michael C. Ferris, and Yannis E. Ioannidis described a database query optimization problem and the adaptation of a genetic algorithm to it. They presented a method for encoding arbitrary binary trees as chromosomes and described several crossover operations for such chromosomes. Preliminary computational comparisons with the best known method for query optimization indicated that their approach is promising: in particular, the output quality and the time needed to produce solutions are comparable to, and in general better than, those of the current method.
