Portfolio Selection Using Bayesian Analysis And Gibbs Sampling Finance Essay
Posted on: 16 March 2015
This paper contributes to portfolio selection methodology using a Bayesian forecast of the distribution of returns by stochastic approximation. New hierarchical priors on the mean vector and covariance matrix of returns are derived and implemented. Comparisons between this approach and other Bayesian methods are studied with simulations on 20 years of historical data on global stock indices. It is demonstrated that a fully hierarchical Bayes procedure produces results superior to typical Bayesian formulations. In addition, an expected utility formulation increases performance significantly over methods that simply impute moment estimates into the Markowitz mean-variance model.
Portfolio theory is concerned with the allocation of an individual’s wealth among various available assets. The basic Markowitz version of the portfolio selection problem is (Markowitz 1952):
$$\max_{w}\; w'\mu - \frac{\lambda}{2}\, w'\Sigma w \quad \text{subject to} \quad e'w = 1, \qquad (1)$$
where $w$ is a column vector of proportions representing a portfolio of assets, $\Sigma$ and $\mu$ are the covariance matrix and mean column vector of asset returns, $\lambda$ is the investor’s risk-aversion parameter, and $e$ is a column vector of ones. To rule out short sales, the additional constraint $w \ge 0$ can be added. This portfolio selection approach is termed the Mean-Variance (MV) method because it ranks portfolio weights by their mean-variance pairs. The set of optimal portfolios obtained as the level of risk aversion, $\lambda$, varies is termed the Markowitz efficient frontier.
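As a minimal sketch, assume the MV problem takes the form: maximize $w'\mu - (\lambda/2)\, w'\Sigma w$ subject to $e'w = 1$, with short sales allowed. The first-order condition $\mu - \lambda\Sigma w - \gamma e = 0$ then gives $w = \Sigma^{-1}(\mu - \gamma e)/\lambda$, with the multiplier $\gamma$ fixed by the budget constraint. The Python below evaluates this closed form. The function name `mv_weights` and all numerical inputs are hypothetical illustrations, not data from the study.

```python
import numpy as np

def mv_weights(mu, sigma, lam):
    """Maximize w'mu - (lam/2) w'Sigma w subject to e'w = 1 (short sales allowed)."""
    e = np.ones(len(mu))
    sinv_mu = np.linalg.solve(sigma, mu)   # Sigma^{-1} mu
    sinv_e = np.linalg.solve(sigma, e)     # Sigma^{-1} e
    # Lagrange multiplier chosen so that the weights sum to one.
    gamma = (e @ sinv_mu - lam) / (e @ sinv_e)
    return (sinv_mu - gamma * sinv_e) / lam

lam = 0.5                                  # hypothetical risk aversion
mu = np.array([1.0, 1.2, 0.8])             # hypothetical monthly mean returns (%)
sigma = np.array([[16.0, 4.0, 2.0],
                  [4.0, 25.0, 3.0],
                  [2.0, 3.0, 9.0]])        # hypothetical covariance (%^2)
w = mv_weights(mu, sigma, lam)
print(w, w.sum())
```

With the no-short-sales constraint $w \ge 0$ added, there is no closed form in general, and a quadratic programming solver would be needed instead.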
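The Markowitz MV method can be viewed as maximizing expected utility. For example, if the investor’s current wealth is $W_0$, his terminal wealth is:
$$W = W_0\,(1 + w'y), \qquad (2)$$
where $y$ is the column vector of asset returns over the holding period. Under the von Neumann-Morgenstern axioms, the investor determines $w$ by considering the expected value of a non-decreasing utility function of $W$. Using the exponential utility function
$$U(W) = -\exp(-\lambda W), \qquad \lambda > 0, \qquad (3)$$
and assuming $y$ is distributed multivariate normal $N(\mu, \Sigma)$, the expected utility is
$$E[U(W)] = -\exp\Big\{-\lambda W_0\,(1 + w'\mu) + \tfrac{1}{2}\lambda^2 W_0^2\, w'\Sigma w\Big\},$$
so maximizing expected utility is equivalent to maximizing $w'\mu - \tfrac{1}{2}\lambda W_0\, w'\Sigma w$. With initial wealth normalized to $W_0 = 1$, this reduces to ranking MV portfolios using $(\mu, \Sigma; \lambda)$ in model (1).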
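The reduction from expected exponential utility to a mean-variance criterion can be checked numerically. Assuming $U(W) = -\exp(-\lambda W)$, $W = W_0(1 + w'y)$ and $y \sim N(\mu, \Sigma)$, the normal moment generating function gives $E[U(W)] = -\exp\{-\lambda W_0(1 + w'\mu) + \tfrac{1}{2}\lambda^2 W_0^2 w'\Sigma w\}$. The sketch below compares that closed form with a Monte Carlo average. The values of $\lambda$, $W_0$, $\mu$, $\Sigma$ and $w$ are hypothetical, with returns expressed as decimals. The example is an illustration, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, w0 = 2.0, 1.0                      # hypothetical risk aversion and initial wealth
mu = np.array([0.010, 0.012])           # hypothetical monthly mean returns (decimal)
sigma = np.array([[0.0016, 0.0004],
                  [0.0004, 0.0025]])    # hypothetical covariance of returns
w = np.array([0.6, 0.4])                # any portfolio with e'w = 1

# Closed form of E[-exp(-lam W)] with W = W0 (1 + w'y), y ~ N(mu, Sigma).
m_w = w @ mu
v_w = w @ sigma @ w
closed = -np.exp(-lam * w0 * (1 + m_w) + 0.5 * lam**2 * w0**2 * v_w)

# Monte Carlo estimate of the same expectation.
y = rng.multivariate_normal(mu, sigma, size=200_000)
mc = np.mean(-np.exp(-lam * w0 * (1 + y @ w)))

# Mean-variance score implied by the closed form (risk aversion lam * W0).
mv_score = m_w - 0.5 * lam * w0 * v_w
print(closed, mc, mv_score)
```

Because the closed form equals $-\exp\{-\lambda W_0(1 + \text{mv\_score})\}$, ranking portfolios by expected utility and ranking them by the mean-variance score give the same order.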
Classical portfolio selection uses least-squares estimates of $(\mu, \Sigma)$ in model (1). However, MV portfolio selection based on estimates of population moments leads to a problem of estimation risk, which arises from the difference between the estimates and the true parameter values. It has been well documented that estimation risk is significant (Dickinson 1974; Putnam and Quintana 1991; Pari and Chen 1985; Frankfurter, Phillips, and Seagle 1971; Jobson and Korkie 1980). Empirical studies of estimation risk associated with least-squares estimates appear in Levy and Sarnat (1970); Solnik (1982); Board and Sutcliffe (1992); Chopra, Hensel, and Turner (1993); and Chopra and Ziemba (1993). All of these studies conclude that the resulting portfolios involve either extreme volatility or a lack of diversification.
The use of Bayes and empirical Bayes estimators of $(\mu, \Sigma)$ has been advocated by several researchers (Brown 1976; Klein and Bawa 1976; Bawa, Brown and Klein 1979; Jorion 1986; Frost and Savarino 1986). Jorion (1986, 1991) employs Bayes modifications of James-Stein shrinkage formulas (James and Stein 1960) to estimate $\mu$, while Frost and Savarino (1986) employ empirical Bayes estimators that assume $\Sigma$ has intraclass structure. They show through simulated and historical data that MV portfolios using their respective Bayes estimates in model (1) dominate MV portfolios using classical least-squares estimates. See also Kadiyala and Karlsson (1997), Kandel, McCulloch and Stambaugh (1995), and Shanken (1987).
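For concreteness, the Bayes-Stein estimator of $\mu$ is sketched below in the form in which it is usually quoted from Jorion (1986). The sample mean is shrunk toward the mean of the sample minimum-variance portfolio. The shrinkage intensity falls as the sample means become more dispersed relative to their sampling error. The function name and the simulated input are hypothetical. The rescaling of the sample covariance by $(T-1)/(T-m-2)$, for $T$ months of returns on $m$ assets, follows that usual statement rather than anything specified in this paper.

```python
import numpy as np

def bayes_stein_mean(returns):
    """Bayes-Stein estimate of the mean vector, in the form usually quoted from
    Jorion (1986). returns: T x m array of historical returns, T > m + 2."""
    t, m = returns.shape
    ybar = returns.mean(axis=0)
    s = np.cov(returns, rowvar=False)            # sample covariance, divisor T - 1
    sigma = s * (t - 1) / (t - m - 2)            # rescaled covariance estimate
    e = np.ones(m)
    sinv_e = np.linalg.solve(sigma, e)
    y0 = (sinv_e @ ybar) / (sinv_e @ e)          # minimum-variance portfolio mean
    d = ybar - y0 * e
    # Shrinkage intensity in (0, 1]: near 1 when the means barely differ from y0.
    shrink = (m + 2) / ((m + 2) + t * (d @ np.linalg.solve(sigma, d)))
    return (1 - shrink) * ybar + shrink * y0 * e, shrink

rng = np.random.default_rng(1)
sample = rng.normal(1.0, 5.0, size=(60, 11))   # hypothetical: 60 months x 11 assets (%)
est, shrink = bayes_stein_mean(sample)
print(shrink)
```

The estimator only pulls the means toward a common value. Unlike the hierarchical models examined in this paper, it leaves the covariance estimate untouched.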
This paper examines a fully hierarchical Bayes model for $(\mu, \Sigma)$. Such models are multivariate and can thus capture more complete information on the interdependence between assets than previous models. Although these models are cross-sectional, one-step-ahead forecasts based on the posterior predictive distribution of returns are available for ranking portfolios. The posterior predictive distribution has been proposed for forecasting univariate ARMA models since Zellner (1971); see also West and Harrison (1989). This paper will empirically demonstrate that Bayesian forecasts are superior to moment estimates in portfolio ranking. Moreover, this approach applies to any utility function.
Marriott et al. (1994) show how to obtain the predictive distribution for a vector of future values via the Gibbs sampler and Monte Carlo integration. Kim, Shephard, and Chib (1998) exploit MCMC sampling methods to provide a practical likelihood-based framework for the analysis of stochastic volatility models, and use these methods to compare the fit of stochastic volatility and GARCH models. Nakatsuma and Tsurumi (1996) compare small-sample properties of Bayes estimation and maximum likelihood estimation (MLE) of ARMA-GARCH models using MCMC sampling. McCulloch and Tsay (1994) use the Gibbs sampler for Bayesian analysis of AR models. This paper also exploits Markov chain Monte Carlo (MCMC) sampling methods to obtain a practical stochastic approximation to the posterior predictive distribution and its moments.
This paper is structured as follows. The definition of posterior predictive distributions is given in section 1. Maximum expected utility is defined in section 2. In section 3, we describe data on stock indices for eleven countries provided by Morgan Stanley Capital International, together with the design for comparing the different Bayesian models. Bayesian data models, including the fully hierarchical prior, are explained in section 4. Sections 5 and 6 present the results and conclusion, respectively.
Bayes Posterior Predictive Distributions
Denote observed returns on $m$ assets by $y$ and future, or unobserved, returns by $\tilde{y}$. Let $\theta$ and $\phi$ denote $p$ parameters and $q$ hyperparameters, respectively. The parametric family of the joint likelihood of $y$ and $\tilde{y}$ will be denoted by $f(y, \tilde{y} \mid \theta)$. It depends on the joint parameters only through the low-level parameter $\theta$. Denote the prior distribution of $(\theta, \phi)$ by $\pi(\theta, \phi) = \pi(\theta \mid \phi)\,\pi(\phi)$. Nonhierarchical models fix $\phi$ and compute posterior distributions using the prior $\pi(\theta \mid \phi)$, while hierarchical models compute posterior distributions using the joint prior $\pi(\theta, \phi)$.
In the portfolio selection problem, $\theta = (\mu, \Sigma)$ and $\phi$ will represent a vector of hyperparameters in the prior for $(\mu, \Sigma)$. Portfolio selection using posterior predictive distributions addresses two unknown quantities, $\tilde{y}$ and $(\theta, \phi)$. The primary goal is to gain information about $\tilde{y}$, treating $(\theta, \phi)$ as nuisance parameters. The advantage of the hierarchical model, with priors instead of point estimates of hyperparameters, is that the posterior distributions will reflect the appropriate uncertainty in the hyperparameters. The disadvantage is that the posterior predictive distribution usually will not be analytically tractable. However, MCMC sampling provides a stochastic approximation of it.
According to the likelihood principle, all evidence about $(\tilde{y}, \theta, \phi)$ is contained in the joint likelihood function (for an overview see Bjørnstad, 1990). Based on this likelihood, we wish to develop a posterior predictive distribution for $\tilde{y}$ by eliminating $(\theta, \phi)$ from the joint likelihood. The Bayes approach to this problem is to integrate out $(\theta, \phi)$ using the joint prior. The resulting predictive distribution for $\tilde{y}$ given the data $y$ is the following:
$$p(\tilde{y} \mid y) = \frac{\iint f(y, \tilde{y} \mid \theta)\, \pi(\theta \mid \phi)\, \pi(\phi)\, d\theta\, d\phi}{\iint f(y \mid \theta)\, \pi(\theta \mid \phi)\, \pi(\phi)\, d\theta\, d\phi}. \qquad (4)$$
A stochastic approximation of the posterior predictive distribution is generated by simulation. First, draw $(\theta, \phi)$ from the posterior $\pi(\theta, \phi \mid y)$, using the MCMC sampler if necessary. Second, use the drawn $\theta$ to generate $\tilde{y}$ from $f(\tilde{y} \mid y, \theta)$. Repeat these steps to obtain more simulated observations.
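The two-step composition scheme can be sketched in its simplest setting. Assume, purely for illustration, a known covariance $\Sigma$ and a flat prior on $\mu$. The posterior is then $\mu \mid y \sim N(\bar{y}, \Sigma/n)$, so exact draws can stand in for MCMC output. The predictive distribution is known to be $N(\bar{y}, (1 + 1/n)\Sigma)$, which provides a check on the simulation. The data and covariance below are hypothetical. The hierarchical models of this paper would replace the first step with draws from an MCMC sampler.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 60, 3
sigma = np.diag([16.0, 25.0, 9.0])      # hypothetical known covariance (%^2)
y = rng.multivariate_normal(np.full(m, 1.0), sigma, size=n)   # hypothetical data
ybar = y.mean(axis=0)

n_draws = 100_000
# Step 1: draw mu from its posterior, N(ybar, Sigma / n) under a flat prior with
# known Sigma (exact draws standing in for MCMC output).
mu_draws = rng.multivariate_normal(ybar, sigma / n, size=n_draws)
# Step 2: for each posterior draw, simulate a future return y_tilde ~ N(mu, Sigma).
chol = np.linalg.cholesky(sigma)
y_tilde = mu_draws + rng.standard_normal((n_draws, m)) @ chol.T

# The simulated predictive covariance should be close to (1 + 1/n) Sigma.
print(np.cov(y_tilde, rowvar=False))
print((1 + 1 / n) * sigma)
```

The extra $\Sigma/n$ term in the predictive covariance captures estimation risk. This term is what separates a predictive forecast from plugging $\bar{y}$ and $\Sigma$ directly into a portfolio rule.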
Maximum Expected Utility
For the reader’s convenience, we repeat the basic notation. Let $w$ denote portfolio weights satisfying $e'w = 1$. The inner product $w'\tilde{y}$ is the portfolio return on the future investment. An investor chooses a utility function for wealth, denoted by a monotonically increasing, concave function $U(W)$ that depends on a fixed parameter $\lambda > 0$ denoting risk aversion. The posterior expected utility of $W = W_0(1 + w'\tilde{y})$, where $W_0$ is initial wealth, given the data $y$ is:
$$E[U(W) \mid y] = \int U\big(W_0(1 + w'\tilde{y})\big)\, p(\tilde{y} \mid y)\, d\tilde{y}. \qquad (5)$$
This expectation exists under standard regularity conditions.
The direct utility (DU) optimal portfolio model is a solution to the following model:
$$\max_{w}\; E\big[U\big(W_0(1 + w'\tilde{y})\big) \mid y\big] \quad \text{subject to} \quad e'w = 1. \qquad (6)$$
Typically, model (6) must be solved with a non-linear optimization algorithm. Many standard algorithms exist, such as sequential quadratic programming (see Gill, Murray, and Wright 1981, p. 237; Schittkowski 1980, 1985) as implemented in MATLAB. These algorithms enable solutions for any twice continuously differentiable utility function. When the expected utility is not analytically tractable, samples from the posterior predictive distribution can be used to approximate it. This is an advantage, since portfolio selection can then be carried out with a general utility function. Given a sample $\tilde{y}_1, \ldots, \tilde{y}_N$ from the posterior predictive distribution, the direct utility portfolio selection problem (6) is approximated by the model:
$$\max_{w}\; \frac{1}{N} \sum_{k=1}^{N} U\big(W_0(1 + w'\tilde{y}_k)\big) \quad \text{subject to} \quad e'w = 1. \qquad (7)$$
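The sample-average direct utility problem can be sketched for exponential utility $U(W) = -\exp(-\lambda W)$. The problem is to maximize $\frac{1}{N}\sum_k U(W_0(1 + w'\tilde{y}_k))$ subject to $e'w = 1$. The Python below uses an equality-constrained Newton method with a backtracking line search. This is a swapped-in technique, not the MATLAB SQP routine cited above. It relies on the sample-average objective being smooth and concave in $w$. The normal draws are a hypothetical stand-in for MCMC samples from the posterior predictive distribution, and `du_weights` is a hypothetical name.

```python
import numpy as np

def du_weights(samples, lam, w0=1.0, max_iter=50, tol=1e-10):
    """Maximize mean_k -exp(-lam W0 (1 + w'y_k)) subject to e'w = 1."""
    n_samples, m = samples.shape
    c = lam * w0
    e = np.ones(m)

    def objective(w):
        # Sample-average expected utility for U(W) = -exp(-lam W).
        return -np.mean(np.exp(-c * (1.0 + samples @ w)))

    w = np.full(m, 1.0 / m)                       # feasible start: equal weights
    for _ in range(max_iter):
        a = np.exp(-c * (1.0 + samples @ w))      # a_k = exp(-c (1 + w'y_k))
        grad = c * (samples.T @ a) / n_samples
        hess = -(c ** 2) * (samples.T * a) @ samples / n_samples
        # Newton step restricted to e'step = 0, so e'w = 1 is preserved.
        kkt = np.block([[hess, e[:, None]], [e[None, :], np.zeros((1, 1))]])
        step = np.linalg.solve(kkt, np.append(-grad, 0.0))[:m]
        if np.max(np.abs(step)) < tol:
            break
        f_old = objective(w)
        t = 1.0                                   # backtracking line search
        while objective(w + t * step) < f_old and t > 1e-10:
            t *= 0.5
        w = w + t * step
    return w

rng = np.random.default_rng(3)
mu = np.array([0.010, 0.012, 0.008])              # hypothetical monthly means (decimal)
sigma = np.array([[0.0016, 0.0004, 0.0002],
                  [0.0004, 0.0025, 0.0003],
                  [0.0002, 0.0003, 0.0009]])      # hypothetical covariance
samples = rng.multivariate_normal(mu, sigma, size=100_000)  # stand-in predictive draws
w_du = du_weights(samples, lam=2.0)
print(w_du)
```

For normally distributed draws, the result should be close to the closed-form MV weights computed from the sample moments with risk aversion $\lambda W_0$. This is a useful consistency check. The direct utility approach matters for utilities or predictive distributions where no such reduction exists.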
Empirical Data Analysis
The daily stock market indices for 11 different countries over the period 1975-1994 are used in this comparison of DU and MV using different data models. The countries include US, UK, Canada, Belgium, Australia, France, Japan, Austria, Spain, Germany, and Hong Kong. The indices are compiled and provided by Morgan Stanley Capital International. Monthly returns were computed as the percentage changes in the index between consecutive last days of the month. Morgan Stanley Capital International provided two indices per country, one in local currency and one in $US. Our study is based on the returns in $US.
The data are partitioned into four periods of five consecutive years. This allows a comparison of the means, standard deviations, within-country serial correlations and between-country correlations. Table 1a displays the average monthly asset returns. Note the variability of the mean returns within countries across the four time periods. It appears that Spain exhibits the greatest change in mean return, while the US is stable.
Table 1a also displays the standard deviations of the monthly returns, demonstrating considerable instability, or risk, over the four periods, especially for Hong Kong. Comparison of the means and standard deviations of returns reveals that the US has moderately stable returns with relatively low risk, while Hong Kong has consistently high return and risk.
Table 1b exhibits between-country covariances and correlations of monthly asset returns. Changes in the covariance structure over time have important implications for the appropriate hierarchical model for the data. Although it is not shown in this table, there is much change over time in the covariance matrix for these data. In the most extreme instance, the correlations between Spain and the other 10 countries appear to change across periods. The correlation structure for Hong Kong appears to change as well.
Comparison Design
To examine the performance of the different models, 180 one-month out-of-sample periods covering January 1980 to December 1994 are used. Each is evaluated with a model estimated on the preceding 60 months, so consecutive estimation windows overlap. That is, the first data set runs from January 1975 through December 1979 (60 monthly observations), and the first out-of-sample observation is for January 1980. Our last out-of-sample portfolio is for December 1994. For a given model, we run the MCMC sampler independently on the 180 data sets and use the individual posterior distributions to form the portfolios. Our procedure can be summarized as follows:
(1) Use 60 observations (initially those for months 1 to 60) to generate the joint posterior distributions of the means and covariances (via the MCMC sampler). In accordance with the decision theory rules, compute the posterior means of these distributions.
(2) For a given $\lambda$, the risk-aversion parameter, find the investment proportions $w$.
(3) Apply these proportions to the actual returns observed in the next month to obtain the actual portfolio return for each model and value of $\lambda$.
(4) Roll the sample forward by one month, e.g. to months 2 to 61, and repeat steps (1) through (3).
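The rolling procedure described above can be sketched as a simple backtest loop: estimate on 60 months, form weights, apply them to the next month's returns, then roll forward. In this minimal Python version, `weight_fn` is a hypothetical placeholder for a given model's MCMC estimation and portfolio optimization. The equal-weight rule stands in for it here, and the returns are simulated rather than the MSCI data. With 240 months of data and a 60-month window, the loop yields 180 out-of-sample returns, matching the design above.

```python
import numpy as np

def rolling_backtest(returns, window, weight_fn):
    """One-step-ahead evaluation over rolling estimation windows.

    returns: T x m array of monthly returns. weight_fn maps a (window x m)
    estimation sample to a weight vector (hypothetical placeholder)."""
    realized, weights = [], []
    for start in range(returns.shape[0] - window):
        est = returns[start:start + window]            # estimation window
        w = weight_fn(est)                             # portfolio weights
        realized.append(w @ returns[start + window])   # next month's realized return
        weights.append(w)                              # then roll forward one month
    return np.array(realized), np.array(weights)

def equal_weight(est):
    """Naive equally weighted rule, used here only as a stand-in model."""
    return np.full(est.shape[1], 1.0 / est.shape[1])

rng = np.random.default_rng(4)
hist = rng.normal(1.0, 5.0, size=(240, 11))   # hypothetical 1975-1994 monthly returns (%)
r, w = rolling_backtest(hist, window=60, weight_fn=equal_weight)
print(len(r))   # 180 out-of-sample months
```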
This resulted in 180 sample periods. A common value of the risk-aversion parameter, $\lambda = 0.02$, was used. Computational time prevented expanding the procedure to a range of $\lambda$ values in this study. This is planned for future research.
Bayesian Data Models
We will examine two hierarchical data models and apply MCMC sampling to obtain estimates of the posterior distributions of $(\mu, \Sigma)$ and $\tilde{y}$. These estimates will then be used to solve models (1) and (7) for MV and DU portfolios, respectively.
All models, even non-Bayesian ones, can be specified within a hierarchical Bayesian structure. In addition, the MCMC sampler can be used to solve even the simplest model. Even where a closed-form analytical solution exists, it may be easier to run the MCMC sampler to generate the posterior distributions. The following three data models will be tested empirically:
Classical Model
(8)
The hyperparameter values are set equal to 0.0001, giving a proper but diffuse hyperprior. The MCMC sampler with diffuse priors yields close approximations of the classical estimators.
James-Stein Model
Hierarchical Bayes Model
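(9)
The degrees-of-freedom parameter, $\nu_0$, is unrestricted apart from a lower bound that depends on $m$, the number of assets ($m = 11$ in this application). The remaining hyperparameter values are equal to 0.0001, allowing for a proper, diffuse hyperprior, and one further hyperparameter is set equal to $0.10n$, where $n$ is the sample size. In addition, $R_0$ is a known correlation matrix with intraclass structure:
$$R_0 = (1 - \rho_0)\, I_m + \rho_0\, e e' = \begin{pmatrix} 1 & \rho_0 & \cdots & \rho_0 \\ \rho_0 & 1 & \cdots & \rho_0 \\ \vdots & \vdots & \ddots & \vdots \\ \rho_0 & \rho_0 & \cdots & 1 \end{pmatrix}. \qquad (10)$$
In this application, the estimate of the correlation parameter $\rho_0$ is 0.5.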
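The known correlation matrix in the Hierarchical Bayes prior has ones on the diagonal and a common correlation $\rho_0$ off the diagonal (intraclass structure). Its eigenvalues are $1 - \rho_0$, with multiplicity $m - 1$, and $1 + (m-1)\rho_0$. The matrix is therefore positive definite exactly when $-1/(m-1) < \rho_0 < 1$. The short Python sketch below builds the matrix for the values used here ($m = 11$, $\rho_0 = 0.5$) and confirms the eigenvalues numerically. The function name is hypothetical.

```python
import numpy as np

def intraclass_corr(m, rho):
    """(1 - rho) I + rho e e': ones on the diagonal, rho everywhere else."""
    return (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))

r0 = intraclass_corr(11, 0.5)
eig = np.linalg.eigvalsh(r0)
# Expected eigenvalues: 1 - rho (10 times) and 1 + (m - 1) rho = 6.
print(eig.min(), eig.max())   # approximately 0.5 and 6.0
```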
Results
We will compare the models by portfolio performance. We use the posterior means of $\mu_t$ and $\Sigma_t$ as inputs to the MV framework (1), or the posterior marginal predictive distribution as input to the direct utility framework (7). In either framework, we obtain a vector of weights $w_t$. The actual performance of the portfolio, its realized return in out-of-sample month $t$, is then tracked over $t = 1, \ldots, 180$. Consequently, we obtain a simulation of how each model would actually have performed over the study period. This performance is compared across the models: Classical, James-Stein, and Hierarchical Bayes. In addition to these three models, we include as a benchmark a heuristic portfolio-selection device that weights each asset equally, denoted by the term Weighted in the ensuing figure.
Table 2 displays the portfolio performance comparisons summarized over the stream of 180 monthly returns. The Classical Model clearly underperforms all the other models, including the naive equal-weight portfolio. The James-Stein Model portfolio produces superior results to the Classical Model portfolio, and the Hierarchical Bayes Model adds a further, substantial edge in performance. In addition, the Direct Utility (DU) method increases performance significantly over methods that impute estimates into the MV model (1).
Figure 1 shows the equity performance of the various data models and their turnover rates (discussed below), as the growth of a hypothetical $1000 portfolio. The top series display the MV portfolios, while the middle series display the DU factors. The actual performance of a DU portfolio is obtained by multiplying the performance of the corresponding MV portfolio by its DU factor. Note that the Hierarchical Bayes factor remains above 1 after the mid-1980s.
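The equity curves and DU factors can be sketched as follows, assuming monthly returns in percent. The DU factor is taken here as the ratio of the DU portfolio's value to the MV portfolio's value. This is consistent with recovering DU performance as the DU factor times MV performance. However, the text does not state the exact definition behind Figure 1, and the return series below are hypothetical.

```python
import numpy as np

def growth_of_1000(monthly_returns_pct):
    """Value path of a hypothetical $1000 investment, with monthly returns in percent."""
    return 1000.0 * np.cumprod(1.0 + np.asarray(monthly_returns_pct) / 100.0)

mv = growth_of_1000([2.0, -1.0, 3.0])    # hypothetical MV portfolio returns (%)
du = growth_of_1000([2.5, -0.5, 2.0])    # hypothetical DU portfolio returns (%)
du_factor = du / mv                      # above 1 while DU is ahead of MV to date
print(mv[-1], du_factor[-1])
```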
An interesting question is why the James-Stein and Hierarchical Bayes Models underperform the Classical Model for the first 4 years and then significantly outperform it over the next 10 years (1985 onward). Closer inspection confirmed what we suspected from general knowledge of the history of these particular markets: some markets, such as the US, Japan, and Germany, exhibited stable positive trends in the early 1980s. Based on the data of the late 1970s, the Classical Model invests in these markets. The shrinkage characteristics of the Bayes models, however, hurt their performance during this period.
Portfolio turnover, and the commission costs that result from it, are obviously of great interest in practical applications. We define turnover as
$$T_t = \sum_{i=1}^{m} \left| w_{i,t} - w_{i,t-1} \right|, \qquad (11)$$
that is, the portfolio turnover in a given month is the sum of the absolute changes in portfolio weights from the previous month to that month. We will analyze the portfolio turnover for each of our models.
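Turnover, taken as the sum over assets of the absolute month-to-month weight changes, is a one-line computation on a history of weight vectors. The sketch below assumes the weights are stored as a months-by-assets array (a hypothetical layout). It returns one turnover value for each month after the first. Absolute values are essential: when every portfolio satisfies the budget constraint, the signed changes always sum to zero.

```python
import numpy as np

def turnover(weights):
    """Monthly turnover sum_i |w[i,t] - w[i,t-1]| for a T x m weight history.
    Returns T - 1 values, since the first month has no predecessor."""
    w = np.asarray(weights)
    return np.abs(np.diff(w, axis=0)).sum(axis=1)

history = np.array([[0.5, 0.3, 0.2],
                    [0.4, 0.4, 0.2],
                    [0.4, 0.4, 0.2]])   # hypothetical weights over three months
print(turnover(history))               # approximately [0.2, 0.0]
```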
Figure 1 (bottom series) compares the portfolio turnover rates of the three data models using the DU method. The Classical Model clearly has the highest turnover most of the time, and the James-Stein portfolio improves on it. The Hierarchical Bayes model results in significantly lower turnover rates. Note that the Classical portfolio showed significant gains in the early 1980s, but at the cost of high turnover rates.
Summary and Conclusion
A contribution of this paper was to employ practical hierarchical Bayesian models that incorporate a high degree of parameter uncertainty. A practical hierarchical Bayesian model accounting for intraclass covariance was applied to portfolio selection. The MCMC sampler was used to generate posterior predictive distributions and estimates of moments. The James-Stein model, which has appeared previously in the finance literature, is basically the Markowitz model with a shrinkage estimator of the mean. Its covariance matrix estimate is taken (independently of the mean) to be the sample covariance matrix. The Hierarchical Bayes model is more general in both $\mu$ and $\Sigma$.
We carried out a numerical optimization procedure to maximize expected utility using the MCMC samples from the posterior predictive distribution. This model resulted in an extra 1.5 percentage points per year of portfolio performance. The gain is on top of using the Hierarchical Bayes estimates of $\mu$ and $\Sigma$ in the Markowitz model, and it is quite a significant empirical result. This approach applies to a large class of utility functions and models for market returns.