
Tuesday, January 15, 2019

Computational Efficiency of Polar

Lecture notes on Monte Carlo Methods, Fall Semester 2005
Courant Institute of Mathematical Sciences, NYU
Jonathan Goodman

Chapter 2: Simple Sampling of Gaussians. Created August 26, 2005.

Generating univariate or multivariate Gaussian random variables is simple and fast. There should be no reason ever to use approximate methods based, for example, on the Central Limit Theorem.

1 Box Muller

It would be nice to get a standard normal from a standard uniform by inverting the distribution function, but there is no closed form formula for the distribution function
$$ N(x) = P(X < x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x'^2/2}\, dx' . $$
The Box Muller method is a brilliant trick to overcome this by producing two independent standard normals from two independent uniforms. It is based on the familiar trick for calculating
$$ I = \int_{-\infty}^{\infty} e^{-x^2/2}\, dx . $$
This cannot be calculated by direct integration: the indefinite integral does not have an algebraic expression in terms of elementary functions (exponentials, logs, trig functions). However,
$$ I^2 = \int_{-\infty}^{\infty} e^{-x^2/2}\, dx \int_{-\infty}^{\infty} e^{-y^2/2}\, dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\, dx\, dy . $$
The last integral can be calculated using polar coordinates $x = r\cos(\theta)$, $y = r\sin(\theta)$ with area element $dx\,dy = r\,dr\,d\theta$, so that
$$ I^2 = \int_{\theta=0}^{2\pi} \int_{r=0}^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = 2\pi \int_{r=0}^{\infty} e^{-r^2/2}\, r\,dr . $$
Unlike the original $x$ integral, this $r$ integral is elementary. The substitution $s = r^2/2$ gives $ds = r\,dr$ and
$$ I^2 = 2\pi \int_{s=0}^{\infty} e^{-s}\, ds = 2\pi . $$

The Box Muller algorithm is a probabilistic interpretation of this trick. If $(X, Y)$ is a pair of independent standard normals, then the probability density is a product:
$$ f(x,y) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \cdot \frac{1}{\sqrt{2\pi}} e^{-y^2/2} = \frac{1}{2\pi} e^{-(x^2+y^2)/2} . $$
Since this density is radially symmetric, it is natural to consider the polar coordinate random variables $(R, \Theta)$ defined by $0 \le \Theta < 2\pi$ and $X = R\cos(\Theta)$, $Y = R\sin(\Theta)$. Clearly $\Theta$ is uniformly distributed in the interval $[0, 2\pi)$ and may be sampled using $\Theta = 2\pi U_1$. Unlike the original distribution function $N(x)$, there is a simple expression for the $R$ distribution function:
$$ G(r) = P(R \le r) = \int_{\theta'=0}^{2\pi} \int_{r'=0}^{r} \frac{1}{2\pi}\, e^{-r'^2/2}\, r'\,dr'\,d\theta' = \int_{r'=0}^{r} e^{-r'^2/2}\, r'\,dr' . $$
The same change of variable $r'^2/2 = s'$, $r'\,dr' = ds'$ allows us to calculate
$$ G(r) = \int_{s'=0}^{r^2/2} e^{-s'}\, ds' = 1 - e^{-r^2/2} . $$
Therefore, we may sample $R$ by solving the distribution function equation (recall that $1 - U_2$ is a standard uniform if $U_2$ is)
$$ G(R) = 1 - e^{-R^2/2} = 1 - U_2 , $$
whose solution is $R = \sqrt{-2\ln(U_2)}$. Altogether, the Box Muller method takes independent standard uniform random variables $U_1$ and $U_2$ and produces independent standard normals $X$ and $Y$ using the formulas
$$ \Theta = 2\pi U_1 , \quad R = \sqrt{-2\ln(U_2)} , \quad X = R\cos(\Theta) , \quad Y = R\sin(\Theta) . \qquad (1) $$
It may seem odd that $X$ and $Y$ in (1) are independent given that they use the same $R$ and $\Theta$. Not only does our algebra show that this is true, but we can test the independence computationally, and it will be confirmed.
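To make (1) concrete, here is a minimal numpy sketch; the function name box_muller, the sample size, and the empirical correlation check are my own additions, not part of the notes.

```python
import numpy as np

def box_muller(n, rng=None):
    """Return two arrays of n independent standard normals via formulas (1)."""
    rng = np.random.default_rng() if rng is None else rng
    u1 = rng.random(n)                   # U_1 ~ Uniform(0,1)
    u2 = rng.random(n)                   # U_2 ~ Uniform(0,1)
    theta = 2.0 * np.pi * u1             # Theta = 2*pi*U_1
    r = np.sqrt(-2.0 * np.log(u2))       # R = sqrt(-2 ln U_2)
    return r * np.cos(theta), r * np.sin(theta)

x, y = box_muller(100_000)
# Computational test of the independence claim: correlation should be near 0.
print(np.corrcoef(x, y)[0, 1])
```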
Part of this method was generating a point at random on the unit circle. We suggested doing this by choosing $\theta$ uniformly in the interval $[0, 2\pi)$ and taking the point on the circle to be $(\cos(\theta), \sin(\theta))$. This has the possible drawback that the computer must evaluate the sine and cosine functions. Another way to do this (suggested, for example, in the dubious book Numerical Recipes) is to choose a point uniformly in the $2 \times 2$ square $-1 \le x \le 1$, $-1 \le y \le 1$, then reject it if it falls outside the unit circle. The first accepted point will be uniformly distributed in the unit disk $x^2 + y^2 \le 1$, so its angle will be random and uniformly distributed. The final step is to get a point on the unit circle $x^2 + y^2 = 1$ by dividing by the length.

The methods have identical accuracy (both are exact in exact arithmetic). What distinguishes them is computer performance (a topic discussed more in a later lecture, hopefully). The rejection method, with an acceptance probability $\pi/4 \approx 78\%$, seems efficient, but rejection can break the instruction pipeline and slow a computation by a factor of ten. Also, the square root needed to compute the length may not be faster to evaluate than sine and cosine. Moreover, the rejection method uses two uniforms while the $\Theta = 2\pi U_1$ method uses only one.

The method can be reversed to solve another sampling problem: generating a random point on the unit sphere in $\mathbb{R}^n$. If we generate $n$ independent standard normals, then the vector $X = (X_1, \ldots, X_n)$ has all angles equally likely, because the probability density
$$ f(x) = \frac{1}{(2\pi)^{n/2}} \exp\!\left(-(x_1^2 + \cdots + x_n^2)/2\right) $$
is radially symmetric. Therefore $X/\|X\|$ is uniformly distributed on the unit sphere, as desired.

1.1 Other methods for univariate normals

The Box Muller method is elegant and reasonably fast and is fine for casual computations, but it may not be the best method for hard core users. Many software packages have native standard normal random number generators, which (if they are any good) use expertly optimized methods. There is very fast and accurate software on the web for directly inverting the normal distribution function $N(x)$. This is particularly important for quasi Monte Carlo, which substitutes equidistributed sequences for random sequences (see a later lecture).

2 Multivariate normals

An $n$ component multivariate normal, $X$, is characterized by its mean $\mu = E[X]$ and its covariance matrix $C = E[(X - \mu)(X - \mu)^t]$. We discuss the problem of generating such an $X$ with mean zero, since we achieve mean $\mu$ by adding $\mu$ to a mean zero multivariate normal. The key to generating such an $X$ is the fact that if $Y$ is an $m$ component mean zero multivariate normal with covariance $D$ and $X = AY$, then $X$ is a mean zero multivariate normal with covariance
$$ C = E[XX^t] = E[AY(AY)^t] = A\,E[YY^t]\,A^t = ADA^t . $$
We know how to sample the $n$ component multivariate normal with $D = I$: just take the components of $Y$ to be independent univariate standard normals. The formula $X = AY$ will produce the desired covariance matrix if we find $A$ with $AA^t = C$. A simple way to do this in practice is to use the Choleski decomposition from numerical linear algebra. This is a simple algorithm that produces a lower triangular matrix, $L$, so that $LL^t = C$. It works for any positive definite $C$.

In physical applications it is common that one has not $C$ but its inverse, $H$. This would happen, for example, if $X$ had the Gibbs-Boltzmann distribution with $kT = 1$ (it is easy to change this) and energy $\frac12 X^t H X$, and probability density $\frac{1}{Z} \exp(-\frac12 X^t H X)$. In large scale physical problems it may be impractical to calculate and store the covariance matrix $C = H^{-1}$ even though the Choleski factorization $H = LL^t$ is available. Note that $H^{-1} = L^{-t}L^{-1}$ (it is traditional to write $L^{-t}$ for the transpose of $L^{-1}$, which also is the inverse of $L^t$), so the choice $A = L^{-t}$ works. Computing $X = L^{-t}Y$ is the same as solving for $X$ in the equation $Y = L^t X$, which is the process of back substitution in numerical linear algebra.
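Both routes fit in a few lines of numpy/scipy; this is a sketch only, and the function names are my own (np.linalg.cholesky and scipy.linalg.solve_triangular are standard library calls).

```python
import numpy as np
from scipy.linalg import solve_triangular

def sample_from_covariance(C, rng=None):
    """Sample a mean zero multivariate normal with covariance C via C = L L^t."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(C)            # lower triangular Choleski factor
    Y = rng.standard_normal(C.shape[0])  # independent standard normals
    return L @ Y                         # X = L Y has covariance L I L^t = C

def sample_from_energy(H, rng=None):
    """Sample when the energy matrix H = C^{-1} is given, via H = L L^t."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(H)
    Y = rng.standard_normal(H.shape[0])
    # X = L^{-t} Y: solve L^t X = Y by back substitution,
    # so cov(X) = L^{-t} L^{-1} = (L L^t)^{-1} = H^{-1} = C.
    return solve_triangular(L.T, Y, lower=False)
```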
In some applications one knows the eigenvectors of $C$ (which also are the eigenvectors of $H$), and the corresponding eigenvalues. These (either the eigenvectors, or the eigenvectors and eigenvalues) sometimes are called principal components. Let $q_j$ be the eigenvectors, normalized to be orthonormal, and $\sigma_j^2$ the corresponding eigenvalues of $C$, so that
$$ Cq_j = \sigma_j^2 q_j , \qquad q_j^t q_k = \delta_{jk} . $$
Denote the $q_j$ component of $X$ by $Z_j = q_j^t X$. This is a linear function of $X$ and therefore Gaussian with mean zero. Its variance (note $Z_j = Z_j^t = X^t q_j$) is
$$ E[Z_j^2] = E[Z_j Z_j^t] = q_j^t\, E[XX^t]\, q_j = q_j^t C q_j = \sigma_j^2 . $$
A similar calculation shows that $Z_j$ and $Z_k$ are uncorrelated and hence (as components of a multivariate normal) independent. Therefore, we can generate the $Y_j$ as independent standard normals and sample the $Z_j$ using
$$ Z_j = \sigma_j Y_j . \qquad (2) $$
After that, we can get an $X$ using
$$ X = \sum_{j=1}^{n} Z_j q_j . \qquad (3) $$
We restate this in matrix terms. Let $Q$ be the orthogonal matrix whose columns are the orthonormal eigenvectors of $C$, and let $\Sigma^2$ be the diagonal matrix with $\sigma_j^2$ in the $(j,j)$ diagonal position. The eigenvalue/eigenvector relations are
$$ CQ = Q\Sigma^2 , \qquad Q^t Q = I = QQ^t . \qquad (4) $$
The multivariate normal vector $Z = Q^t X$ then has covariance matrix $E[ZZ^t] = E[Q^t X X^t Q] = Q^t C Q = \Sigma^2$. This says that the $Z_j$, the components of $Z$, are independent univariate normals with variances $\sigma_j^2$. Therefore, we may sample $Z$ by choosing its components by (2) and then reconstruct $X$ by $X = QZ$, which is the same as (3). Alternatively, we can calculate, using (4), that
$$ C = Q\Sigma^2 Q^t = Q\Sigma\Sigma Q^t = (Q\Sigma)(Q\Sigma)^t . $$
Therefore $A = Q\Sigma$ satisfies $AA^t = C$, and $X = AY = Q\Sigma Y = QZ$ has covariance $C$ if the components of $Y$ are independent standard univariate normals, or the components of $Z$ are independent univariate normals with variance $\sigma_j^2$.

3 Brownian motion examples

We illustrate these ideas for various kinds of Brownian motion. Let $X(t)$ be a Brownian motion path. Choose a final time $T$ and a time step $\Delta t = T/n$. The observation times will be $t_j = j\Delta t$ and the observations (or observation values) will be $X_j = X(t_j)$. These observations may be assembled into a vector $X = (X_1, \ldots, X_n)^t$. We seek to generate sample observation vectors (or observation paths). How we do this depends on the boundary conditions.

The simplest case is standard Brownian motion. Specifying $X(0) = 0$ is a Dirichlet boundary condition at $t = 0$. Saying nothing about $X(T)$ is a free (or Neumann) condition at $t = T$. The joint probability density for the observation vector, $f(x) = f(x_1, \ldots, x_n)$, is found by multiplying the conditional densities. Given $X_k = X(t_k)$, the next observation $X_{k+1} = X(t_k + \Delta t)$ is Gaussian with mean $X_k$ and variance $\Delta t$, so its conditional density is
$$ \frac{1}{\sqrt{2\pi\Delta t}}\, e^{-(x_{k+1} - X_k)^2 / 2\Delta t} . $$
Multiply these together, use $X(0) = 0$ (with the convention $x_0 = 0$), and you find
$$ f(x_1, \ldots, x_n) = \left(\frac{1}{2\pi\Delta t}\right)^{n/2} \exp\!\left( -\frac{1}{2\Delta t} \sum_{k=0}^{n-1} (x_{k+1} - x_k)^2 \right) . \qquad (5) $$

3.1 The random walk method

The simplest and perhaps best way to generate a sample observation path, $X$, comes from the derivation of (5). First generate $X_1 = X(\Delta t)$ as a univariate normal with mean zero and variance $\Delta t$, i.e. $X_1 = \sqrt{\Delta t}\, Y_1$. Given $X_1$, $X_2$ is a univariate normal with mean $X_1$ and variance $\Delta t$, so we may take $X_2 = X_1 + \sqrt{\Delta t}\, Y_2$, and so on. This is the random walk method.
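A minimal numpy sketch of the random walk method (the function name and interface are my own choices):

```python
import numpy as np

def random_walk_path(T, n, rng=None):
    """Sample (X_1, ..., X_n), standard Brownian motion at t_j = j*T/n."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    increments = np.sqrt(dt) * rng.standard_normal(n)  # sqrt(dt) * Y_k
    return np.cumsum(increments)  # X_k = sqrt(dt) * (Y_1 + ... + Y_k)
```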
If you just want to make standard Brownian motion paths, stop here. We push on for pedagogical purposes and to develop strategies that apply to other types of Brownian motion. We describe the random walk method in terms of the matrices above, starting by identifying the matrices $C$ and $H$. Examining (5) leads to
$$ H = \frac{1}{\Delta t} \begin{pmatrix} 2 & -1 & 0 & \cdots & & 0 \\ -1 & 2 & -1 & 0 & & \vdots \\ 0 & \ddots & \ddots & \ddots & & \\ \vdots & & -1 & 2 & -1 & 0 \\ & & & -1 & 2 & -1 \\ 0 & \cdots & & 0 & -1 & 1 \end{pmatrix} . $$
This is a tridiagonal matrix with pattern $-1, 2, -1$, except at the bottom right corner. One can calculate the covariances $C_{jk}$ from the random walk representation
$$ X_k = \sqrt{\Delta t}\,(Y_1 + \cdots + Y_k) . $$
Since the $Y_j$ are independent, we have $C_{kk} = \mathrm{var}(X_k) = \Delta t \cdot k \cdot \mathrm{var}(Y_j) = t_k$, and, supposing $j < k$,
$$ C_{jk} = E[X_j X_k] = \Delta t\, E\big[\big((Y_1 + \cdots + Y_j) + (Y_{j+1} + \cdots + Y_k)\big)(Y_1 + \cdots + Y_j)\big] = \Delta t\, E\big[(Y_1 + \cdots + Y_j)^2\big] = t_j . $$
These combine into the familiar formula
$$ C_{jk} = \mathrm{cov}(X(t_j), X(t_k)) = \min(t_j, t_k) . $$
This is the same as saying that the matrix $C$ is
$$ C = \Delta t \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{pmatrix} . \qquad (6) $$
The random walk method for generating $X$ may be expressed as $X = AY$ with
$$ A = \sqrt{\Delta t} \begin{pmatrix} 1 & 0 & \cdots & & 0 \\ 1 & 1 & 0 & & \vdots \\ 1 & 1 & 1 & 0 & \\ \vdots & & & \ddots & \\ 1 & 1 & \cdots & 1 & 1 \end{pmatrix} . \qquad (7) $$
The reader should do the matrix multiplication to check that indeed $C = AA^t$ for (6) and (7). Notice that $H$ is a sparse matrix, indicating short range interactions, while $C$ is full, indicating long range correlations. This is true in a great number of physical applications, though it is rare to have an explicit formula for $C$.

We also can calculate the Choleski factorization of $H$. The reader can convince herself or himself that the Choleski factor, $L$, is bidiagonal, with nonzeros only on or immediately below the diagonal. However, the formulas are simpler if we reverse the order of the coordinates. Therefore we define the coordinate reversed observation vector $\tilde X = (X_n, X_{n-1}, \ldots, X_1)^t$, whose covariance matrix is
$$ \tilde C = \begin{pmatrix} t_n & t_{n-1} & \cdots & t_1 \\ t_{n-1} & t_{n-1} & \cdots & t_1 \\ \vdots & \vdots & \ddots & \vdots \\ t_1 & t_1 & \cdots & t_1 \end{pmatrix} , $$
and whose energy matrix is
$$ \tilde H = \frac{1}{\Delta t} \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & & \vdots \\ 0 & \ddots & \ddots & \ddots & 0 \\ \vdots & & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{pmatrix} . $$
We seek the Choleski factorization $\tilde H = LL^t$ with bidiagonal
$$ L = \frac{1}{\sqrt{\Delta t}} \begin{pmatrix} l_1 & 0 & & \cdots \\ m_2 & l_2 & 0 & \\ 0 & m_3 & l_3 & \ddots \\ \vdots & & \ddots & \ddots \end{pmatrix} . $$
Multiplying out $\tilde H = LL^t$ leads to equations that successively determine the $l_k$ and $m_k$:
$$ l_1^2 = 1 \implies l_1 = 1 , \qquad l_1 m_2 = -1 \implies m_2 = -1 , $$
$$ m_2^2 + l_2^2 = 2 \implies l_2 = 1 , \qquad l_2 m_3 = -1 \implies m_3 = -1 , $$
and so on. The result is $\tilde H = LL^t$ with $L$ simply
$$ L = \frac{1}{\sqrt{\Delta t}} \begin{pmatrix} 1 & 0 & & \cdots \\ -1 & 1 & 0 & \\ & -1 & 1 & \ddots \\ & & \ddots & \ddots \end{pmatrix} . $$
The sampling algorithm using this information is to find $\tilde X$ from $Y$ by solving $Y = L^t \tilde X$:
$$ \begin{pmatrix} Y_n \\ Y_{n-1} \\ \vdots \\ Y_1 \end{pmatrix} = \frac{1}{\sqrt{\Delta t}} \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & & \vdots \\ \vdots & & \ddots & \ddots & -1 \\ 0 & \cdots & & 0 & 1 \end{pmatrix} \begin{pmatrix} X_n \\ X_{n-1} \\ \vdots \\ X_1 \end{pmatrix} . $$
Solving from the bottom up (back substitution), we have
$$ Y_1 = \frac{1}{\sqrt{\Delta t}}\, X_1 \implies X_1 = \sqrt{\Delta t}\, Y_1 , \qquad Y_2 = \frac{1}{\sqrt{\Delta t}}\,(X_2 - X_1) \implies X_2 = X_1 + \sqrt{\Delta t}\, Y_2 , \quad \text{etc.} $$
This whole process turns out to give the same random walk sampling method. Had we not gone to the time reversed ($\tilde X$, etc.) variables, we could have calculated the bidiagonal Choleski factor $L$ numerically.
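The numerical version takes only a few lines; here is a sketch for a general tridiagonal energy matrix, assuming numpy. The function name and the (diag, off) interface are mine; the recursion for $l_k$ and $m_k$ is the one displayed above.

```python
import numpy as np

def sample_tridiagonal_energy(diag, off, rng=None):
    """Sample a mean zero normal whose energy matrix H = C^{-1} is tridiagonal.

    diag: the n diagonal entries of H; off: the n-1 subdiagonal entries.
    Computes the bidiagonal Choleski factor H = L L^t, then solves
    L^t X = Y by back substitution, so X = L^{-t} Y has covariance H^{-1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(diag)
    l = np.empty(n)  # diagonal of L
    m = np.empty(n)  # subdiagonal of L; m[k] sits in row k, column k-1
    l[0] = np.sqrt(diag[0])
    for k in range(1, n):
        m[k] = off[k - 1] / l[k - 1]         # (L L^t)[k, k-1] = m_k l_{k-1}
        l[k] = np.sqrt(diag[k] - m[k] ** 2)  # (L L^t)[k, k]   = m_k^2 + l_k^2
    Y = rng.standard_normal(n)
    X = np.empty(n)
    X[-1] = Y[-1] / l[-1]
    for k in range(n - 2, -1, -1):           # back substitution, bottom up
        X[k] = (Y[k] - m[k + 1] * X[k + 1]) / l[k]
    return X
```

Applied to the reversed Brownian motion matrix $\tilde H$ above (diagonal $(1, 2, \ldots, 2)/\Delta t$, subdiagonal $-1/\Delta t$), this reproduces the random walk method in reversed coordinates.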
This works for any problem with a tridiagonal energy matrix $H$, and it has a name in the control theory/estimation literature that escapes me. In particular, it allows us to find sample Brownian motion paths with other boundary conditions.

3.2 The Brownian bridge construction

The Brownian bridge construction is useful in the mathematical theory of Brownian motion. It also is the basis for the success of quasi Monte Carlo methods in finance. Suppose $n$ is a power of 2: $n = 2^L$. We will construct the observation path $X$ through a sequence of $L$ refinements. First, observe that $X_n$ is a univariate normal with mean zero and variance $T$, so we may take (with the $Y_{k,l}$ being independent standard normals)
$$ X_n = \sqrt{T}\, Y_{1,1} . $$
Given the value of $X_n$, the midpoint observation, $X_{n/2}$, is a univariate normal with mean $\frac12 X_n$ and variance $T/4$ (we assign this and related claims below as exercises for the student), so we may take
$$ X_{n/2} = \frac12 X_n + \frac{\sqrt{T}}{2}\, Y_{2,1} . $$
At the first level, we chose the endpoint value for $X$. We could draw a first level path by connecting $X_n$ to zero with a straight line. At the second level, or first refinement, we created a midpoint value. The second level path could be piecewise linear, connecting $0$ to $X_{n/2}$ to $X_n$.

The second refinement level creates values for the quarter points. Given $X_{n/2}$, $X_{n/4}$ is a normal with mean $\frac12 X_{n/2}$ and variance $\frac14 \cdot \frac{T}{2}$. Similarly, $X_{3n/4}$ is a normal with mean $\frac12 (X_{n/2} + X_n)$ and variance $\frac14 \cdot \frac{T}{2}$. Therefore, we may take
$$ X_{n/4} = \frac12 X_{n/2} + \frac12 \sqrt{\frac{T}{2}}\, Y_{3,1} \qquad \text{and} \qquad X_{3n/4} = \frac12 \left( X_{n/2} + X_n \right) + \frac12 \sqrt{\frac{T}{2}}\, Y_{3,2} . $$
The level three path would be piecewise linear with breakpoints at $\frac{T}{4}$, $\frac{T}{2}$, and $\frac{3T}{4}$. Note that in each case we add a mean zero normal of the appropriate variance to the linear interpolation value.

In the general step, we go from the level $k-1$ path to the level $k$ path by creating values for the midpoints of the level $k-1$ intervals. The level $k$ observations are $X_{jn/2^{k-1}}$. The values with even $j$ are known from the previous level, so we need values for odd $j$. That is, we want to interpolate between the $j = 2m$ value and the $j = 2m+2$ value and add a mean zero normal of the appropriate variance:
$$ X_{(2m+1)n/2^{k-1}} = \frac12 \left( X_{2mn/2^{k-1}} + X_{(2m+2)n/2^{k-1}} \right) + \frac{1}{2^{(k-2)/2}}\, \frac{\sqrt{T}}{2}\, Y_{k,m+1} . $$
The reader should check that the vector of standard normals $Y = (Y_{1,1}, Y_{2,1}, Y_{3,1}, Y_{3,2}, \ldots)^t$ indeed has $n = 2^L$ components. The value of this method for quasi Monte Carlo comes from the fact that the most important values, the ones that determine the large scale structure of $X$, are the first components of $Y$. As we will see, the components of the $Y$ vectors of quasi Monte Carlo have uneven quality, with the first components being the best.
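As an illustration, a compact numpy sketch of the construction; the function name, the choice to store $X(0) = 0$ in X[0], and the loop structure are mine. Each refinement sweep at level $k$ adds midpoint noise of variance $T/2^k$, as derived above.

```python
import numpy as np

def brownian_bridge_path(T, levels, rng=None):
    """Sample X_1, ..., X_n (n = 2**levels) by Brownian bridge refinement."""
    rng = np.random.default_rng() if rng is None else rng
    n = 2 ** levels
    X = np.zeros(n + 1)  # X[j] = X(j*T/n); X[0] = X(0) = 0 is fixed
    X[n] = np.sqrt(T) * rng.standard_normal()  # level 1: X_n = sqrt(T)*Y_{1,1}
    stride = n
    for k in range(2, levels + 2):             # the refinement sweeps
        half = stride // 2
        std = np.sqrt(T / 2 ** k)              # midpoint std dev at level k
        for left in range(0, n, stride):
            # linear interpolation of the endpoints, plus mean zero noise
            X[left + half] = (0.5 * (X[left] + X[left + stride])
                              + std * rng.standard_normal())
        stride = half
    return X[1:]  # drop the fixed X(0) = 0
```

In total the loop consumes $1 + 1 + 2 + \cdots + 2^{L-1} = 2^L = n$ standard normals, matching the count of the $Y_{k,m}$.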
3.3 Principal components

The principal component eigenvalues and eigenvectors for many types of Brownian motion are known in closed form. In many of these cases, the Fast Fourier Transform (FFT) algorithm leads to a reasonably fast sampling method. These FFT based methods are slower than random walk or Brownian bridge sampling for standard random walk, but they sometimes are the most efficient for fractional Brownian motion. They may be better than Brownian bridge sampling with quasi Monte Carlo (I'm not sure about this).

The eigenvectors of $H$ are known (see, e.g., Numerical Analysis by Eugene Isaacson and Herbert Keller) to have components ($q_{j,k}$ is the $k$th component of eigenvector $q_j$)
$$ q_{j,k} = \mathrm{const} \cdot \sin(\omega_j t_k) . \qquad (8) $$
The $n$ eigenvectors and eigenvalues then are determined by the allowed values of $\omega_j$, which, in turn, are determined through the boundary conditions. We can find $\lambda_j^2$ in terms of $\omega_j$ using the eigenvalue equation $Hq_j = \lambda_j^2 q_j$ evaluated at any of the interior components $1 < k < n$:
$$ \frac{1}{\Delta t} \big( -\sin(\omega_j(t_k - \Delta t)) + 2\sin(\omega_j t_k) - \sin(\omega_j(t_k + \Delta t)) \big) = \lambda_j^2 \sin(\omega_j t_k) . $$
Doing the math shows that the eigenvalue equation is satisfied and that
$$ \lambda_j^2 = \frac{2}{\Delta t} \big( 1 - \cos(\omega_j \Delta t) \big) . \qquad (9) $$
The eigenvalue equation also is satisfied at $k = 1$ because the form (8) automatically satisfies the boundary condition $q_{j,0} = 0$. This is why we used the sine and not the cosine. Only special values of $\omega_j$ give $q_{j,k}$ that satisfy the eigenvalue equation at the right boundary point $k = n$.
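When the closed form eigenvectors are not at hand, the principal component recipe (2)-(3) can still be carried out with a numerical eigendecomposition in place of (8); a sketch using numpy, illustrated on the Brownian motion covariance (6) (the function name and the small test case are mine):

```python
import numpy as np

def sample_principal_components(C, rng=None):
    """Sample a mean zero normal with covariance C via (2)-(3): X = Q Sigma Y."""
    rng = np.random.default_rng() if rng is None else rng
    sigma2, Q = np.linalg.eigh(C)      # columns of Q are the orthonormal q_j
    Z = np.sqrt(sigma2) * rng.standard_normal(len(sigma2))  # Z_j = sigma_j Y_j
    return Q @ Z                       # X = Q Z, equation (3)

# Brownian motion example: C_{jk} = min(t_j, t_k), as in (6).
T, n = 1.0, 8
t = np.arange(1, n + 1) * (T / n)
C = np.minimum.outer(t, t)
X = sample_principal_components(C)
```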
