ELE604/ELE704 Optimization - Hacettepe University, Department of Electrical and Electronic Engineering

Transcript

Dr. Umut Sezen & Dr. Cenk Toker
Department of Electrical and Electronic Engineering, Hacettepe University
http://www.ee.hacettepe.edu.tr/~usezen/ele604/
Contents

Duality
  Lagrange Dual Function
  The Lagrange Dual Problem
  Weak and Strong Duality (Weak Duality, Strong Duality, Slater's Condition)
  Saddle-point Interpretation
Optimality Conditions
  Certificate of Suboptimality and Stopping Criterion
  Complementary Slackness
  KKT Optimality Conditions
  Solving The Primal Problem via The Dual
  Perturbation and Sensitivity Analysis
Constrained Optimization Algorithms
  Introduction
  Primal Methods (Feasible Direction Methods, Active Set Methods, Gradient Projection Method)
  Equality Constrained Optimization (Quadratic Minimization, Eliminating Equality Constraints, Newton's Method with Equality Constraints, Newton's Method with Equality Constraint Elimination)
  Penalty and Barrier Methods (Penalty Methods, Barrier Methods, Properties of the Penalty & Barrier Methods)
  Interior-Point Methods (Logarithmic Barrier, Central Path, Dual Points from Central Path, KKT Interpretation, Newton Step for Modified KKT Equations, The Interior-Point Algorithm, How to Start from a Feasible Point?)

Duality

- Consider the standard minimization problem (referred to below as the primal problem)

    min f(x)
    s.t. g(x) ≤ 0
         h(x) = 0

  where g(x) = [g1(x) g2(x) ··· gL(x)]^T and h(x) = [h1(x) h2(x) ··· hM(x)]^T represent the L inequality and M equality constraints, respectively.

- Define the Lagrangian L : R^N × R^L × R^M → R as

    L(x, λ, ν) = f(x) + λ^T g(x) + ν^T h(x)
               = f(x) + Σ_{i=1}^{L} λi gi(x) + Σ_{j=1}^{M} νj hj(x)

  where λi ≥ 0 and νj are called the Lagrange multipliers, and λ = [λ1 λ2 ··· λL]^T and ν = [ν1 ν2 ··· νM]^T are called the Lagrange multiplier vectors.

- The domain of the optimization problem, D, is defined by

    D = dom f(x) ∩ (∩_{i=1}^{L} dom gi(x)) ∩ (∩_{j=1}^{M} dom hj(x))

- Any point x̃ ∈ D satisfying the constraints is called a feasible point, i.e., g(x̃) ≤ 0 and h(x̃) = 0.

- On the feasible set F, where F = {x̃ | x̃ ∈ D ∧ g(x̃) ≤ 0 ∧ h(x̃) = 0}, the Lagrangian has a value less than or equal to the cost function, i.e.,

    L(x̃, λ, ν) ≤ f(x̃),  ∀x̃ ∈ F, ∀λi ≥ 0.

Lagrange Dual Function

- The Lagrange dual function ℓ(λ, ν) is defined as

    ℓ(λ, ν) = inf_{x∈D} L(x, λ, ν)
            = inf_{x∈D} ( f(x) + λ^T g(x) + ν^T h(x) )
            = inf_{x∈D} ( f(x) + Σ_{i=1}^{L} λi gi(x) + Σ_{j=1}^{M} νj hj(x) )

- The dual function ℓ(λ, ν) is the pointwise infimum of a set of affine functions of λ and ν, hence it is concave even if f(x), gi(x) and hj(x) are not convex.

- Proposition: The dual function constitutes a lower bound on p* = f(x*), i.e.,

    ℓ(λ, ν) ≤ p*,  ∀λ ≥ 0

- Proof: Let x̃ be a feasible point, that is x̃ ∈ F (i.e., gi(x̃) ≤ 0 and hj(x̃) = 0), and let λ ≥ 0; then

    λ^T g(x̃) + ν^T h(x̃) ≤ 0.

  Then, the Lagrangian satisfies

    L(x̃, λ, ν) = f(x̃) + λ^T g(x̃) + ν^T h(x̃) ≤ f(x̃)
  So,

    ℓ(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f(x̃),  ∀x̃ ∈ F.

- The pair (λ, ν) is called dual feasible when λ ≥ 0.

- Examples:

  Least Squares (LS) solution of linear equations:

    min x^T x
    s.t. Ax = b

  The Lagrangian is L(x, ν) = x^T x + ν^T (Ax − b), then the Lagrange dual function is

    ℓ(ν) = inf_x L(x, ν)

  Since L(x, ν) is quadratic in x, it is convex, so

    ∇L(x, ν) = 2x + A^T ν = 0  ⇒  x* = −(1/2) A^T ν

  Hence the Lagrange dual function is given by

    ℓ(ν) = L(−(1/2) A^T ν, ν) = −(1/4) ν^T A A^T ν − b^T ν

  which is obviously concave, and p* ≥ ℓ(ν) = −(1/4) ν^T A A^T ν − b^T ν.

  Linear Programming (LP):

    min c^T x
    s.t. Ax = b
         −x ≤ 0

  The Lagrangian is L(x, λ, ν) = c^T x − λ^T x + ν^T (Ax − b), then the Lagrange dual function is

    ℓ(λ, ν) = −b^T ν + inf_x (c + A^T ν − λ)^T x

  In order for ℓ(λ, ν) to be bounded, we must have c + A^T ν − λ = 0, i.e.,

    ℓ(λ, ν) = −b^T ν   if c + A^T ν − λ = 0
            = −∞       otherwise

  This is affine, i.e., concave, when c + A^T ν − λ = 0 with λ ≥ 0. So, p* ≥ −b^T ν when c + A^T ν − λ = 0 with λ ≥ 0.

  Two-way partitioning (a non-convex problem):

    min x^T W x
    s.t. xj² = 1,  j = 1, ..., N

  This is a discrete problem since xj ∈ {∓1}, and it is very difficult to solve for large N. The Lagrange dual function is

    ℓ(ν) = inf_x ( x^T W x + Σ_{j=1}^{N} νj (xj² − 1) )
         = inf_x ( x^T (W + diag ν) x − 1^T ν )
         = −1^T ν   if W + diag ν ⪰ 0
         = −∞       otherwise

  We may take ν = −λmin(W) 1, which yields p* ≥ −1^T ν = N λmin(W), where λmin(W) is the minimum eigenvalue of W. Alternatively, if we relax the constraints to Σ_{j=1}^{N} xj² = N, i.e., ‖x‖₂² = N, then the problem

    min x^T W x
    s.t. ‖x‖₂² = N

  becomes easy to solve, with the exact solution N λmin(W).

The Lagrange Dual Problem

- The dual problem

    max ℓ(λ, ν)
    s.t. λ ≥ 0

  gives the best lower bound for p*, i.e., p* ≥ ℓ(λ, ν).

- The pairs (λ, ν) with λ ≥ 0, ν ∈ R^M and ℓ(λ, ν) > −∞ are dual feasible.

- The solution of the above problem over the dual feasible set is called the dual optimal point (λ*, ν*) (i.e., the optimal Lagrange multipliers),

    p̂* = ℓ(λ*, ν*) ≤ p*

- Some hidden (implicit) constraints can be made explicit in the dual problem; e.g., consider the following LP problems.

- Linear problem (LP):

    min c^T x
    s.t. Ax = b
         x ≥ 0

  has the dual function

    ℓ(λ, ν) = −b^T ν   if c + A^T ν − λ = 0
            = −∞       otherwise

  So, the dual problem can be given by

    max −b^T ν
    s.t. A^T ν − λ + c = 0
         λ ≥ 0

  The dual problem can be further simplified to the following problem:

    max −b^T ν
    s.t. A^T ν + c ≥ 0
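As a numerical sanity check on the LS example above, the sketch below (numpy assumed; the sizes and contents of A and b are made-up illustration data) evaluates ℓ(ν) = −(1/4) ν^T A A^T ν − b^T ν for random ν and confirms the lower bound, with equality at the dual optimum ν* = −2 (A A^T)⁻¹ b:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 6))      # illustration data (assumed)
    b = rng.standard_normal(3)

    # Primal optimum of  min x^T x  s.t.  Ax = b : the minimum-norm solution
    x_star = A.T @ np.linalg.solve(A @ A.T, b)
    p_star = x_star @ x_star

    def ell(nu):
        # Lagrange dual function l(nu) = -(1/4) nu^T A A^T nu - b^T nu
        return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

    for _ in range(5):                   # weak duality: l(nu) <= p* for every nu
        assert ell(rng.standard_normal(3)) <= p_star + 1e-12

    nu_star = -2.0 * np.linalg.solve(A @ A.T, b)
    print(p_star, ell(nu_star))          # equal: zero duality gap at nu*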
- Linear problem (LP) with inequality constraints:

    min c^T x
    s.t. Ax ≤ b

  has the dual function

    ℓ(λ) = −b^T λ   if A^T λ + c = 0
         = −∞       otherwise

  So, the dual problem can be given by

    max −b^T λ
    s.t. A^T λ + c = 0
         λ ≥ 0

Weak Duality

- We know that

    sup_{λ≥0, ν} ℓ(λ, ν) = p̂* ≤ p* = inf_{x∈D} f(x)

- The inequality is called weak duality. Here (p* − p̂*) is called the duality gap, which is always nonnegative.

- Weak duality always holds, even when the primal problem is non-convex.

Strong Duality

- Strong duality refers to the case where the duality gap is zero, i.e.,

    p̂* = p*

- In general it does not hold. It may hold if the primal problem is convex. The conditions under which strong duality holds are called constraint qualifications; one of them is Slater's condition.

Slater's Condition

- Strong duality holds for a convex problem

    min f(x)
    s.t. g(x) ≤ 0
         Ax = b

  if it is strictly feasible, i.e.,

    ∃x ∈ interior D :  g(x) < 0, Ax = b

  i.e., the inequality constraints hold with strict inequality.

Saddle-point Interpretation

- First consider the following problem with no equality constraint:

    sup_{λ≥0} L(x, λ) = sup_{λ≥0} ( f(x) + Σ_{i=1}^{L} λi gi(x) )
                      = f(x)   if gi(x) ≤ 0, ∀i
                      = ∞      otherwise

  If gi(x) ≤ 0, then the optimum choice is λi = 0.

- Hence, from the duality gap p̂* ≤ p*, we have

    sup_{λ≥0} inf_x L(x, λ) ≤ inf_x sup_{λ≥0} L(x, λ)

  with weak duality as the inequality, and strong duality as the equality

    sup_{λ≥0} inf_x L(x, λ) = inf_x sup_{λ≥0} L(x, λ)

  With strong duality we can switch inf and sup for λ ≥ 0.

- In general, for any f(w, z) : R^N × R^L → R,

    sup_{z∈Z} inf_{w∈W} f(w, z) ≤ inf_{w∈W} sup_{z∈Z} f(w, z)

  with W ⊆ R^N and Z ⊆ R^L; this is called the max-min inequality. f(w, z) is said to satisfy the strong max-min property or the saddle-point property if the above inequality holds with equality.

- A point {(w̃, z̃) | w̃ ∈ W, z̃ ∈ Z} is called a saddle-point for f(w, z) if

    f(w̃, z) ≤ f(w̃, z̃) ≤ f(w, z̃),  ∀w ∈ W, ∀z ∈ Z

  i.e.,

    f(w̃, z̃) = inf_{w∈W} f(w, z̃) = sup_{z∈Z} f(w̃, z)

  and the strong max-min property holds with the value f(w̃, z̃).

- For Lagrange duality, if x* and λ* are optimal points for the primal and dual problems with strong duality (zero duality gap), then they form a saddle-point for the Lagrangian, and vice-versa (i.e., the converse is also true).
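The max-min inequality is easy to check numerically on finite grids. In the sketch below (numpy assumed; the function f(w, z) = (w − z)² and the grids are made-up illustration data, not from the notes), inf and sup are taken over the grids; the inequality is strict here because this f has no saddle point in the above sense:

    import numpy as np

    # Illustrative f(w, z) = (w - z)^2 on W = Z = [-2, 2] (assumed toy data)
    w = np.linspace(-2.0, 2.0, 401)
    z = np.linspace(-2.0, 2.0, 401)
    F = (w[:, None] - z[None, :]) ** 2      # F[i, j] = f(w_i, z_j)

    sup_inf = F.min(axis=0).max()           # sup_z inf_w f(w, z) -> 0
    inf_sup = F.max(axis=1).min()           # inf_w sup_z f(w, z) -> 4
    print(sup_inf, inf_sup)                 # max-min inequality: 0 <= 4, strict here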
Certificate of Suboptimality and Stopping Criterion

- We know that a dual feasible point satisfies ℓ(λ, ν) ≤ p*; the point (λ, ν) is a proof, or certificate, of this condition. Then,

    f(x) − p* ≤ f(x) − ℓ(λ, ν)

  for a primal feasible point x and a dual feasible point (λ, ν), where the duality gap associated with these points is f(x) − ℓ(λ, ν). In other words,

    p* ∈ [ℓ(λ, ν), f(x)]   and   p̂* ∈ [ℓ(λ, ν), f(x)]

- If the duality gap is zero, i.e., f(x) = ℓ(λ, ν), then x* = x and (λ*, ν*) = (λ, ν) are the primal and dual optimal points.

Stopping Criterion

- If an algorithm produces the sequences x(k) and (λ(k), ν(k)), check for

    f(x(k)) − ℓ(λ(k), ν(k)) < ε

  to guarantee ε-suboptimality. ε can approach zero, i.e., ε → 0, for strong duality.

Complementary Slackness

- Assume that x* and (λ*, ν*) satisfy strong duality:

    f(x*) = ℓ(λ*, ν*) = inf_x ( f(x) + Σ_i λi* gi(x) + Σ_j νj* hj(x) )
          ≤ f(x*) + Σ_i λi* gi(x*) + Σ_j νj* hj(x*)
          ≤ f(x*)

  where Σ_i λi* gi(x*) ≤ 0 and Σ_j νj* hj(x*) = 0.

- Observations (due to strong duality):
  1. The inequality in the third line always holds with equality, i.e., x* minimizes L(x, λ*, ν*).
  2. From the fourth line we have Σ_i λi* gi(x*) = 0, and since λi* gi(x*) ≤ 0 ∀i,

       λi* gi(x*) = 0,  ∀i

     which is known as complementary slackness. In other words,

       λi* > 0 only if gi(x*) = 0
       λi* = 0 if gi(x*) < 0

     i.e., the i-th optimal Lagrange multiplier is
     - positive only if gi(x) is active at x*,
     - zero if gi(x) is not active at x*.

KKT Optimality Conditions

- x* minimizes L(x, λ*, ν*), thus ∇L(x*, λ*, ν*) = 0, i.e.,

    ∇f(x*) + Σ_i λi* ∇gi(x*) + Σ_j νj* ∇hj(x*) = 0

- Then the Karush-Kuhn-Tucker (KKT) conditions for x* and (λ*, ν*) being primal and dual optimal points with zero duality gap (i.e., with strong duality) are

    gi(x*) ≤ 0                                           (constraint)
    hj(x*) = 0                                           (constraint)
    λi* ≥ 0                                              (constraint)
    λi* gi(x*) = 0                                       (complementary slackness)
    ∇f(x*) + Σ_i λi* ∇gi(x*) + Σ_j νj* ∇hj(x*) = 0

- For any optimization problem with differentiable objective (cost) and constraint functions for which strong duality holds, any pair of primal and dual optimal points satisfies the KKT conditions.

- For convex problems: If f(x), gi(x) and hj(x) are convex and x̃, λ̃ and ν̃ satisfy the KKT conditions, then they are optimal points, i.e.,
  - from complementary slackness, f(x̃) = L(x̃, λ̃, ν̃);
  - from the last condition, ℓ(λ̃, ν̃) = L(x̃, λ̃, ν̃) = inf_x L(x, λ̃, ν̃); note that L(x, λ̃, ν̃) is convex in x.
  Thus, f(x̃) = ℓ(λ̃, ν̃).
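The certificate idea can be made concrete with the LS example from the Duality section: any ν is dual feasible there, so ℓ(ν) and f(x) for any feasible x bracket p*. A minimal numpy sketch (A, b, the feasible x and the trial ν are made-up illustration data):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 6))
    b = rng.standard_normal(3)

    # A feasible but suboptimal x: minimum-norm solution plus a null-space component
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    x = x + 0.3 * np.linalg.svd(A)[2][-1]        # still satisfies A x = b

    nu = rng.standard_normal(3)                  # any nu is dual feasible here
    ell = -0.25 * nu @ (A @ A.T) @ nu - b @ nu   # l(nu) from the LS example
    f_x = x @ x

    # Certificate: p* lies in [ell, f_x]; the gap bounds the suboptimality of x
    print(ell, f_x, f_x - ell)                   # f(x) - p* <= f(x) - l(nu)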
- Example 18:

    min (1/2) x^T Q x + c^T x + r   (Q : SPD)
    s.t. Ax = b

  Solution: From the KKT conditions,

    Ax* = b
    Qx* + c + A^T ν* = 0

  which can be written as the linear system

    [ Q   A^T ] [ x* ]   [ −c ]
    [ A   0   ] [ ν* ] = [  b ]

- Example 19: (worked example; the solution is not preserved in the transcript)

- Example 20: (worked example; the solution is not preserved in the transcript)

Solving The Primal Problem via The Dual

- If strong duality holds and a dual optimal solution (λ*, ν*) exists, then we can compute a primal optimal solution from the dual solutions.

- When strong duality holds and (λ*, ν*) is given, the minimizer of L(x, λ*, ν*), i.e., of

    min f(x) + Σ_i λi* gi(x) + Σ_j νj* hj(x)

  is unique, and if it is primal feasible, then it is also primal optimal.

- If the dual problem is easier to solve (e.g., it has fewer dimensions or has an analytical solution), then solving the dual problem to find the optimal dual parameters (λ*, ν*) first, and then solving

    x* = argmin_x L(x, λ*, ν*)

  is an acceptable method to solve constrained minimization problems.

- Consider the following problem:

    min f(x) = Σ_{i=1}^{N} fi(xi)
    s.t. a^T x = b

  where each fi(x) is strictly convex and differentiable, a ∈ R^N and b ∈ R. Assume that the problem has a unique solution (non-empty and finite) and is dual feasible.

  - f(x) is separable because each fi(xi) is a function of xi only. The Lagrangian is given by

      L(x, ν) = Σ_{i=1}^{N} fi(xi) + ν (a^T x − b)
              = −bν + Σ_{i=1}^{N} ( fi(xi) + ν ai xi )

  - Then the dual function is given by

      ℓ(ν) = inf_x L(x, ν)
           = −bν + Σ_{i=1}^{N} inf_{xi} ( fi(xi) + ν ai xi )
           = −bν − Σ_{i=1}^{N} fi*(−ν ai)

    where fi*(y) is the conjugate function of fi(x). Then the dual problem is a function of a scalar ν ∈ R:

      max_ν  −bν − Σ_{i=1}^{N} fi*(−ν ai)

  - Once we find ν*, we know that L(x, ν*) is strictly convex as each fi(x) is strictly convex. So, we can find x* by solving ∇L(x, ν*) = 0, i.e.,

      ∂fi(xi)/∂xi = −ν* ai.

  NOTE: The conjugate function f*(y) is the maximum gap between the linear function y^T x and f(x) (see Boyd Section 3.3):

      f*(y) = sup_{x∈dom f(x)} ( y^T x − f(x) )

  If f(x) is differentiable, this maximum occurs at a point x where ∇f(x) = y. Note that f*(y) is a convex function.
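Example 18 translates directly into code: assemble the block KKT matrix and solve the (N+M) linear equations for (x*, ν*). A minimal numpy sketch (the particular Q, c, A, b values are made-up illustration data):

    import numpy as np

    # Illustration data (assumed): Q symmetric positive definite, A full row rank
    Q = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
    c = np.array([-1.0, -2.0])
    A = np.array([[1.0, 1.0]])       # one equality constraint
    b = np.array([1.0])
    N, M = Q.shape[0], A.shape[0]

    # KKT system:  [Q A^T; A 0] [x*; nu*] = [-c; b]
    K = np.block([[Q, A.T], [A, np.zeros((M, M))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    x_star, nu_star = sol[:N], sol[N:]

    print(x_star, nu_star)
    print(np.allclose(A @ x_star, b))                      # primal feasibility
    print(np.allclose(Q @ x_star + c + A.T @ nu_star, 0))  # stationarity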
Perturbation and Sensitivity Analysis

- Original problem (primal and dual):

    primal:  min f(x)            dual:  max ℓ(λ, ν)
             s.t. g(x) ≤ 0              s.t. λ ≥ 0
                  h(x) = 0

- Perturbed problem (primal and dual):

    primal:  min f(x)            dual:  max ℓ(λ, ν) − λ^T u − ν^T v
             s.t. g(x) ≤ u              s.t. λ ≥ 0
                  h(x) = v

- Here u = [ui]_{L×1} and v = [vj]_{M×1} are called the perturbations. When ui = 0 and vj = 0, the problem becomes the original problem. If ui > 0, it means we have relaxed the i-th inequality constraint; if ui < 0, it means we have tightened the i-th inequality constraint.

- Let us use the notation p*(u, v) to denote the optimal value of the perturbed problem. Thus, the optimal value of the original problem is p* = p*(0, 0) = ℓ(λ*, ν*).

- Assume that strong duality holds and an optimal dual solution exists, i.e., p* = ℓ(λ*, ν*). Then, we can show that

    p*(u, v) ≥ p*(0, 0) − λ*^T u − ν*^T v

- Sensitivity interpretations:
  - If λi* is large and ui < 0, then p*(u, v) is guaranteed to increase greatly.
  - If λi* is small and ui > 0, then p*(u, v) will not decrease too much.
  - If |νj*| is large:
    - If νj* > 0 and vj < 0, then p*(u, v) is guaranteed to increase greatly.
    - If νj* < 0 and vj > 0, then p*(u, v) is guaranteed to increase greatly.
  - If |νj*| is small:
    - If νj* > 0 and vj > 0, then p*(u, v) will not decrease too much.
    - If νj* < 0 and vj < 0, then p*(u, v) will not decrease too much.

- The perturbation inequality, p*(u, v) ≥ p*(0, 0) − λ*^T u − ν*^T v, gives a lower bound on the perturbed optimal value, but no upper bound. For this reason the results are not symmetric with respect to loosening or tightening a constraint. For example, if λi* is large and we loosen the i-th constraint a bit (i.e., take ui small and positive, 0 < ui < ε), then the perturbation inequality is not useful: it does not imply that the optimal value will decrease considerably.
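The perturbation inequality can be checked on a one-dimensional problem small enough to solve by hand: min x² s.t. 1 − x ≤ u, whose optimal value is p*(u) = (1 − u)² for u < 1, with λ* = 2 at u = 0. The sketch below (numpy assumed; the problem itself is made up for illustration) compares p*(u) against the bound p*(0) − λ* u:

    import numpy as np

    # Toy problem (illustration): min x^2  s.t.  1 - x <= u  =>  x* = 1 - u (u < 1)
    lam_star = 2.0                       # KKT at u = 0: 2x - lambda = 0 at x = 1
    for u in np.linspace(-0.5, 0.5, 5):
        p_u = (1.0 - u) ** 2             # exact perturbed optimal value
        bound = 1.0 - lam_star * u       # p*(0) - lambda* u
        print(u, p_u, bound, p_u >= bound)   # inequality holds, tight to 1st order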
Constrained Optimization Algorithms

Introduction

- The general constrained optimization problem:

    min f(x)
    s.t. g(x) ≤ 0
         h(x) = 0

- Defn (Active Constraint): Given a feasible point x(k), if gi(x) ≤ 0 is satisfied with equality, i.e., gi(x(k)) = 0, then constraint i is said to be active at x(k). Otherwise it is inactive at x(k).

Primal Methods

See Luenberger Chapter 12.

- A primal method is a search method that works on the original problem directly by searching through the feasible region for the optimal solution. Each point in the process is feasible and the value of the objective function constantly decreases. For a problem with N variables and M equality constraints, primal methods work in the feasible space of dimension N − M.

- Advantages:
  - The x(k) are all feasible points.
  - If x(k) is a convergent sequence, it converges at least to a local minimum.
  - They do not rely on special problem structure, e.g., convexity; in other words, primal methods are applicable to general non-linear problems.

- Disadvantages:
  - They must start from a feasible initial point.
  - They may fail to converge for inequality constraints if precaution is not taken.

Feasible Direction Methods

- The update equation is

    x(k+1) = x(k) + α(k) d(k)

  where d(k) must be a descent direction and x(k) + α(k) d(k) must be contained in the feasible region, i.e., d(k) must be a feasible direction for some α(k) > 0. This is very similar to unconstrained descent methods, but now the line search is constrained so that x(k+1) is also a feasible point.

- Example 21: Consider the following problem

    min f(x)
    s.t. ai^T x ≤ bi,  i = 1, ..., M

  Let A(k) be the set of indices representing the active constraints at x(k), i.e., ai^T x(k) = bi for i ∈ A(k). Then the direction vector d(k) is calculated by

    min_d ∇^T f(x(k)) d
    s.t. ai^T d ≤ 0,  i ∈ A(k)
         Σ_{i=1}^{N} |di| = 1

  The last constraint ensures a bounded solution (a numerical sketch of this direction-finding problem is given at the end of this subsection). The other constraints assure that vectors of the form x(k) + α(k) d(k) will be feasible for sufficiently small α(k) > 0, and subject to these conditions, d(k) is chosen to line up as closely as possible with the negative gradient of f(x(k)). In some sense this results in the locally best direction in which to proceed. The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.

- There are two major shortcomings of feasible direction methods that require that they be modified in most cases.
  - The first shortcoming is that for general problems there may not exist any feasible directions. If, for example, a problem has nonlinear equality constraints, we might find ourselves in the situation where no straight line from x(k) has a feasible segment. For such problems it is necessary either to relax our requirement of feasibility by allowing points to deviate slightly from the constraint surface, or to introduce the concept of moving along curves rather than straight lines.
  - A second shortcoming is that in their simplest form most feasible direction methods are not globally convergent. They are subject to jamming (sometimes referred to as zigzagging), where the sequence of points generated by the process converges to a point that is not even a constrained local minimum point.
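As promised above, the direction-finding problem of Example 21 is a linear program once d is split into positive and negative parts (d = p − q with p, q ≥ 0 and Σ(pi + qi) ≤ 1; the equality Σ|di| = 1 is relaxed to ≤ 1, which is harmless since the optimum pushes to the boundary whenever a descent direction exists). The sketch below leans on scipy's linprog as the LP solver; the gradient and active-constraint rows are made-up illustration data:

    import numpy as np
    from scipy.optimize import linprog

    g = np.array([1.0, -2.0, 0.5])          # gradient of f at x(k) (illustration)
    A_act = np.array([[1.0, 1.0, 0.0]])     # rows a_i^T of the active constraints

    # Variables [p; q] with d = p - q, p, q >= 0 (linprog default bounds)
    n = len(g)
    cost = np.concatenate([g, -g])          # g^T d = g^T p - g^T q
    A_ub = np.vstack([
        np.hstack([A_act, -A_act]),         # a_i^T d <= 0 for active i
        np.ones((1, 2 * n)),                # sum(p) + sum(q) <= 1, i.e. ||d||_1 <= 1
    ])
    b_ub = np.concatenate([np.zeros(len(A_act)), [1.0]])

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub)
    d = res.x[:n] - res.x[n:]
    print(d, g @ d)                         # feasible direction with g^T d < 0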
Active Set Methods

- The idea underlying active set methods is to partition the inequality constraints into two groups: those that are to be treated as active and those that are to be treated as inactive. The constraints treated as inactive are essentially ignored.

- Consider the following problem:

    min f(x)
    s.t. g(x) ≤ 0

  For simplicity there are no equality constraints; their inclusion is straightforward. The necessary conditions for an optimum x* are

    ∇f(x*) + ∇( λ^T g(x*) ) = 0
    g(x*) ≤ 0
    λ^T g(x*) = 0
    λ ≥ 0

- Let A be the set of indices of the active constraints (i.e., gi(x*) = 0, i ∈ A). Then

    ∇f(x*) + Σ_{i∈A} λi ∇gi(x*) = 0
    gi(x*) = 0,  i ∈ A
    gi(x*) < 0,  i ∉ A
    λi ≥ 0,  i ∈ A
    λi = 0,  i ∉ A

  The inactive constraints are inhibited (i.e., λi = 0).

- Taking just the active constraints in A, the problem is converted to an equality-constrained-only problem.
Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 54 / 118 Constrained Optimization Algorithms Active Sets Algorithm: Primal Methods Constrained Optimization Algorithms I I Disadvantage: The inner loop must terminate at a global optimum in order to I Discuss: How can we integarte equality constraints to the original problem? For a given working set W repeat repeat - minimize f (x) over the working set W using x(k+1) = x(k) + α(k) d(k) - check whether a new constraint becomes active. if so, add the new constraint to the working set W. until some stopping criterion is satised check the Lagrange multipliers λi - drop constraints with λi < 0 from the working set W until some stopping criterion is satised Primal Methods determine correct λi , and the same working set is not encountered in the following iterations. Note that f (x) strictly decreases at each step. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 55 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Primal Methods Constrained Optimization Algorithms 25-Dec-2014 56 / 118 Primal Methods Gradient Projection Method I Gradient Projection Method is an extension of the Gradient (or Steepest) Descent I Let us rst consider linear constraints Method (GD or SD) to the constrained case. I Another perspective is to let the equality Ax = b to represent all the active constraints aTi x = bi , i ∈ W. Thus min f (x) s.t. Ax = b min f (x) s.t. aTi x ≤ bi , i ∈ I1 aTi x = bi , i ∈ I2 I If we use rst order Taylor approximation around point x(k) , such that f (x) = f (x(k) + d) ∼ = f (x(k) ) + ∇T f (x(k) )d for small enough d , the we will have min f (x(k) ) + ∇T f (x(k) )d s.t. A x(k) + d = b Let us take the active constraints,i.e., aTi x = bi , as the working set W and seek a feasible descent direction nd ∇T f (x)d < 0 while satisfying aTi d = 0, i ∈ W i.e., d must lie in the tangent plane dened by aTi d dT Id ≤ 1 I As Ax(k) = b and Ax(k+1) = b, so this implies that Ad = 0. = 0, i ∈ W. Hence, the above problem is the projection of −∇f (x) on to this tangent plane. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 57 / 118 Primal Methods Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 58 / 118 Primal Methods Projected Steepest Descent Algorithm (PSDA): I 1. For a feasible initial point x(0) (i.e., x(0) ∈ F) Thus, the problem simplies to 2. Solve the Direction Finding Problem (DFP) min ∇T f (x(k) )d d(k) = argmin ∇T f (x(k) )d s.t. Ad = 0 T s.t. Ad = 0 d Id ≤ 1 dT Qd ≤ 1, - Ad = 0 denes the tangent plane M ⊆ RN and ensures that x(k+1) is still feasible. - dT Id ≤ 1 is the Euclidean unit ball (kdk22 ≤ 1), thus this is the projected GD algorithm. - We may also use the constraint d Qd ≤ 1, where Q is a SPD matrix, to obtain the projected SD algorithm. T (Q : SPD) 3. If ∇T f (x(k) )d(k) = 0, stop. x(k) is a KKT point. 4. Solve α(k) = argmin f (x(k) + αd(k) ) using a line search algorithm, e.g. exact or backtracking line search. 5. x(k+1) = x(k) + α(k) d(k) 6. Goto Step 1 with x(0) = x(k+1) . 
- Active Sets Algorithm:

    For a given working set W, repeat:
      repeat
        - minimize f(x) over the working set W using x(k+1) = x(k) + α(k) d(k)
        - check whether a new constraint becomes active; if so, add the new
          constraint to the working set W
      until some stopping criterion is satisfied
      check the Lagrange multipliers λi
      - drop constraints with λi < 0 from the working set W
    until some stopping criterion is satisfied

- Disadvantage: The inner loop must terminate at a global optimum in order to determine the correct λi, and the same working set must not be encountered in the following iterations. Note that f(x) strictly decreases at each step.

- Discuss: How can we integrate equality constraints into the original problem?

Gradient Projection Method

- The Gradient Projection Method is an extension of the Gradient (or Steepest) Descent Method (GD or SD) to the constrained case.

- Let us first consider linear constraints:

    min f(x)
    s.t. ai^T x ≤ bi,  i ∈ I1
         ai^T x = bi,  i ∈ I2

  Let us take the active constraints, i.e., ai^T x = bi, as the working set W and seek a feasible descent direction d:

    find d such that ∇^T f(x) d < 0 while satisfying ai^T d = 0, i ∈ W

  i.e., d must lie in the tangent plane defined by ai^T d = 0, i ∈ W. Hence, the above problem is the projection of −∇f(x) onto this tangent plane.

- Another perspective is to let the equality Ax = b represent all the active constraints ai^T x = bi, i ∈ W. Thus

    min f(x)
    s.t. Ax = b

  If we use the first-order Taylor approximation around the point x(k), i.e., f(x) = f(x(k) + d) ≅ f(x(k)) + ∇^T f(x(k)) d for small enough d, then we will have

    min f(x(k)) + ∇^T f(x(k)) d
    s.t. A (x(k) + d) = b

  As Ax(k) = b and Ax(k+1) = b, this implies that Ad = 0.

- Thus, the problem simplifies to the Direction Finding Problem (DFP)

    min ∇^T f(x(k)) d
    s.t. Ad = 0
         d^T I d ≤ 1

  - Ad = 0 defines the tangent plane M ⊆ R^N and ensures that x(k+1) is still feasible.
  - d^T I d ≤ 1 is the Euclidean unit ball (‖d‖₂² ≤ 1); with it, this is the projected GD algorithm.
  - We may also use the constraint d^T Q d ≤ 1, where Q is an SPD matrix, to obtain the projected SD algorithm.

- Projected Steepest Descent Algorithm (PSDA):
  1. Start from a feasible initial point x(0) (i.e., x(0) ∈ F).
  2. Solve the Direction Finding Problem (DFP)

       d(k) = argmin_d ∇^T f(x(k)) d
              s.t. Ad = 0
                   d^T Q d ≤ 1   (Q : SPD)

  3. If ∇^T f(x(k)) d(k) = 0, stop; x(k) is a KKT point.
  4. Solve α(k) = argmin_α f(x(k) + α d(k)) using a line search algorithm, e.g., exact or backtracking line search.
  5. x(k+1) = x(k) + α(k) d(k).
  6. Go to Step 1 with x(0) = x(k+1).
However in nonlinear constraints, the surface dened by the constraints and the tangent plane M touch in a single point (k) In this case updated point must be projected onto the constrained surface. For projected gradient descent method, the projection matrix is given by −1 P(k) = I − JTh (x(k) ) Jh (x(k) )JTh (x(k) ) Jh (x(k) ) where Jh (x) is the Jacobian of h(x), i.e., T Jh (x) = ∇hT (x) . Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 67 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Equality Constrained Optimization Constrained Optimization Algorithms Equality Constrained Optimization 25-Dec-2014 68 / 118 Equality Constrained Optimization Quadratic Minimization See Boyd Chapter 10. 1 T x Qx + cT x + r 2 s.t. Ax = b min min f (x) s.t. Ax = b where Q ∈ RN ×N is a symmetric positive semidene matrix (SPSD), A ∈ RM ×N and b ∈ RM . where f (x) : RN → R, convex, twice dierentiable, A ∈ RM ×N , M < N . Minimum occurs at I Using KKT conditions p∗ = inf {f (x) | Ax = b} = f (x∗ ) Ax∗ = b where x∗ satises KKT conditions ∗ T ∗ Qx + c + A ν = 0 ) = | T I ∗ ∇f (x ) + A ν = 0 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 69 / 118 Constrained Optimization Algorithms I I One general approach to solving the equality constrained problem is to eliminate the equality constraints and then solve the resulting unconstrained problem using methods for unconstrained minimization. I We rst nd a matrix F ∈ RN ×(N −M ) and vector x̂ that parametrize the (ane) feasible set: n o {x | Ax = b} = Fz + x̂ | z ∈ RN −M . We then form the reduced or eliminated optimization problem 25-Dec-2014 70 / 118 Equality Constrained Optimization There are, of course, many possible choices for the elimination matrix F, which can be chosen as any matrix in RN ×(N −M ) with the range (columnspace) of the matrix F is equals to the nullspace of the matrix A, i.e., R(F) = N (A). If F is one such matrix, and T ∈ R(M −N )×(M −N ) is nonsingular, then F̃ = FT is also a suitable elimination matrix, since R(F̃) = R(F) = N (A). I Here x̂ can be chosen as any particular solution of Ax = b, and F ∈ RN ×(N −M ) is any matrix whose range (columnspace) is the nullspace of A, i.e., R(F) = N (A). Conversely, if F and F̃ are any two suitable elimination matrices, then there is some nonsingular T such that F̃ = FT. If we eliminate the equality constraints using F, we solve the unconstrained problem min f (Fz + x̂) while if F̃ is used, we solve the unconstrained problem min f˜(z) = f (Fz + x̂) which is an unconstrained problem with variable z ∈ R . From its solution z , we can nd the solution of the equality constrained problem as N −M ∗ x∗ = Fz∗ + x̂ . Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Above equations denes a KKT system for an equality constrained quadratic optimization problem with (N + M ) linear equations in the (N + M ) variables (x∗ , ν ∗ ). Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Equality Constrained Optimization Eliminating Equality Constraints I −c AT x∗ = ν∗ b 0 {z } KKT matrix Ax = b ∗ Q A 25-Dec-2014 71 / 118 min f (F̃z̃ + x̂) = f (F(Tz̃) + x̂) This problem is equivalent to the one above, and is simply obtained by the change of coordinates z = Tz̃. In other words, changing the elimination matrix can be thought of as changing variables in the reduced problem. 
Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 72 / 118 Constrained Optimization Algorithms Equality Constrained Optimization Constrained Optimization Algorithms Example 23: Equality Constrained Optimization Newton's Method Equality Constraints I It is the same as unconstrained Newton's Method except - initial point is feasible, x(0) ∈ F, i.e., Ax(0) = b. - Newton step ∆xnt is a feasible direction, i.e., A∆xnt = 0. I In order to use Newton's Method on the problem min f (x) s.t. Ax = b we can use second-order Taylor approximation around x (actually x(k) ) to obtain a quadratic minimization problem min f (x + ∆x) = f (x) + ∇T f (x)∆x + 1 ∆xT H(x)∆x 2 s.t. A (x + ∆x) = b This problem is convex if H(x) 0 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 73 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Equality Constrained Optimization Constrained Optimization Algorithms I I Newton decrement λ(x) is the same as the one used for the unconstrained problem, i.e., 1/2 = k∆xnt kH(x) being the norm of the Newton step in the norm dened by H(x). −∇f (x) H(x) AT ∆xnt = . ∗ ν 0 A 0 | {z } Solution exists when the KKT matrix is non-singular. λ2 (x) 2 I , being a good estimate of f (x) − p∗ (i.e., f (x) − p∗ ≈ 2 as the stopping criterion λ 2(x) ≤ . I Appears in the line search KKT matrix I 74 / 118 Equality Constrained Optimization λ(x) = ∆xTnt H(x)∆xnt Using the results of quadratic minimization with equality constraints 25-Dec-2014 The same solution can also be obtained by setting x∗ = x + ∆xnt in the optimality condition equations of the original problem d f (x + α∆xnt ) dα Ax∗ = b I ∗ ∇f (x ) + AT ν ∗ = 0 λ2 (x) 2 ), can be used = ∇T f (x)∆xnt = −λ2 (x) α=0 In order to Newton step ∆xnt to be a feasible descent direction the following two conditions need to be satised as ∆xnt and ν ∗ should satisfy the optimality conditions. ∇T f (x)∆xnt = −λ2 (x) A∆xnt = 0 I Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 75 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Equality Constrained Optimization Constrained Optimization Algorithms Newton's Method with Equality Constraint Elimination I 25-Dec-2014 76 / 118 Equality Constrained Optimization It can be shown that ∆xnt = F∆znt min f (x) s.t. Ax = b ≡ where ∆xnt is the Newton step of the original problem with equality constraints. min f˜(z) = f (Fz + x̂) with R(F) = N (A) and Ax̂ = b. I The solution for equality constrained Newton's Method is invariant to ane transformations. I The gradient and Hessian of f˜(z) are λ̃2 (z) = ∆zTnt H̃(z)∆znt ∇f˜(z) = FT ∇f (Fz + x̂) = ∆zTnt FT H(x)F∆znt H̃(z) = FT H(Fz + x̂)F I Then, the KKT matrix is invertible, i H̃(z) is invertible. I The Newton step of the reduced problem is The Newton decrement λ̃(z) is the same as the Newton decrement of the original problem = ∆xTnt H(x)∆xnt = λ2 (x) −1 ∆znt = −H̃−1 (z)∇f˜(z) = − FT H(x)F FT ∇f (x) where x = Fz + x̂. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 77 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 78 / 118 Constrained Optimization Algorithms Penalty and Barrier Methods Constrained Optimization Algorithms Penalty and Barrier Methods Penalty and Barrier Methods Penalty Methods min f (x) s.t. x ∈ F See Luenberger Chapter 13. 
I I They approximate constrained problems by unconstrained problems I The approximation is obtained by adding - a term with a high cost for violating the constraints (penalty methods) - a term that favors points interior to the feasible points over those near the boundary (barrier methods) to the objective function (cost function). I The penalty and barrier methods work directly in original the N -dimensional space RN rather than the (N − M )-dimensional space RN −M as in the case of the primal methods. I I The idea is to replace the this problem by the following penalty problem min f (x) + cP (x) where c ∈ R++ is a constant and P (x) : RN → R is the penalty function. Here - P (x) is continuous - P (x) ≥ 0 ∀x ∈ RN - P (x) = 0 i x ∈ F In general, the following quadratic penalty function is used P (x) = L M 1X 1X (max {0, gi (x)})2 + (hj (x))2 2 i=1 2 j=1 where gi (x) ≤ 0 are the inequality constraints and hj (x) = 0 are the equality constraints. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms I For example, consider P (x) = 1 2 2 P i=1 25-Dec-2014 79 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Penalty and Barrier Methods Constrained Optimization Algorithms 25-Dec-2014 80 / 118 Penalty and Barrier Methods (max {0, gi (x)})2 with g1 (x) = x − b and g2 (x) = a − x Penalty Method: n o I Let c(k) , k = 1, 2, . . . be a sequence tending to ∞ such that ∀k c(k) ≥ 0 and c(k+1) > c(k) . I Let q(c, x) = f (x) + cP (x) and for each k solve the penalty problem min q(c(k) , x) obtaining a solution point x(k) . I For large c, the minimum point of the penalty problem will be in a region where P (x) is small. I For increasing c, the solution will approach the feasible region F and as c → ∞ the penalty problem will converge to a solution of the constrained problem. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 81 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Penalty and Barrier Methods Constrained Optimization Algorithms 25-Dec-2014 82 / 118 Penalty and Barrier Methods Barrier Methods Also known as interior methods. min f (x) I Lemma: i. ii. iii. iv. As k → ∞, c (k+1) >c (k) s.t. x ∈ F (e.g., start from an exterior point) I q(c(k) , x(k) ) ≤ q(c(k+1) , x(k+1) ) P (x(k) ) ≥ P (x(k+1) ) The idea is to replace the this problem by the following barrier problem min f (x) + f (x(k) ) ≤ f (x(k+1) ) s.t. x ∈ interior of F f (x∗ ) ≥ q(c(k) , x) ≥ f (x(k) ) n o I Theorem: Let x(k) , k = 1, 2, . . . be a sequence generated by the penalty method. Then, any limit of the sequence is a solution of the original constrained problem. I I where c ∈ R++ is a constant and B(x) : RN → R is the barrier function. Here, barrier function B(x) is dened on the interior of F such that - B(x) is continuous - B(x) ≥ 0 - B(x) → ∞ as x approaches the boundary of F Ideally, ( B(x) = Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 1 B(x) c 83 / 118 0, ∞, x ∈ interior F x∈ / interior F Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 84 / 118 Constrained Optimization Algorithms Penalty and Barrier Methods Constrained Optimization Algorithms I I Penalty and Barrier Methods For example, consider B(x) = −1/g1 (x) − 1/g2 (x), g1 (x) = x − b and g2 (x) = a − x. There are several approximations. 
Two common barrier functions for the inequality constraints are given below Log barrier: B(x) = − L X log (−gi (x)) i=1 Inverse barrier: B(x) = − L X i=1 1 gi (x) Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 85 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Penalty and Barrier Methods Constrained Optimization Algorithms 25-Dec-2014 86 / 118 Penalty and Barrier Methods Barrier Method: I I n o Let c(k) , k = 1, 2, . . . be a sequence tending to ∞ such that ∀k c(k) ≥ 0 and c(k+1) > c(k) . Let r(c, x) = f (x) + and for each k solve the barrier problem I Lemma: i. ii. iii. iv. 1 B(x) c As k → ∞, c(k+1) > c(k) (start from an interior point) r(c(k) , x(k) ) ≥ r(c(k+1) , x(k+1) ) B(x(k) ) ≤ B(x(k+1) ) f (x(k) ) ≥ f (x(k+1) ) f (x∗ ) ≤ f (x(k) ) ≤ r(c(k) , x) n o Any limit point of a sequence x(k) , k = 1, 2, . . . generated by the barrier method is a solution of the original constrained problem. min r(c(k) , x) I Theorem: s.t. x ∈ interior of F obtaining a solution point x(k) . Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 87 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Penalty and Barrier Methods Constrained Optimization Algorithms Properties of the Penalty & Barrier Methods I I Let pi (x) = max {0, gi (x)} and p(x) = pi where F(x), G(x) and Γ(x) are Hessians of f (x), g(x) and γ(x) respectively, and Jp (x) is the Jacobian of p(x). L×1 P (x) = pT (x)p(x) = L X I If at x∗ there are r active constraints, then the Hessian matrices Q(c(k) , x(k) ) have r eigenvalues tending to ∞ as c(k) → ∞ and (n − r) eigenvalues tending to some nite value. In other words, condition number goes to innity (κ → ∞) as c(k) → ∞. I Gradient Descent may not be directly applied, instead Newton's Method is preferred! I Same observation also applies to the barrier method. (pi (x))2 i=1 or more generally to be the quadratic norm function γ(y) = yT Γ y I ⇒ Dening Q(c, x) = F(x) + c ∇T γ(p(x)) G(x) + c JTp (x) Γ(p(x)) Jp (x) Let the penalty function γ(x) where P (x) = γ(p(x)) to be the Euclidean norm function ⇒ Penalty and Barrier Methods The Hessian Q(c, x) is given by γ(y) = yT y 88 / 118 q(c, x) = f (x) + c γ(p(x)) The penalty method solves the unconstrained problem min f (x) + cP (x) I 25-Dec-2014 P (x) = pT (x) Γ p(x) The Hessian of the above problem becomes more and more ill-conditioned as c→∞ Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 89 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 90 / 118 Constrained Optimization Algorithms Interior-Point Methods Constrained Optimization Algorithms Interior-Point Methods See Boyd Chapter 11. I The function ! L X φ(x) = − 1 − log (−gi (x)) min f (x) + c i=1 Interior-Point Methods L X log (−gi (x)) i=1 with dom φ(x) = x ∈ RN | gi (x) < 0, ∀i is called the logarithmic barrier s.t. Ax = b function. I We will modify the Newton's algorithm to solve the above problem. So, we will need ∇φ(x) = − L X i=1 Hφ (x) = L X 1 ∇gi (x) gi (x) 1 g 2 (x) i=1 i ∇φ(x)∇T φ(x) − L X i=1 1 Hg (x) gi (x) i where Hφ (x) = ∇2 φ(x) and Hgi (x) = ∇2 gi (x) are the Hessians of φ(x) and gi (x) respectively. 
Minimum occurs at ∗ ∗ p = inf {f (x) |Optimization Ax = b} = f (x ) Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Constrained Optimization Algorithms 25-Dec-2014 91 / 118 Interior-Point Methods Constrained Optimization Algorithms I Example 24:Inequality Central Path I Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization s.t. Ax ≤ b (where e ∈ R , A ∈ R s.t. Ax = b with dom φ(x) = x ∈ RN | gi (x) < 0, ∀i is called the logarithmic barrier N function. I φ(x) = − The point x (c) is the solution. The trajectory of x (c) as a function of c is called the central path. Points on the central path satisfy ∗ Ax∗ (c) = b gi (x∗ (c)) < 0, c ∇f (x∗ (c)) + ∇φ(x∗ (c)) + AT ν̂ = 0 ∀i Centrality conditions and b ∈ RL are constants) is given by log bi − aTi x , dom φ(x) = {x | Ax < b} i=1 where aT1 , . . . , aTL are the rows of A. The gradient and Hessian of the barrier function are ∇φ(x) = L X i=1 1 ai bi − aTi x = AT d Last line could be rewritten as c ∇f (x∗ (c)) − N ×L L X for some ν̂ ∈ RM . I Interior-Point Methods min eT x min cf (x) + φ(x) ∗ 92 / 118 form linear programming. The logarithmic barrier function for an LP in inequality form Consider the equivalent problem (c > 0) 25-Dec-2014 Hφ (x) = L X i=1 L X i=1 1 ∇gi (x∗ (c)) + AT ν̂ = 0 gi (x∗ (c)) = AT (diag d)2 A where di = Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 93 / 118 Interior-Point Methods 1 . bi − aTi x Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms Since x is strictly feasible, we have d > 0, so the Hessian of φ(x), Hφ (x) is nonsingular if and only if A has rank N , i.e. full-rank. The centrality condition is c e + AT d = 0 We can give a simple geometric interpretation of the centrality condition. At a point x∗ (c) on the central path the gradient ∇φ(x∗ (c)), which is normal to the level set of φ(x) through x∗ (c), must be parallel to e. In other words, the hyperplane eT x = eT x∗ (c) is tangent to the level set of φ(x) through x∗ (c). Figure below shows an example with L = 6 and N = 2. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 1 T 2 ai ai (bi − aTi x) 95 / 118 25-Dec-2014 94 / 118 Interior-Point Methods The dashed curves in the previous gure show three contour lines of the logarithmic barrier function φ(x). The central path converges to the optimal point x∗ as c → ∞. Also shown is the point on the central path with c = 10. The optimality condition at this point can be veried geometrically: The line eT x = eT x∗ (10) is tangent to the contour line of φ(x) through x∗ (10). Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 96 / 118 Constrained Optimization Algorithms Interior-Point Methods Constrained Optimization Algorithms Dual Points from Central Path I So, the dual function optimal value p̂∗ (c) = `(λ∗ (c), ν ∗ (c)) is nite and given as `(λ∗ (c), ν ∗ (c)) = f (x∗ (c)) + From the centrality condition, let 1 , c gi (x∗ (c)) ∀i =1 c =0 L = f (x (c)) − c I ∇f (x∗ (c)) + λ∗i (c)gi (x∗ (c)) +ν ∗T (c) (Ax∗ (c) − b) | {z } {z } | ∗ ν̂ c ν ∗ (c) = L X L X i=1 λ∗i (c) = − then Interior-Point Methods Duality gap is L and c f (x∗ (c)) − p∗ ≤ λ∗i (c)∇gi (x∗ (c)) + AT ν ∗ (c) = 0 goes to zero, as c → ∞. i=1 L c Hence, from KKT, x (c) minimizes ∗ L(x, λ, ν) = f (x) + λT g(x) + ν T (Ax − b) for λ = λ∗ (c) and ν = ν ∗ (c) for a particular c, which means that (λ∗ (c), ν ∗ (c)) is a dual feasible pair. 
Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 97 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Interior-Point Methods Constrained Optimization Algorithms KKT Interpretation Ax = b gi (x∗ ) ≤ 0, λ∗i ≥ 0, 1 −λ∗i gi (x∗ ) = , c ∇f (x ) + L X I Interior-Point Methods 1 , then c gi (x) λ∗i ∇gi (x∗ ) T ∇f (x) − ∀i ∀i +A ν =0 To solve this set of (N + M ) linear independent equations (of N + M variables x and ν ) consider the nonlinear part of the rst set of equations ∇f (x + d) − L X i=1 Complementary slackness, λi gi (x) = 0 → −λ∗i gi (x∗ ) = 1 ∇gi (x) + AT ν = 0 c gi (x) Ax = b I ∗ L X i=1 satised by x∗ (c), λ∗ (c) and ν ∗ (c) ∀i i=1 I Let λi = − Central path (centrality) conditions can be seen as deformed KKT conditions, i.e., ∗ 98 / 118 Newton Step for Modied KKT equations I I 25-Dec-2014 1 c L X 1 1 ∇gi (x + d) ∼ ∇gi (x) = ∇f (x) − c gi (x + d) c gi (x) i=1 | {z } =ĝ As c → ∞, x∗ (c), λ∗ (c) and ν ∗ (c) almost satisfy the KKT optimality conditions. + H(x)d − L X i=1 | 1 Hg (x)d + c gi (x) i {z L X i=1 1 ∇gi (x)∇T gi (x)d c gi2 (x) } =Ĥd Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms I 25-Dec-2014 99 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Interior-Point Methods Constrained Optimization Algorithms Substituting back, we obtain I 100 / 118 Interior-Point Methods Let us represent the previous modied KKT equations in matrix form T Ĥd + A ν = −ĝ Ad = 0 Ĥ A AT 0 d −ĝ = ν 0 ∗ whose solution would give the modied Newton step ∆xnt and νnt where Ĥ = H(x) − L X i=1 ĝ = ∇f (x) − L X 1 1 ∇gi (x)∇T gi (x) Hg (x) + c gi (x) i c gi2 (x) i=1 L X i=1 I 25-Dec-2014 c H(x) + Hφ (x) A AT 0 ∆xnt −c ∇f (x) − ∇φ(x) = ∗ νnt 0 ∗ where νnt = c ν∗. 1 ∇gi (x) c gi (x) I Using the derivations of ∇φ(x) and Hφ (x) Using this Newton step, the Interior-Point Method (i.e., Barrier Method) can be constructed. 1 Hφ (x) c 1 ĝ = ∇f (x) + ∇φ(x) c Ĥ = H(x) + Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 101 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 102 / 118 Constrained Optimization Algorithms Interior-Point Methods Constrained Optimization Algorithms I The Interior-Point Method (Barrier Method): Given a strictly feasible x ∈ F, c = c(0) (> 0), µ > 1 and tolerance > 0 repeat 1. Centering step: Compute x∗ (c) by minimizing the modied barrier problem x∗ (c) = argmin (c f (x) + φ(x)) s.t. Ax = b 4. Increase c:, c = µ c L < c Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms I 25-Dec-2014 103 / 118 Constrained Optimization Algorithms Choice of µ: - µ provides a tade-o in the number of iterations for the inner and outer loops. - small µ: at each step inner loop starts from very good point, few inner loop iterations are required, but too many outer loop iterations may be required. - large µ: at each step c increases a large amount, the current starting point may not be a good point for the inner loop. Too many inner loop iterations may be required, but few outer loop iterations are required. - In practice, small values of µ (i.e., near one) result in many outer iterations, with just a few Newton steps for each outer iteration. For µ in a fairly large range, from around 3 to 100 or so, the two eects nearly cancel, so the total number of Newton steps remains approximately constant. 
This means that the choice of µ is not particularly critical; values from around 10 to 20 or so seem to work well. Constrained Optimization Algorithms Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Interior-Point Methods Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Accuracy of centering: - Computing x∗ (c) exactly is not necessary since the central path has no signicance beyond the fact that it leads to a solution of the original problem as c → ∞; inexact centering will still yield a sequence of points x(k) that converges to an optimal point. Inexact centering, however, means that the points λ∗ (c) and ν ∗ (c), computed from the rst two equation given in the section titled "Dual points from central path", are not exactly dual feasible. This can be corrected by adding a correction term to these formulae, which yields a dual feasible point provided the computed x is near the central path, i.e., x∗ (c). - On the other hand, the cost of computing an extremely accurate minimizer of c f (x) + φ(x), as compared to the cost of computing a good minimizer of c f (x) + φ(x), is only marginally more, i.e., a few Newton steps at most. For this reason it is not unreasonable to assume exact centering. starting at x using the modied Newton's Method. 2. Update: x = x∗ (c) 3. Stopping criterion: uit if Interior-Point Methods 25-Dec-2014 105 / 118 Interior-Point Methods I 25-Dec-2014 104 / 118 Interior-Point Methods Choice of c(0) : - large c(0) : rst run of inner loop may require too many iterations. - small c(0) : more outer-loop iterations are required. L - One reasonable choice is to choose c(0) so that c(0) is approximately of the same order as f (x(0) ) − p∗ , or µ times this amount. For example, if a dual feasible point (λ, ν) is known, with duality gap η = f (x(0) )`(λ, ν), then we can take c(0) = Lη . Thus, in the rst outer iteration we simply compute a pair with the same duality gap as the initial primal and dual feasible points. - Another possibility is to nd c(0) which minimizes inf kc ∇f (x(0) ) + ∇φ(x(0) ) + AT νk2 ν Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 106 / 118 25-Dec-2014 108 / 118 Interior-Point Methods Example 25: Solution: Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 107 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms Interior-Point Methods Constrained Optimization Algorithms Interior-Point Methods Example 26: Solution: Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 109 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Interior-Point Methods Constrained Optimization Algorithms 25-Dec-2014 110 / 118 Interior-Point Methods How to start from a feasible point? Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 111 / 118 I Consider gi (x) ≤ 0, i = 1, 2, . . . , L and Ax = b. I We always assume that we are given a point x(0) ∈ Ax(0) = b. I If such a point is not knows, a preliminary stage, Phase I is run rst Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms I L T i=1 dom gi (x) satisfying Then, we form the following optimization problem min s s.t. gi (x) ≤ s, i = 1, 2, . . . 
, L Ax = b I The interior-point method requires a strictly feasible starting point x(0) Interior-Point Methods Basic Phase I Method: I I 25-Dec-2014 112 / 118 Interior-Point Methods Then, apply the interior-point method to solve the above problem. There are three cases depending on the optimal value p̄∗ 1. If p̄∗ < 0, then a strictly feasible solution is reached. Moreover if (x, s) is feasible with s < 0, then x satises gi (x) < 0. This means we do not need to solve the optimization problem with high accuracy; we can terminate when s < 0. 2. If p̄∗ > 0, then there is no feasible solution. 3. If p̄∗ = 0, and the minimum is attained at x∗ and s∗ = 0, then the set of inequalities is feasible, but not strictly feasible. However, if p̄∗ = 0 and the minimum is not attained, then the inequalities are infeasible. in the variables x ∈ RN and s ∈ R. s is a bound on the maximum infeasibility of the inequalities and it is to be driven below zero. The problem is always feasible when select s(0) ≥ max gi (x(0) ) together with the i given x(0) ∈ L T i=1 dom gi (x) with Ax(0) = b. Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 113 / 118 Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization 25-Dec-2014 114 / 118 Constrained Optimization Algorithms Interior-Point Methods Constrained Optimization Algorithms Interior-Point Methods Example 27: Solution: Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 115 / 118 Interior-Point Methods Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization Constrained Optimization Algorithms 25-Dec-2014 117 / 118 25-Dec-2014 116 / 118 25-Dec-2014 118 / 118 Interior-Point Methods Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization