ELE604/ELE704 Optimization - Hacettepe University, Department of

Transkript

Contents
Duality
Duality
Lagrange Dual Function
The Lagrange Dual Problem
Dual Problem
Weak and Strong Duality
Weak Duality
Strong Duality
Slater's Condition
Saddle-point Interpretation
Optimality Conditions
Certicate of Suboptimality and Stopping Criterion
ELE604/ELE704 Optimization
Stopping Criterion
Constrained Optimization
Complementary Slackness
KKT Optimality Conditions
Solving The Primal Problem via The Dual
Perturbation and Sensitivity Analysis
Constrained Optimization Algorithms
Introduction
Primal Methods
http://www.ee.hacettepe.edu.tr/∼usezen/ele604/
Feasible Direction Methods
Active Set Methods
Gradient Projection Method
Dr. Umut Sezen & Dr. Cenk Toker
Department of Electrical and Electronic Engineering
Hacettepe University
Equality Constrained Optimization
Quadratic Minimization
Eliminating Equality Constraints
Newton's Method Equality Constraints
Newton's Method with Equality Constraint Elimination
Penalty and Barrier Methods
Penalty Methods
Barrier Methods
Properties of the Penalty & Barrier Methods
Interior-Point Methods
Logarithmic Barrier
Central Path
Dual Points from Central Path
KKT Interpretation
Newton Step for Modied KKT equations
The Interior-Point Algorithm
How to start from a feasible point?
Umut Sezen & Cenk Toker (Hacettepe University) ELE604 Optimization
Duality
25-Dec-2014
1 / 118
Duality
Duality
Duality
I
I
L(x, λ, ν) = f (x) + λT g(x) + ν T h(x)
= f (x) +
g(x) = g1 (x)
h(x) = h1 (x)
···
g2 (x)
···
h2 (x)
gi (x)
···
gL (x)
T
hM (x)
···
hj (x)
λ = λ1
ν = ν1
T
The domain of the optimization problem D is dened by
D = dom f (x) ∩
dom gi (x) ∩
i=1
M
\
I
νj hj (x)
j=1
λ2
ν2
···
···
λi
νj
···
···
λL
T
νM
T
On the feasible set F where
F = {x̃ | x̃ ∈ D ∧ g(x̃) ≤ 0 ∧ h(x̃) = 0}
dom hj (x)
Lagragian has a value less than or equal to the cost function, i.e,
j=1
L(x̃, λ, ν) ≤ f (x̃),
Any point x̃ ∈ D satisfying the constraints is called a feasible point, i.e., g(x̃) ≤ 0
and h(x̃)= 0.
Duality
M
X
are called the Lagrange multiplier vectors.
represent the L inequality and M equality constraints, respectively.
L
\
λi gi (x) +
where λi ≥ 0 and νj are called the Lagrange multipliers, and λ and ν
where g(x) and h(x)
L
X
i=1
h(x) = 0
25-Dec-2014
3 / 118
∀x̃ ∈ F, ∀λi ≥ 0
.
Duality
I Proposition:
I
Dene the Lagrangian L : RN × RL × RM → R as
s.t. g(x) ≤ 0
I
2 / 118
Consider the standard minimization problem (will be referred as the primal
problem)
min f (x)
I
25-Dec-2014
25-Dec-2014
4 / 118
The dual function constitutes a lower bound on p∗ = f (x∗ ), i.e.,
`(λ, ν) ≤ p∗
Then the Lagrange dual function, `(λ, ν), is dened as
∀λ ≥ 0
Let x̃ be a feasible point, that is x̃ ∈ F (i.e., gi (x̃) ≤ 0 and hi (x̃) = 0),
and λ > 0, then
I Proof:
`(λ, ν) = inf L(x, λ, ν)
x∈D
= inf f (x) + λT g(x) + ν T h(x)
λT g(x̃) + ν T h(x̃) ≤ 0.
x∈D
= inf
x∈D
I
f (x) +
L
X
λi gi (x) +
i=1
M
X
Then, the Lagrangian is
!
νj hj (x)
L(x̃, λ, ν) = f (x̃) + λT g(x̃) + ν T h(x̃) ≤ f (x̃)
|
{z
}
j=1
≤0
Dual function `(λ, ν) is the pointwise inmum of a set of ane functions of λ and
ν , hence it is concave even if f (x), gi (x) and hj (x) are not convex.
So,
`(λ, ν) = inf L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f (x̃)
x∈D
I
25-Dec-2014
5 / 118
∀x̃
The pair (λ, ν) is called dual feasible when λ ≥ 0.
25-Dec-2014
6 / 118
Duality
Duality
Duality
25-Dec-2014
7 / 118
I
Duality
Examples:
I
Least Squares (LS) solution of linear equations
25-Dec-2014
8 / 118
Linear Programming (LP)
min cT x
min xT x
s.t. Ax = b
s.t. Ax = b
- The Lagrangian is
−x≤0
- The Lagrangian is
L(x, ν) = xT x + ν T (Ax − b)
L(x, λ, ν) = cT x − λT x + ν T (Ax − b)
then the Lagrange dual function is
then the Lagrange dual function is
`(ν) = inf L(x, ν)
x
Since L(x, ν) is quadratic in x, it is convex
`(λ, ν) = −bT ν + inf
x
1
∇L(x, ν) = 2x + AT ν = 0 ⇒ x∗ = − AT ν
2
`(λ, ν) =
1
1
`(ν) = L(− AT ν, ν) = − ν T AT Aν − bT ν
2
4
Duality
I
9 / 118
Duality
- This is a discrete-problem since xj ∈ {∓1}, and very dicult to solve for
large N .
The Lagrange dual function is
x
xT Wx +
N
X
νj x2j − 1
)
25-Dec-2014
10 / 118
becomes easy to solve
j = 1, . . . , N
(
otherwise
- If we relax the constraint to be kxk2 = 1, i.e.,
min xT Wx
`(ν) = inf
c + AT ν − λ = 0
Two-way partitioning (a non-convex problem)
s.t. x2j = 1,
(
−bT ν,
−∞,
Ane, i.e., concave when c + AT ν − λ = 0 with λ ≥ 0
So, p∗ ≥ −bT ν when c + AT ν − λ = 0 with λ ≥ 0.
which is obviously concave p∗ ≥ `(ν) = − 14 ν T AT Aν − bT ν
25-Dec-2014
T c + AT ν − λ x
In order `(ν) to be bounded, we must have c + AT ν − λ = 0, i.e.,
Hence Lagrange dual function is given by
N
P
j=1
x2j = 1, then problem
min xT Wx
s.t. kxk2 = 1
with an exact solution of
p∗ = N λmin (W)
where λmin (W) is the minimum eigen value of W.
j=1
n
o
= inf xT (W + diag ν)x − 1T ν
x
(
−1T ν, W + diag ν 0
=
−∞,
else
We may take ν = −λmin (W)1T which yields
p∗ ≥ −1T ν = N λmin (W)
where λmin (W) is the minimum eigen value of W.
25-Dec-2014
11 / 118
25-Dec-2014
12 / 118
Dual Problem
I
Dual Problem
Linear problem (LP)
min cT x
s.t. Ax = b
x≥0
has the dual function
max `(λ, ν)
s.t. λ ≥ 0
`(λ, ν) =
gives the best lower bound for p∗ , i.e., p∗ ≥ `(λ, ν).
I
The pairs `(λ, ν) with λ ≥ 0, ν ∈ R
I
Solution of the above problem for a dual feasible set is called the dual optimal
point (λ∗ , ν ∗ ) (i.e., optimal Lagrange multipliers)
M
∗
and `(λ, ν) > −∞ are dual feasible.
∗
∗
p̂ = `(λ , ν ) ≤ p
I
(
−bT ν,
−∞,
c + AT ν − λ = 0
otherwise
- So, the dual problem can be given by
max − bT ν
s.t. AT ν − λ + c = 0
∗
λ≥0
Some hidden (implicit) constraints can be made explicit in the dual problem, e.g.
consider the LP problems on the next slides.
- The dual problem can be further simplied to the following problem
max − bT ν
s.t. AT ν + c ≥ 0
25-Dec-2014
13 / 118
Dual Problem
25-Dec-2014
14 / 118
Weak Duality
Weak Duality
I
Linear problem (LP) with inequality
min cT x
s.t. Ax ≤ b
has the dual function
I
(
`(λ) =
−bT λ,
−∞,
AT λ + c = 0
We know that
sup `(λ, ν) = p̂∗ ≤ p∗ = inf f (x)
x∈D
λ∈R+ ,ν
otherwise
I
- So, the dual problem can be given by
I
I
max − bT λ
The inequality means weak duality.
Here (p∗ − p̂∗ ) is called the duality gap, which is always nonnegative.
Weak duality always holds even when the primal problem is non-convex.
s.t. AT λ + c = 0
λ≥0
25-Dec-2014
15 / 118
Strong Duality
Strong Duality
refers to the case where the duality gap is zero, i.e.,
I
I
25-Dec-2014
18 / 118
Slater's Condition
Strong duality holds for a convex problem
min f (x)
s.t. g(x) ≤ 0
p̂∗ = p∗
I
16 / 118
Slater's Condition
I
I Strong duality
25-Dec-2014
In general it does not hold.
It may hold if the primal problem is convex.
The conditions under which strong duality hold are called constraint qualications,
where one of them is the Slater's condition.
Ax = b
if it is strongly feasible, i.e., ∃x ∈ interior D
g(x) < 0
Ax = b
i.e., inequality constraints hold with strict inequality.
25-Dec-2014
17 / 118
I
I
p̂∗ ≤ p∗
T
sup L(x, λ) = sup f (x) + λ g(x)
λ≥0
with weak duality as the inequality
λ≥0
= sup
f (x) +
λ≥0
(
=
L
X
!
sup inf L(x, λ) ≤ inf sup L(x, λ)
λi gi (x)
λ≥0 x
i=1
otherwise
sup inf L(x, λ) = inf sup L(x, λ)
λ≥0 x
I
I
If gi (x) ≤ 0, then optimum choice is λi = 0
x λ≥0
and strong duality as the equality
gi (x) ≤ 0, ∀i
f (x),
∞,
with no equality constraint.
I
Hence from duality gap, we have
First consider the following problem,
25-Dec-2014
19 / 118
x λ≥0
With strong duality we can switch inf and sup for λ ≥ 0.
25-Dec-2014
20 / 118
In general, for any f (w, z) : RN × RL → R
sup inf f (w, z) ≤ inf sup f (w, z)
z∈Z w∈W
w∈W z∈Z
with W ⊆ R , Z ⊆ R , and is called the max-min inequality.
N
L
satisfy the strong max-min property or the saddle-point property if the
above inequality holds with equality.
I f (w, z)
I
I
A point {(w̃, z̃)|w̃ ∈ W, z̃ ∈ Z} is called as the saddle-point for f (w, z) if
For Lagrange duality, if x∗ and λ∗ are optimal points for the primal and dual
problems with strong duality (zero duality gap), then tyey form a saddle-point for
the Lagrangian, or vice-versa (i.e., the converse is also true).
f (w̃, z) ≤ f (w̃, z̃) ≤ f (w, z̃)
∀w ∈ W, ∀z ∈ Z
I
i.e.,
f (w̃, z̃) = inf f (w, z̃)
w∈W
= sup f (w̃, z)
z∈Z
the strong max-min property holds with the value f (w̃, z̃).
25-Dec-2014
21 / 118
Certicate of Suboptimality
I
25-Dec-2014
22 / 118
Stopping Criterion
We know that a dual feasible point satises
`(λ, ν) ≤ p∗
i.e., the point (λ, ν) is proof or certicate of this condition.
Then,
I
f (x) − p∗ ≤ f (x) − `(λ, ν)
If an algorithm produces the sequences x(k) and (λ(k) , ν (k) ) check for
?
for primal feasible point x and dual feasible point (λ, ν) whereas the duality gap
associated with these points is
f (x) − `(λ, ν)
f (x(k) ) − `(λ(k) , ν (k) ) < to guarantee -suboptimality. can approach to zero, i.e., → 0, for strong
duality.
in other words
p∗ ∈ [`(λ, ν), f (x)] and p̂∗ ∈ [`(λ, ν), f (x)]
- If the duality gap is zero, i.e., f (x) = `(λ, ν) then x∗ = x and (λ∗ , ν ∗ ) = (λ, ν)
are the primal and dual optimal points.
25-Dec-2014
23 / 118
25-Dec-2014
24 / 118
I
I
Observations: (due to strong duality)
1. Inequality in the third line always holds with equality, i.e., x∗ minimizes
L(x, λ∗ , ν ∗ )
Assume that x∗ and (λ∗ , ν ∗ ) satisfy strong duality
2. From the forth line we have
f (x∗ ) = `(λ∗ , ν ∗ )
= inf f (x) + λT g(x) + ν T h(x)
f (x) +
X
λi gi (x) +
X
i
≤ f (x∗ ) +
X
νj hj (x)
I
j
X
λ∗i gi (x∗ ) +
i
{z
}
≤0
|
νj∗ hj (x∗ )
{z
=0
λ∗i gi (x∗ ) = 0 and since λi gi (x) ≤ 0
∀i
In other words
λ∗i > 0 if gi (x∗ ) = 0
λ∗i = 0 if gi (x∗ ) < 0
j
|
i
which is known as complementary slackness.
!
x
P
λ∗i gi (x∗ ) = 0
x
= inf
}
i.e., the ith optimal lagrange multiplier is
- positive if gi (x∗ ) is active at x∗ .
- zero if gi (x∗ ) is not active at x∗ .
≤ f (x∗ )
25-Dec-2014
25 / 118
25-Dec-2014
26 / 118
I
I x∗
minimizes L(x, λ∗ , ν ∗ ), thus ∇L(x, λ∗ , ν ∗ ) = 0, i.e.,
∇f (x∗ ) +
X
λ∗i ∇gi (x∗ ) +
X
i
I
νj∗ ∇hj (x∗ ) = 0
For any optimization problem with dierentiable objective (cost) and constraint
functions for which strong duality holds, any pair of primal and optimal points
satisfy KKT conditions.
I For convex problems:
j
Then the Karush-Kuchn-Tucker (KKT) conditions for x∗ and (λ∗ , ν ∗ ) being
primal and dual optimal points with zero duality gap (i.e., with strong duality) are
(constraint)
(constraint)
λ∗i ≥ 0 (constraint)
λ∗i gi (x∗ ) = 0 (complementary slackness)
gi (x∗ ) ≤ 0
If f (x), gi (x) and hj (x) are convex and x̃, λ̃ and ν̃ satisfy KKT conditions, then
they are optimal points, i.e.,
- from complementary slackness f (x̃) = L(x̃, λ̃, ν̃)
∗
∇f (x ) +
X
λ∗i ∇gi (x∗ )
+
i
X
νj∗ ∇hj (x∗ )
- from last condition `(λ̃, ν̃) = L(x̃, λ̃, ν̃) = inf L(x, λ̃, ν̃) . Note that
x
L(x, λ̃, ν̃) is convex in x.
hj (x∗ ) = 0
Thus,
f (x̃) = `(λ̃, ν̃).
=0
j
25-Dec-2014
27 / 118
25-Dec-2014
28 / 118
25-Dec-2014
30 / 118
Example 19:
Example 18:
1 T
x Qx + cT x + r
2
s.t. Ax = b
min
Soln:
(Q : SPD)
Solution:
From KKT consitions
Ax∗ = b
∗
T
∗
Qx + c + A ν = 0
)
=
Q
A
AT
0
x∗
−c
=
ν∗
b
25-Dec-2014
29 / 118
25-Dec-2014
31 / 118
I Example 20:
I
If strong duality holds and if dual optimal solution (λ∗ , ν ∗ ) exists, then we can
compute a primal optimal solution from the dual solutions.
I
When strong duality holds and (λ∗ , ν ∗ ) is given, the minimizer of L(x, λ∗ , ν ∗ ),
i.e.,
X
X
λ∗i gi (x) +
min f (x) +
i
νj∗ hj (x)
j
is unique and if it is primal feasible, then it is also primal optimal.
I
If the dual problem is easier to solve (e.g. has less dimensions or has an analytical
solution), then solving the dual problem and nding the optimal dual parameters
(λ∗ , ν ∗ ) rst and then solving
25-Dec-2014
32 / 118
Consider the following problem
min f (x) =
N
X
fi (xi )
i=1
s.t. aT x = b
where each fi (x) is strictly convex and dierentiable, a ∈ RN and b ∈ R. Assume
that the problem has a unique solution (non-empty and non-innity) and it is dual
feasible.
- f (x) is separable because fi (xi ) is a function of xi only. Now the Lagrangian will
be given as
L(x, ν) =
N
X
fi (xi ) + ν aT x − b
i=1
x∗ = argmin L(x, λ∗ , ν ∗ )
25-Dec-2014
N
X
= −bν +
will be an acceptable method to solve constrained minimization problems.
(fi (xi ) + νai xi )
i=1
33 / 118
25-Dec-2014
34 / 118
Then the dual function is given by
`(ν) = inf L(x, ν)
x
= −bν + inf
N
X
x
= −bν +
N
X
i=1
Then the dual problem is a function of a scalar ν ∈ R
(fi (xi ) + νai xi )
i=1
max −bν −
ν
inf (fi (xi ) + νai xi )
xi
|
{z
}
N
X
!
fi∗ (−νai )
i=1
- Once we nd ν ∗ , we know that L(x, ν ∗ ) is strictly convex as each fi (x) is strictly
convex. So, we can nd x∗ by solving
fi∗ (−νai )
= −bν −
N
X
fi∗ (−νai )
∇L(x, ν ∗ ) = 0
i=1
∂fi (xi )
= −ν ∗ ai .
∂xi
where fi∗ (y) is the conjugate function of fi (x).
NOTE: The conjugate function f ∗ (y) is the maximum gap between the linear
function yT x and f (x) (see Boyd Section 3.3). If f (x) is dierentiable, this
occurs at a point x where ∇f (x) = y. Note that, f ∗ (y) is a convex function.
f ∗ (y) =
sup
yT x − f (x)
x∈dom f (x)
25-Dec-2014
35 / 118
25-Dec-2014
36 / 118
I
Original Problem
dual
primal
min f (x)
Here u = [ui ]L×1 and v = [vj ]M ×1 are called the perturbations. When ui = 0 and
vj = 0, the problem becomes the original problem. If ui > 0, it means we have
relaxed the i-th inequality constraint, and if ui < 0, it means we have tightened
the i-th inequality constraint.
I
Let us use the notation p∗ (u, v) to denote the optimal solution of the perturbed
problem. Thus, the optimal solution of the original problem is
p∗ = p∗ (0, 0) = `(λ∗ , ν ∗ ).
I
Assume that strong duality holds and optimal dual solution exists, i.e.,
p∗ = `(λ∗ , ν ∗ ). Then, we can show that
max `(λ, ν)
s.t. g(x) ≤ 0
s.t. λ ≥ 0
h(x) = 0
I
I
Perturbed problem
primal
dual
p∗ (u, v) ≥ p∗ (0, 0) − λ∗T u − ν ∗T v
max `(λ, ν) − λT u − ν T v
min f (x)
s.t. g(x) ≤ u
s.t. λ ≥ 0
h(x) = v
25-Dec-2014
37 / 118
I
I
If λi is large and ui < 0, then p∗ (u, v) is guaranteed to increase greatly.
I
If λi is small and ui > 0, then p∗ (u, v) will not decrease too much.
I
If |νj | is large
- If νj > 0 and vi < 0, then p∗ (u, v) is guaranteed to increase greatly.
- If νj < 0 and vi > 0, then p∗ (u, v) is guaranteed to increase greatly.
I
If |νj | is small
- If νj > 0 and vi > 0, then p∗ (u, v) will not decrease too much.
- If νj < 0 and vi < 0, then p∗ (u, v) will not decrease too much.
25-Dec-2014
39 / 118
Introduction
25-Dec-2014
38 / 118
The perturbation inequality, p∗ (u, v) ≥ p∗ (0, 0) − λ∗T u − ν ∗T v, give a lower
bound on the perturbed optimal value, but no upper bound. For this reason the
results are not symmetric respect to loosening or tightening a constraint, For
example, if λi is large and we loosen the i-th constraint a bit (i.e., take ui small
and positive, 0 < ui < ), then perturbation inequality is not useful, it does not
imply that optimal value will decrease considerably.
25-Dec-2014
40 / 118
25-Dec-2014
42 / 118
Introduction
Introduction
I
The general constrained optimization problem
min f (x)
s.t. g(x) ≤ 0
h(x) = 0
Given a feasible point x(k) , if gi (x) ≤ 0 is satised with
equality, i.e., gi (x) = 0 , then constraint i is said to be active at x(k) . Otherwise it
is inactive at x(k) .
I Defn (Active Constraint):
25-Dec-2014
41 / 118
Primal Methods
Primal Methods
Primal Methods
Feasible Direction Methods
See Luenberger Chapter 12.
I A primal method is a search method that works on the original problem directly by
searching through the feasible region for the optimal solution. Each point in the
process is feasible and the value of the objective function constantly decreases.
For a problem with N variables and M equality constraints, primal methods work
in the feasible space of dimension N − M .
I
Update equation is
x(k+1) = x(k) + α(k) d(k)
d(k) must be a descent direction and x(k) + α(k) d(k) must be contained in the
feasible region, i.e., d(k) must be a feasible direction for some α(k) > 0.
Advantages:
- x are all composed of feasible points.
- If x(k) is a convergent sequence, it converges at least to a local minimum.
- Do not rely on special problem structure, e.g. convexity, in other words,
primal methods are applicable to general non-linear problems.
(k)
Very similar to unconstrained descent methods but now line search is constrained
so that x(k+1) is also a feasible point.
Disadvantages:
- must start from a feasible initial point.
- may fail to converge for inequality constraints if precaution is not taken.
I Example 21:
25-Dec-2014
43 / 118
Primal Methods
25-Dec-2014
44 / 118
Primal Methods
min f (x)
s.t. aTi x ≤ bi ,
- Let A
(k)
i = 1, . . . , M
be the set of indices representing active constraints, i.e.,
aTi x(k+1) = bi , i ∈ A(k) at x(k) , then the direction vector d(k) is calculated by
min ∇T f (x(k) )d
d
s.t. aTi d ≤ 0,
N
X
i ∈ A(k)
|di | = 1
There are two major shortcomings of feasible direction methods that require that they
be modied in most cases.
I The rst shortcoming is that for general problems there may not exist any feasible
directions. If, for example, a problem had nonlinear equality constraints, we might
nd ourselves in the situation where no straight line from x(k) has a feasible
segment. For such problems it is necessary either to relax our requirement of
feasibility by allowing points to deviate slightly from the constraint surface or to
introduce the concept of moving along curves rather than straight lines.
I
i=1
Last line ensures a bounded solution. The other constraints assure that vectors of
the form x(k) + α(k) d(k) will be feasible for suciently small α(k) > 0, and subject
to these conditions, d(k) is chosen to line up as closely as possible with the
negative gradient of f (x(k) ). In some sense this will result in the locally best
direction in which to proceed. The overall procedure progresses by generating
feasible directions in this manner, and moving along them to decrease the
objective.
25-Dec-2014
45 / 118
A second shortcoming is that in simplest form most feasible direction methods are
not globally convergent. They are subject to jamming (sometimes referred to as
zigzagging) where the sequence of points generated by the process converges to a
point that is not even a constrained local minimum point.
Primal Methods
25-Dec-2014
46 / 118
Primal Methods
Active Set Methods
I
I
The idea underlying active set methods is to partition inequality constraints into
two groups: those that are to be treated as active and those that are to be treated
as inactive. The constraints treated as inactive are essentially ignored.
I
Let A be the set of indices of active constraints (i.e., gi (x∗ ) = 0, i ∈ A). Then
∇f (x∗ ) +
X
λi ∇gi (x∗ ) = 0
i∈A
min f (x)
gi (x∗ ) = 0,
i∈A
s.t. g(x) ≤ 0
gi (x∗ ) < 0,
i∈
/A
λi ≥ 0,
i∈A
λi = 0,
i∈
/A
For simplicity, no equality constraints, the inclusion of equality constraints will be
straightforward.
Inactive constraints are inhibited (i.e., λi = 0).
Necessary conditions for optimum x are
∗
∇f (x ) + ∇ λT g(x∗ ) = 0
∗
I
g(x∗ ) ≤ 0
Just taking active constraints in A, problem is converted to an
equality-constrained-only problem.
λ g(x∗ ) = 0
T
λ≥0
25-Dec-2014
47 / 118
25-Dec-2014
48 / 118
Primal Methods
Primal Methods
I
Active set method:
- at each step nd the "working set" and treat it as the active set (can be a
subset of the actual active set).
- move to a lower point on the "surface" of the working set
- nd a new "working set" for this point
- repeat
I
Active set method:
- at each step nd the working set W and treat it as the active set (can be a
subset of the actual active set, i.e., W ⊆ A ).
- move to a lower point on the surface of the working set
- nd a new working set W for this point
- repeat
I
The surface dened by the working set will be called the working surface.
I
The surface dened by the working set W will be called the working surface.
25-Dec-2014
49 / 118
Primal Methods
I
I
Given any working set W ⊆ A, Assume that xW is a solution to the problem PW
25-Dec-2014
50 / 118
Primal Methods
If ∃i ∈ W such that λi < 0, then dropping constraint i (but staying feasible) will
decrease the value due to the sensitivity theorem (relaxing the constraint i to be
gi (x) = −c reduces f (x) by λi c).
min f (x)
s.t. gi (x) = 0,
i∈W
also satisfying gi (xW ) < 0, i ∈/ A. If xW cannot be found change the working set
W until a solution xW is obtained.
I
Once xW is found, then solve
∇f (xW ) +
X
λi ∇gi (xW ) = 0
i∈W
to nd λi .
I
If λi ≥ 0, ∀i ∈ W, then xW is a local optimal solution of the original problem.
I
I
25-Dec-2014
51 / 118
Primal Methods
The surface dened by a working set is called a working surface. By dropping i
from W and moving on the new working surface (toward the interior of the feasible
region F, we move to an improved solution. )
Monitoring the movement to avoid infeasibility until one or more constraints
become active, then add them to the working set W. Now, solve the changed
problem PW again to nd the a new xW and repeat the previous steps.
25-Dec-2014
53 / 118
I
25-Dec-2014
52 / 118
Primal Methods
If we can assure the the objective function (cost function) value is monotonically
decreasing, then any working set will not appear twice in the process. Hence the
active set method terminates in a nite number of iterations.)
25-Dec-2014
54 / 118
Active Sets Algorithm:
Primal Methods
I
I
Disadvantage: The inner loop must terminate at a global optimum in order to
I
Discuss: How can we integarte equality constraints to the original problem?
For a given working set W
repeat
repeat
- minimize f (x) over the working set W using x(k+1) = x(k) + α(k) d(k)
- check whether a new constraint becomes active.
if so, add the new constraint to the working set W.
until some stopping criterion is satised
check the Lagrange multipliers λi
- drop constraints with λi < 0 from the working set W
until some stopping criterion is satised
Primal Methods
determine correct λi , and the same working set is not encountered in the following
iterations.
Note that f (x) strictly decreases at each step.
25-Dec-2014
55 / 118
Primal Methods
25-Dec-2014
56 / 118
Primal Methods
Gradient Projection Method
I
Gradient Projection Method is an extension of the Gradient (or Steepest) Descent
I
Let us rst consider linear constraints
Method (GD or SD) to the constrained case.
I
Another perspective is to let the equality Ax = b to represent all the active
constraints aTi x = bi , i ∈ W. Thus
min f (x)
s.t. Ax = b
min f (x)
s.t. aTi x ≤ bi ,
i ∈ I1
aTi x = bi ,
i ∈ I2
I
If we use rst order Taylor approximation around point x(k) , such that
f (x) = f (x(k) + d) ∼
= f (x(k) ) + ∇T f (x(k) )d for small enough d , the we will have
min f (x(k) ) + ∇T f (x(k) )d
s.t. A x(k) + d = b
Let us take the active constraints,i.e., aTi x = bi , as the working set W and seek a
feasible descent direction
nd ∇T f (x)d < 0
while satisfying aTi d = 0, i ∈ W
i.e., d must lie in the tangent plane dened by
aTi d
dT Id ≤ 1
I
As Ax(k) = b and Ax(k+1) = b, so this implies that Ad = 0.
= 0, i ∈ W.
Hence, the above problem is the projection of −∇f (x) on to this tangent plane.
25-Dec-2014
57 / 118
Primal Methods
25-Dec-2014
58 / 118
Primal Methods
Projected Steepest Descent Algorithm (PSDA):
I
1. For a feasible initial point x(0) (i.e., x(0) ∈ F)
Thus, the problem simplies to
2. Solve the Direction Finding Problem (DFP)
min ∇T f (x(k) )d
d(k) = argmin ∇T f (x(k) )d
s.t. Ad = 0
T
s.t. Ad = 0
d Id ≤ 1
dT Qd ≤ 1,
- Ad = 0 denes the tangent plane M ⊆ RN and ensures that x(k+1) is still feasible.
- dT Id ≤ 1 is the Euclidean unit ball (kdk22 ≤ 1), thus this is the projected GD
algorithm.
- We may also use the constraint d Qd ≤ 1, where Q is a SPD matrix, to obtain
the projected SD algorithm.
T
(Q : SPD)
3. If ∇T f (x(k) )d(k) = 0, stop. x(k) is a KKT point.
4. Solve α(k) = argmin f (x(k) + αd(k) ) using a line search algorithm, e.g. exact or
backtracking line search.
5. x(k+1) = x(k) + α(k) d(k)
6. Goto Step 1 with x(0) = x(k+1) .
25-Dec-2014
59 / 118
25-Dec-2014
60 / 118
Primal Methods
Primal Methods
Projection:
I
DFP problem is another constrained optimization problem which should satisfy its
KKT conditions, i.e.,
Thus, d̃(k) is obtained as
−1
d̃(k) = − Q−1 − Q−1 AT AQ−1 AT
AQ−1 ∇f (x(k) )
Ad(k) = 0
T
= −P(k) ∇f (x(k) )
d(k) Qd(k) = 0
∇f (x
(k)
) + 2βk Qd
(k)
T
where
+ A λk = 0
βk ≥ 0
(k) T
(k)
Qd
=0
βk 1 − d
−1
P(k) = Q−1 − Q−1 AT AQ−1 AT
AQ−1
is called the projection matrix.
I
- Here, Ad(k) = 0 denes the tangent plane M(k) ⊆ RN .
Note that if Q = I, then
−1
P(k) = I − AT AAT
A
If we set d̃(k) = 2βk d(k) , From the third condition we nd that
d̃(k) = −Q−1 ∇f (x(k) ) − Q−1 AT λk
I
Now if put this value into the rst condition we obtain
As 2βk is just a scaling factor, we can safely write that the descent direction d(k)
is given by
d(k) = −P(k) ∇f (x(k) )
−1
λk = − AQ−1 AT
AQ−1 ∇f (x(k) )
25-Dec-2014
61 / 118
Primal Methods
4. If d(k) = 0
a) Find λ
Given a feasible point x(k) (i.e., x(k) ∈ F)
1. Find the active constraint set W(k) and form A (actually A(k) ) matrix
2. Calculate
PSDA with DFP Algorithm:
25-Dec-2014
62 / 118
Primal Methods
−1
λ = − AQ−1 AT
AQ−1 ∇f (x(k) )
b) If λi ≥ 0, ∀i, stop. x(k) satises KKT.
c) If ∃λi < 0, delete the row from A corresponding to the inequality with the
most negative component of λi and drop its index from W. Goto Step 2.
−1
P(k) = Q−1 − Q−1 AT AQ−1 AT
AQ−1
d(k) = −P(k) ∇f (x(k) )
3. If d(k) 6= 0
a) Find α
n
o
α1 = max α : x(k) + αd(k) is feasible
n
o
α2 = argmin f (x + αd(k) ) : 0 ≤ α ≤ α1
b) x(k+1) = x(k) + α2 d(k)
c) Goto Step 1 (with x(k) = x(k+1) )
I Example 22:
25-Dec-2014
63 / 118
Primal Methods
x1 + x2 + 2x3 + x4 = 6
i = 1, 2, 3, 4
Given the initial point x(0) = 2 2 1 0 T , nd the initial direction d(0) for the
projected gradient descent algorithm (PGDA).
- The active constraints are the two equalities and the inequality x4 ≥ 0, thus

2
A = 1
0
1
1
0
Primal Methods

22 9
AAT =  9 7
4 1

6
−1
1 
−5
AAT
=
11
−19
s.t. 2x1 + x2 + x3 + 4x4 = 7
64 / 118
So,
min x21 + x22 + x23 + x24 − 2x1 − 3x4
xi ≥ 0,
25-Dec-2014

1
2
0
4
1
1



4
1
1

−19
14 
73
−5
6
14
Hence, the projection matrix P(0) is given by
P
(0)
T
=I−A
T
AA
−1

1
1 
−3
A=
11  1
0
−3
9
−3
0
1
−3
1
0

0
0

0
0
Finally, the driection d(0) is given by
Also, ∇f (x(0) ) is given by
∇f (x
(0)
2
4

)=
2
−3
d
25-Dec-2014
65 / 118
(0)
= −P
(0)
∇f (x
(0)
1
)=
11
 
−8
 24 
 .
−8
0
25-Dec-2014
66 / 118
Primal Methods
Primal Methods
Nonlinear constraints:
I
Consider the problem which denes only the active constraints
min f (x)
s.t. h(x) = 0
I
I
In linear constraints x lie on the tangent plane M. However in nonlinear
constraints, the surface dened by the constraints and the tangent plane M touch
in a single point
(k)
In this case updated point must be projected onto the constrained surface. For
projected gradient descent method, the projection matrix is given by
−1
P(k) = I − JTh (x(k) ) Jh (x(k) )JTh (x(k) )
Jh (x(k) )
where Jh (x) is the Jacobian of h(x), i.e.,
T
Jh (x) = ∇hT (x) .
25-Dec-2014
67 / 118
25-Dec-2014
68 / 118
Quadratic Minimization
See Boyd Chapter 10.
1 T
x Qx + cT x + r
2
s.t. Ax = b
min
min f (x)
s.t. Ax = b
where Q ∈ RN ×N is a symmetric positive semidene matrix (SPSD), A ∈ RM ×N
and b ∈ RM .
where f (x) : RN → R, convex, twice dierentiable, A ∈ RM ×N , M < N .
Minimum occurs at
I
Using KKT conditions
p∗ = inf {f (x) | Ax = b} = f (x∗ )
Ax∗ = b
where x∗ satises KKT conditions
∗
T
∗
Qx + c + A ν = 0
)
=
|
T
I
∗
∇f (x ) + A ν = 0
25-Dec-2014
69 / 118
I
I
One general approach to solving the equality constrained problem is to eliminate
the equality constraints and then solve the resulting unconstrained problem using
methods for unconstrained minimization.
I
We rst nd a matrix F ∈ RN ×(N −M ) and vector x̂ that parametrize the (ane)
feasible set:
n
o
{x | Ax = b} = Fz + x̂ | z ∈ RN −M .
We then form the reduced or eliminated optimization problem
25-Dec-2014
70 / 118
There are, of course, many possible choices for the elimination matrix F, which
can be chosen as any matrix in RN ×(N −M ) with the range (columnspace) of the
matrix F is equals to the nullspace of the matrix A, i.e., R(F) = N (A). If F is
one such matrix, and T ∈ R(M −N )×(M −N ) is nonsingular, then F̃ = FT is also a
suitable elimination matrix, since
R(F̃) = R(F) = N (A).
I
Here x̂ can be chosen as any particular solution of Ax = b, and F ∈ RN ×(N −M ) is
any matrix whose range (columnspace) is the nullspace of A, i.e., R(F) = N (A).
Conversely, if F and F̃ are any two suitable elimination matrices, then there is
some nonsingular T such that F̃ = FT. If we eliminate the equality constraints
using F, we solve the unconstrained problem
min f (Fz + x̂)
while if F̃ is used, we solve the unconstrained problem
min f˜(z) = f (Fz + x̂)
which is an unconstrained problem with variable z ∈ R
. From its solution z ,
we can nd the solution of the equality constrained problem as
N −M
∗
x∗ = Fz∗ + x̂
.
Above equations denes a KKT system for an equality constrained quadratic
optimization problem with (N + M ) linear equations in the (N + M ) variables
(x∗ , ν ∗ ).
Eliminating Equality Constraints
I
−c
AT x∗
=
ν∗
b
0
{z }
KKT matrix
Ax = b
∗
Q
A
25-Dec-2014
71 / 118
min f (F̃z̃ + x̂) = f (F(Tz̃) + x̂)
This problem is equivalent to the one above, and is simply obtained by the change
of coordinates z = Tz̃. In other words, changing the elimination matrix can be
thought of as changing variables in the reduced problem.
25-Dec-2014
72 / 118
Example 23:
Newton's Method Equality Constraints
I
It is the same as unconstrained Newton's Method except
- initial point is feasible, x(0) ∈ F, i.e., Ax(0) = b.
- Newton step ∆xnt is a feasible direction, i.e., A∆xnt = 0.
I
In order to use Newton's Method on the problem
min f (x)
s.t. Ax = b
we can use second-order Taylor approximation around x (actually x(k) ) to obtain a
quadratic minimization problem
min f (x + ∆x) = f (x) + ∇T f (x)∆x +
1
∆xT H(x)∆x
2
s.t. A (x + ∆x) = b
This problem is convex if H(x) 0
25-Dec-2014
73 / 118
I
I
Newton decrement λ(x) is the same as the one used for the unconstrained
problem, i.e.,
1/2
= k∆xnt kH(x)
being the norm of the Newton step in the norm dened by H(x).
−∇f (x)
H(x) AT ∆xnt
=
.
∗
ν
0
A
0
|
{z
}
Solution exists when the KKT matrix is non-singular.
λ2 (x)
2
I
, being a good estimate of f (x) − p∗ (i.e., f (x) − p∗ ≈
2
as the stopping criterion λ 2(x) ≤ .
I
Appears in the line search
KKT matrix
I
74 / 118
λ(x) = ∆xTnt H(x)∆xnt
Using the results of quadratic minimization with equality constraints
25-Dec-2014
The same solution can also be obtained by setting x∗ = x + ∆xnt in the
optimality condition equations of the original problem
d f (x + α∆xnt ) dα
Ax∗ = b
I
∗
∇f (x ) + AT ν ∗ = 0
λ2 (x)
2
), can be used
= ∇T f (x)∆xnt = −λ2 (x)
α=0
In order to Newton step ∆xnt to be a feasible descent direction the following two
conditions need to be satised
as ∆xnt and ν ∗ should satisfy the optimality conditions.
∇T f (x)∆xnt = −λ2 (x)
A∆xnt = 0
I
25-Dec-2014
75 / 118
Newton's Method with Equality Constraint
Elimination
I
25-Dec-2014
76 / 118
It can be shown that
∆xnt = F∆znt
min f (x)
s.t. Ax = b
≡
where ∆xnt is the Newton step of the original problem with equality constraints.
min f˜(z) = f (Fz + x̂)
with R(F) = N (A) and Ax̂ = b.
I
The solution for equality constrained Newton's Method is invariant to ane
transformations.
I
The gradient and Hessian of f˜(z) are
λ̃2 (z) = ∆zTnt H̃(z)∆znt
∇f˜(z) = FT ∇f (Fz + x̂)
= ∆zTnt FT H(x)F∆znt
H̃(z) = FT H(Fz + x̂)F
I
Then, the KKT matrix is invertible, i H̃(z) is invertible.
I
The Newton step of the reduced problem is
The Newton decrement λ̃(z) is the same as the Newton decrement of the original
problem
= ∆xTnt H(x)∆xnt
= λ2 (x)
−1
∆znt = −H̃−1 (z)∇f˜(z) = − FT H(x)F
FT ∇f (x)
where x = Fz + x̂.
25-Dec-2014
77 / 118
25-Dec-2014
78 / 118
Penalty Methods
min f (x)
s.t. x ∈ F
See Luenberger Chapter 13.
I
I
They approximate constrained problems by unconstrained problems
I
The approximation is obtained by adding
- a term with a high cost for violating the constraints (penalty methods)
- a term that favors points interior to the feasible points over those near the
boundary (barrier methods)
to the objective function (cost function).
I
The penalty and barrier methods work directly in original the N -dimensional space
RN rather than the (N − M )-dimensional space RN −M as in the case of the
primal methods.
I
I
The idea is to replace the this problem by the following penalty problem
min f (x) + cP (x)
where c ∈ R++ is a constant and P (x) : RN → R is the penalty function.
Here
- P (x) is continuous
- P (x) ≥ 0 ∀x ∈ RN
- P (x) = 0 i x ∈ F
In general, the following quadratic penalty function is used
P (x) =
L
M
1X
1X
(max {0, gi (x)})2 +
(hj (x))2
2 i=1
2 j=1
where gi (x) ≤ 0 are the inequality constraints and hj (x) = 0 are the equality
constraints.
I
For example, consider P (x) =
1
2
2
P
i=1
25-Dec-2014
79 / 118
25-Dec-2014
80 / 118
(max {0, gi (x)})2 with g1 (x) = x − b and
g2 (x) = a − x
Penalty Method:
n
o
I
Let c(k) , k = 1, 2, . . . be a sequence tending to ∞ such that ∀k c(k) ≥ 0 and
c(k+1) > c(k) .
I
Let
q(c, x) = f (x) + cP (x)
and for each k solve the penalty problem
min q(c(k) , x)
obtaining a solution point x(k) .
I
For large c, the minimum point of the penalty problem will be in a region where
P (x) is small.
I
For increasing c, the solution will approach the feasible region F and as c → ∞ the
penalty problem will converge to a solution of the constrained problem.
25-Dec-2014
81 / 118
25-Dec-2014
82 / 118
Barrier Methods
Also known as interior methods.
min f (x)
I Lemma:
i.
ii.
iii.
iv.
As k → ∞, c
(k+1)
>c
(k)
s.t. x ∈ F
(e.g., start from an exterior point)
I
q(c(k) , x(k) ) ≤ q(c(k+1) , x(k+1) )
P (x(k) ) ≥ P (x(k+1) )
The idea is to replace the this problem by the following barrier problem
min f (x) +
f (x(k) ) ≤ f (x(k+1) )
s.t. x ∈ interior of F
f (x∗ ) ≥ q(c(k) , x) ≥ f (x(k) )
n
o
I Theorem: Let x(k) , k = 1, 2, . . . be a sequence generated by the penalty
method. Then, any limit of the sequence is a solution of the original constrained
problem.
I
I
where c ∈ R++ is a constant and B(x) : RN → R is the barrier function.
Here, barrier function B(x) is dened on the interior of F such that
- B(x) is continuous
- B(x) ≥ 0
- B(x) → ∞ as x approaches the boundary of F
Ideally,
(
B(x) =
25-Dec-2014
1
B(x)
c
83 / 118
0,
∞,
x ∈ interior F
x∈
/ interior F
25-Dec-2014
84 / 118
I
I
For example, consider B(x) = −1/g1 (x) − 1/g2 (x), g1 (x) = x − b and
g2 (x) = a − x.
There are several approximations. Two common barrier functions for the inequality
constraints are given below
Log barrier: B(x) = −
L
X
log (−gi (x))
i=1
Inverse barrier: B(x) = −
L
X
i=1
1
gi (x)
25-Dec-2014
85 / 118
25-Dec-2014
86 / 118
Barrier Method:
I
I
n
o
Let c(k) , k = 1, 2, . . . be a sequence tending to ∞ such that ∀k c(k) ≥ 0 and
c(k+1) > c(k) .
Let
r(c, x) = f (x) +
and for each k solve the barrier problem
I Lemma:
i.
ii.
iii.
iv.
1
B(x)
c
As k → ∞, c(k+1) > c(k) (start from an interior point)
r(c(k) , x(k) ) ≥ r(c(k+1) , x(k+1) )
B(x(k) ) ≤ B(x(k+1) )
f (x(k) ) ≥ f (x(k+1) )
f (x∗ ) ≤ f (x(k) ) ≤ r(c(k) , x)
n
o
Any limit point of a sequence x(k) , k = 1, 2, . . . generated by the
barrier method is a solution of the original constrained problem.
min r(c(k) , x)
I Theorem:
s.t. x ∈ interior of F
obtaining a solution point x(k) .
25-Dec-2014
87 / 118
Properties of the Penalty & Barrier Methods
I
I
Let pi (x) = max {0, gi (x)} and p(x) = pi
where F(x), G(x) and Γ(x) are Hessians of f (x), g(x) and γ(x) respectively, and
Jp (x) is the Jacobian of p(x).
L×1
P (x) = pT (x)p(x) =
L
X
I
If at x∗ there are r active constraints, then the Hessian matrices Q(c(k) , x(k) )
have r eigenvalues tending to ∞ as c(k) → ∞ and (n − r) eigenvalues tending to
some nite value. In other words, condition number goes to innity (κ → ∞) as
c(k) → ∞.
I
Gradient Descent may not be directly applied, instead Newton's Method is
preferred!
I
Same observation also applies to the barrier method.
(pi (x))2
i=1
or more generally to be the quadratic norm function
γ(y) = yT Γ y
I
⇒
Dening
Q(c, x) = F(x) + c ∇T γ(p(x)) G(x) + c JTp (x) Γ(p(x)) Jp (x)
Let the penalty function γ(x) where P (x) = γ(p(x)) to be the Euclidean norm
function
⇒
The Hessian Q(c, x) is given by
γ(y) = yT y
88 / 118
q(c, x) = f (x) + c γ(p(x))
The penalty method solves the unconstrained problem
min f (x) + cP (x)
I
25-Dec-2014
P (x) = pT (x) Γ p(x)
The Hessian of the above problem becomes more and more ill-conditioned as
c→∞
25-Dec-2014
89 / 118
25-Dec-2014
90 / 118
See Boyd Chapter 11.
I
The function
!
L
X
φ(x) = −
1
−
log (−gi (x))
min f (x) +
c
i=1
L
X
log (−gi (x))
i=1
with dom φ(x) = x ∈ RN | gi (x) < 0, ∀i is called the logarithmic barrier
s.t. Ax = b
function.
I
We will modify the Newton's algorithm to solve the above problem. So, we will
need
∇φ(x) = −
L
X
i=1
Hφ (x) =
L
X
1
∇gi (x)
gi (x)
1
g 2 (x)
i=1 i
∇φ(x)∇T φ(x) −
L
X
i=1
1
Hg (x)
gi (x) i
where Hφ (x) = ∇2 φ(x) and Hgi (x) = ∇2 gi (x) are the Hessians of φ(x) and
gi (x) respectively.
Minimum occurs at
∗
∗
p = inf {f
(x) |Optimization
Ax = b} = f (x )
Umut Sezen & Cenk Toker (Hacettepe University)
ELE604
25-Dec-2014
91 / 118
I Example 24:Inequality
Central Path
I
s.t. Ax ≤ b
(where e ∈ R , A ∈ R
s.t. Ax = b
with dom φ(x) = x ∈ RN | gi (x) < 0, ∀i is called the logarithmic barrier
N
function.
I
φ(x) = −
The point x (c) is the solution. The trajectory of x (c) as a function of c is called
the central path. Points on the central path satisfy
∗
Ax∗ (c) = b
gi (x∗ (c)) < 0,
c ∇f (x∗ (c)) + ∇φ(x∗ (c)) + AT ν̂ = 0



∀i Centrality conditions


and b ∈ RL are constants) is given by
log bi − aTi x ,
dom φ(x) = {x | Ax < b}
i=1
where aT1 , . . . , aTL are the rows of A.
The gradient and Hessian of the barrier function are
∇φ(x) =
L
X
i=1
1
ai
bi − aTi x
= AT d
Last line could be rewritten as
c ∇f (x∗ (c)) −
N ×L
L
X
for some ν̂ ∈ RM .
I
min eT x
min cf (x) + φ(x)
∗
92 / 118
form linear programming. The logarithmic barrier function
for an LP in inequality form
Consider the equivalent problem (c > 0)
25-Dec-2014
Hφ (x) =
L
X
i=1
L
X
i=1
1
∇gi (x∗ (c)) + AT ν̂ = 0
gi (x∗ (c))
= AT (diag d)2 A
where di =
25-Dec-2014
93 / 118
1
.
bi − aTi x
Since x is strictly feasible, we have d > 0, so the Hessian of φ(x), Hφ (x) is
nonsingular if and only if A has rank N , i.e. full-rank.
The centrality condition is
c e + AT d = 0
We can give a simple geometric interpretation of the centrality condition. At a
point x∗ (c) on the central path the gradient ∇φ(x∗ (c)), which is normal to the
level set of φ(x) through x∗ (c), must be parallel to e. In other words, the
hyperplane eT x = eT x∗ (c) is tangent to the level set of φ(x) through x∗ (c).
Figure below shows an example with L = 6 and N = 2.
25-Dec-2014
1
T
2 ai ai
(bi − aTi x)
95 / 118
25-Dec-2014
94 / 118
The dashed curves in the previous gure show three contour lines of the
logarithmic barrier function φ(x). The central path converges to the optimal point
x∗ as c → ∞. Also shown is the point on the central path with c = 10. The
optimality condition at this point can be veried geometrically: The line
eT x = eT x∗ (10) is tangent to the contour line of φ(x) through x∗ (10).
25-Dec-2014
96 / 118
Dual Points from Central Path
I
So, the dual function optimal value p̂∗ (c) = `(λ∗ (c), ν ∗ (c)) is nite and given as
`(λ∗ (c), ν ∗ (c)) = f (x∗ (c)) +
From the centrality condition, let
1
,
c gi (x∗ (c))
∀i
=1
c
=0
L
= f (x (c)) −
c
I
∇f (x∗ (c)) +
λ∗i (c)gi (x∗ (c)) +ν ∗T (c) (Ax∗ (c) − b)
|
{z
}
{z
}
|
∗
ν̂
c
ν ∗ (c) =
L
X
L
X
i=1
λ∗i (c) = −
then
Duality gap is
L
and
c
f (x∗ (c)) − p∗ ≤
λ∗i (c)∇gi (x∗ (c)) + AT ν ∗ (c) = 0
goes to zero, as c → ∞.
i=1
L
c
Hence, from KKT, x (c) minimizes
∗
L(x, λ, ν) = f (x) + λT g(x) + ν T (Ax − b)
for λ = λ∗ (c) and ν = ν ∗ (c) for a particular c, which means that (λ∗ (c), ν ∗ (c)) is
a dual feasible pair.
25-Dec-2014
97 / 118
KKT Interpretation
Ax = b
gi (x∗ ) ≤ 0,
λ∗i ≥ 0,
1
−λ∗i gi (x∗ ) = ,
c
∇f (x ) +
L
X
I
1
, then
c gi (x)
λ∗i ∇gi (x∗ )
T
∇f (x) −





∀i 





∀i 

+A ν =0
To solve this set of (N + M ) linear independent equations (of N + M variables x
and ν ) consider the nonlinear part of the rst set of equations
∇f (x + d) −
L
X
i=1
Complementary slackness, λi gi (x) = 0 → −λ∗i gi (x∗ ) =
1
∇gi (x) + AT ν = 0
c gi (x)
Ax = b
I










∗
L
X
i=1
satised by
 x∗ (c), λ∗ (c) and ν ∗ (c)
∀i
i=1
I
Let λi = −
Central path (centrality) conditions can be seen as deformed KKT conditions, i.e.,
∗
98 / 118
Newton Step for Modied KKT equations
I
I
25-Dec-2014
1
c
L
X 1
1
∇gi (x + d) ∼
∇gi (x)
= ∇f (x) −
c gi (x + d)
c gi (x)
i=1
|
{z
}
=ĝ
As c → ∞, x∗ (c), λ∗ (c) and ν ∗ (c) almost satisfy the KKT optimality conditions.
+ H(x)d −
L
X
i=1
|
1
Hg (x)d +
c gi (x) i
{z
L
X
i=1
1
∇gi (x)∇T gi (x)d
c gi2 (x)
}
=Ĥd
I
25-Dec-2014
99 / 118
Substituting back, we obtain
I
100 / 118
Let us represent the previous modied KKT equations in matrix form
T
Ĥd + A ν = −ĝ
Ad = 0
Ĥ
A
AT
0
d
−ĝ
=
ν
0
∗
whose solution would give the modied Newton step ∆xnt and νnt
where
Ĥ = H(x) −
L
X
i=1
ĝ = ∇f (x) −
L
X
1
1
∇gi (x)∇T gi (x)
Hg (x) +
c gi (x) i
c gi2 (x)
i=1
L
X
i=1
I
25-Dec-2014
c H(x) + Hφ (x)
A
AT
0
∆xnt
−c ∇f (x) − ∇φ(x)
=
∗
νnt
0
∗
where νnt
= c ν∗.
1
∇gi (x)
c gi (x)
I
Using the derivations of ∇φ(x) and Hφ (x)
Using this Newton step, the Interior-Point Method (i.e., Barrier Method) can be
constructed.
1
Hφ (x)
c
1
ĝ = ∇f (x) + ∇φ(x)
c
Ĥ = H(x) +
25-Dec-2014
101 / 118
25-Dec-2014
102 / 118
I
The Interior-Point Method (Barrier Method):
Given a strictly feasible x ∈ F, c = c(0) (> 0), µ > 1 and tolerance > 0
repeat
1. Centering step:
Compute x∗ (c) by minimizing the modied barrier problem
x∗ (c) = argmin (c f (x) + φ(x))
s.t. Ax = b
4. Increase c:, c = µ c
L
<
c
I
25-Dec-2014
103 / 118
Choice of µ:
- µ provides a tade-o in the number of iterations for the inner and outer
loops.
- small µ: at each step inner loop starts from very good point, few inner loop
iterations are required, but too many outer loop iterations may be required.
- large µ: at each step c increases a large amount, the current starting point
may not be a good point for the inner loop. Too many inner loop iterations
may be required, but few outer loop iterations are required.
- In practice, small values of µ (i.e., near one) result in many outer iterations,
with just a few Newton steps for each outer iteration. For µ in a fairly large
range, from around 3 to 100 or so, the two eects nearly cancel, so the total
number of Newton steps remains approximately constant. This means that
the choice of µ is not particularly critical; values from around 10 to 20 or so
seem to work well.
Accuracy of centering:
- Computing x∗ (c) exactly is not necessary since the central path has no
signicance beyond the fact that it leads to a solution of the original problem
as c → ∞; inexact centering will still yield a sequence of points x(k) that
converges to an optimal point. Inexact centering, however, means that the
points λ∗ (c) and ν ∗ (c), computed from the rst two equation given in the
section titled "Dual points from central path", are not exactly dual feasible.
This can be corrected by adding a correction term to these formulae, which
yields a dual feasible point provided the computed x is near the central path,
i.e., x∗ (c).
- On the other hand, the cost of computing an extremely accurate minimizer
of c f (x) + φ(x), as compared to the cost of computing a good minimizer of
c f (x) + φ(x), is only marginally more, i.e., a few Newton steps at most. For
this reason it is not unreasonable to assume exact centering.
starting at x using the modied Newton's Method.
2. Update: x = x∗ (c)
3. Stopping criterion: uit if
25-Dec-2014
105 / 118
I
25-Dec-2014
104 / 118
Choice of c(0) :
- large c(0) : rst run of inner loop may require too many iterations.
- small c(0) : more outer-loop iterations are required.
L
- One reasonable choice is to choose c(0) so that c(0)
is approximately of the
same order as f (x(0) ) − p∗ , or µ times this amount. For example, if a dual
feasible point (λ, ν) is known, with duality gap η = f (x(0) )`(λ, ν), then we
can take c(0) = Lη . Thus, in the rst outer iteration we simply compute a
pair with the same duality gap as the initial primal and dual feasible points.
- Another possibility is to nd c(0) which minimizes
inf kc ∇f (x(0) ) + ∇φ(x(0) ) + AT νk2
ν
25-Dec-2014
106 / 118
25-Dec-2014
108 / 118
Example 25:
Solution:
25-Dec-2014
107 / 118
Example 26:
Solution:
25-Dec-2014
109 / 118
25-Dec-2014
110 / 118
How to start from a feasible point?
25-Dec-2014
111 / 118
I
Consider gi (x) ≤ 0, i = 1, 2, . . . , L and Ax = b.
I
We always assume that we are given a point x(0) ∈
Ax(0) = b.
I
If such a point is not knows, a preliminary stage, Phase I is run rst
I
L
T
i=1
dom gi (x) satisfying
Then, we form the following optimization problem
min s
s.t. gi (x) ≤ s,
i = 1, 2, . . . , L
Ax = b
I
The interior-point method requires a strictly feasible starting point x(0)
Basic Phase I Method:
I
I
25-Dec-2014
112 / 118
Then, apply the interior-point method to solve the above problem. There are three
cases depending on the optimal value p̄∗
1. If p̄∗ < 0, then a strictly feasible solution is reached. Moreover if (x, s) is
feasible with s < 0, then x satises gi (x) < 0. This means we do not need
to solve the optimization problem with high accuracy; we can terminate
when s < 0.
2. If p̄∗ > 0, then there is no feasible solution.
3. If p̄∗ = 0, and the minimum is attained at x∗ and s∗ = 0, then the set of
inequalities is feasible, but not strictly feasible. However, if p̄∗ = 0 and the
minimum is not attained, then the inequalities are infeasible.
in the variables x ∈ RN and s ∈ R.
s is a bound on the maximum infeasibility of the inequalities and it is to be driven
below zero.
The problem is always feasible when select s(0) ≥ max gi (x(0) ) together with the
i
given x(0) ∈
L
T
i=1
dom gi (x) with Ax(0) = b.
25-Dec-2014
113 / 118
25-Dec-2014
114 / 118
Example 27:
Solution:
25-Dec-2014
115 / 118
25-Dec-2014
117 / 118
25-Dec-2014
116 / 118
25-Dec-2014
118 / 118

ELE604/ELE704 Optimization - Hacettepe University, Department of

Transkript

Benzer belgeler

H.E Ambassador of As Hacettepe University, internationalization is

8 Ekim 2014, Çarşamba Saat : 15:00 Yer: Yaşar ATAMAN Seminer

Home_files/Serol Resume par

inheritance of female sterility in induced cicer species - Field