Algorithms and Complexity – DP Hitting set

Given:
A={a_1,a_2,\dots,a_n}
B_1, B_2,\dots, B_m that all contain elements from A means subset of A, |B_i| \leq c
Subset H\subseteq A is a hitting set if it contains an element from all B_i's
Size k = |H|

Solve:

Branching-HS(A, (B1, …,Bm), k)
if k = o: 		//base case
	if Bi = original Bi for some i:
		return “no HS”
	else return empty set
else:
	choose some Bi
	for u ∈ Bi
		Brahching-HS(A-u, {Bi-u | i=1, …, m}, k-1)
	if all recursive
		return “no HS”, then return “no HS”
	else:
            choose some u where u ∈ Bi worked,
            add u to the result from that, call and return.

Running Time:

  • {Bi-u | i=1 to m} takes O(m)
  • T(n, m, k) = c \times T(n-1, m, k-1) + O(cm)
    • there are c amout of u in B_i, so this runs in c times
    • Subset has c elements, A have n elements, c\leq n so O(cm)=O(nm)

Rerolling above gives overall running time O(c^knm).

Algorithms and Complexity – DP No.1

Given:
Set of n intervals[s_i,f_i], each having weight w_i

Want:

Maximum weight subset of compatible intervals

Means:

we have:

  •  n  requests labelled 1,2,\dots,n
  • with each i request  specifying
    • a start time s_i
    • a finish time f_i
    • a weight w_i
    • Two intervals are compatible if they do not overlap
    • The goal is to find a subset S\{1,\dots,n\} that maximize the sum of the weights of selected intervals.

Solve:

Assume all requests are sorted in order of finish time: f_1\leq f_2 \leq f_3 \leq \dots \leq f_n

  • We say a request i comes before a request j if i<j
  • define p(j), for an interval j, to be the largest index i<j such that intervals i and j are disjoint
  • means i is the leftmost interval that ends before j begins, we define p(j)=0 if no request i<j is disjoint from j.

Prove:
Let OPT be an optimal solution, ignoring for now we have no idea what OPT is.
Claim:
Either interval n (the last one) belongs to OPT, or it doesn’t.

  • Claim 1: n belongs to OPT, then no interval indexed strictly between p(n) and n can belong to OPT, and OPT must include an optimal solution to the problem consisting of requests \{1,\dots ,p(n)\}.
    • Thus OPT-n is optimal for sub-instance \{1,\dots ,p(n)\} of intervals that finish \leq s_n
  • Claim 2: n does NOT belong to OPT, then OPT is equal to the optimal solution to the problem consisting of requests \{1,…,n-1\}.
    • Thus OPT is optimal for sub-instance \{1,…,n-1\}

Now computes:

MWIS(n)
if n=0
  return 0 //optimum over empty set
else
  return max{ MWIS(n-1), MWIS(p(n))+w(n) }

Just like Fibonacci numbers, it runs in T(n)=T(n-1)+T(n-2)+O(1)
Reusing computation:

MWIS(n)
T  //array 1,...,n
T[0]=0
 for i = 1,…,n
T[i]= max{T[i],T[p(i)]+w(i)}
 return T[n]

Algorithm runs in O(n) time
Took O(n log n ) to sort intervals and to compute p(i)'s.

How to find optimal solution:

call MWIS(n) to compute array T
S   //empty set
i   //n
while i > 0
  if T[i]=T[p(i)]+w(i)
    add i to S
    i←p(n)
  else
    i←i-1

Note: array T contains the value of the optimal solution on the full instance.
Note: each state of DP takes O(1) time, the algorithm runs n iterations → O(n)

Wallpaper-2009-08-QIAQIA-1680x1050

Wallpapers in the past

还是整理一下我这乱七八糟的各种的桌面图纸好了。。
毕竟当年也是花了不少时间画出来的。
WP1        WP2        WP3
详情见内,大图几十张=.= 我以前可真是闲着蛋疼啊。。
Read more…

Algorithms and Complexity No.4

WIS problem:
Given:
Suppose we are to schedule print jobs on a printer. Each job j has a weight w_j>0 (representing how important the job is) and a processing time t_j (representing how long the job takes). A schedule is an ordering \sigma=\langle j_1,j_2,\dots,j_n \rangle of the jobs.

For a given schedule \sigma, let F(\sigma,j) be the finishing time of job j in the schedule.

Want:
Design a greedy algorithm that computes a schedule with minimum weighted average finishing time that minimizing \sum_j w_jF(\sigma,j).

Solve:
Our greedy algorithm can be sorting the jobs in increasing t_j/w_j order.

Assume for simplicity that no 2 jobs have the same ratio. To prove that the schedule is OPT, we resort to the usual exchange argument.
Suppose the OPT is \sigma=\langle j_1,j_2,\dots,j_n \rangle and there are 2 adjacent indices j_i and j_{i+1} such that:

 \frac{t_{j_i}}{w_{j_i}}>\frac{t_{j_{i+1}}}{w_{j_{i+1}}}

Now, consider:

\sigma'=\langle j_1,j_2,\dots,j_{i-1},j_{i+1},j_{i},j_{i+2},\dots,j_n \rangle,

where we swap the position of j_{i} and j_{i+1}, and leave other jobs in their places. We now need to prove the change decreases the cost of the schedule.
Notice that the finishing time of jobs other than j_{i} and j_{i+1} is the same in \sigma and \sigma '. Thus:

\sum_j w_jF(\sigma,j)-\sum_j w_jF(\sigma ',j)
=
w_{j_{i}}F(\sigma,j_i)+w_{j_{i+1}}F(\sigma,j_{i+1})-w_{j_{i}}F(\sigma ',j_i)+w_{j_{i+1}}F(\sigma ',j_{i+1})

Let X=F(\sigma,j_{i-1}). Then:

F(\sigma,j_{i}) = X+t_{j_{i}},
F(\sigma,j_{i+1})=X+t_{j_{i}}+t_{j_{i+1}}
F(\sigma ',j_{i+1})=X+t_{j_{i+1}}
F(\sigma ',j_{i})=X+t_{j_{i}}+t_{j_{i+1}}

Plug those values into expression from above for the difference in cost between \sigma and \sigma '. After simplifying we get:

\sum_j w_jF(\sigma,j)-\sum_j w_jF(\sigma ',j)=-w_{j_{i}}t_{j_{i+1}}+w_{j_{i+1}}t_{j_{i}}.

Noting that

 \frac{t_{j_i}}{w_{j_i}}>\frac{t_{j_{i+1}}}{w_{j_{i+1}}}

implies -w_{j_{i}}t_{j_{i+1}}+w_{j_{i+1}}t_{j_{i}}>0. so we get:\sum_j w_jF(\sigma,j)>\sum_j w_jF(\sigma ',j).

Algorithms and Complexity No.3

Critical edges:

Given:
G=(V,E) be a connected undirected graph. We say an edge (u,v)\in E is critical if removing it disconnects the graph.

Want:
Given an algorithm for identifying all critical edges.

Solve:
Fun DFS on G to get a single tree T.

1. non-tree edges cannot be critical

This is because the tree keeps the graph connected even if all non-tree edges are removed. This reduced the search space for critical edges from m to just n-1.

Let D_u be the set of descendants of u in T, consider u to be a descendant of itself, so u\in D_u. Using the discovery times of DFS define for every vertex u:
 \text{low}[u] = min \left\{ \begin{array}{cc} \text{d}[w] & | (x,w)\text{ is a non-tree edge and }x\in D_u\\ \text{d}[u] \end{array} \right.
Recall DFS tree of an undirected graph, all non-tree edges connect 2 nodes such that one is the descendant of the other. So \text{low}[u] captures how high up the tree we can go from u by taking zero or more edges down the tree followed by a single non-tree edge to reach a node w up in the tree.

In order to keep \text{low}[u] well defined (in case that no non-tree edge exists), we also consider the option of staying at u. The discovery time of the vertices decreases as we go up the tree we take the minimum of the discovery times of the vertices before mentioned.

We now claim that a tree edge (u,v), where v is u‘s parent, is a critical edge if and only if \text{low}[u]=\text{d}[u]. The only way to remain connected is through a non-tree edge (x,w) where x\in D_u and w\notin D_u.

In that case, we know that \text{d}[u]>\text{d}[w]\geq \text{low}[u]. Otherwise, if no such edge exists, we have \text{d}[u]=\text{low}[u] and removing (u,v) disconnects the graph.

We could identify all critical vertices in O(n) time if we had these low values.

To pre-compute
\text{earliest-non-tree}[u]=min\{\text{d}[w]|(u,w)\text{ is a non-tree edge}\},

for each vertex u in O(m) time. Then we can compute low by using the following relation:
\text{low}[u]=\min_{x\in D_u}\text{earliest-non-tree}[u],
which would take O(n) for each u.

Thus computing \text{low}[u] for all u \in V would take O(n^2) time.

NOW, to get a more efficient algorithm we need to re-use come computation. This can be done if we re-difine \text{low} recursively:
\text{low}[u]=\min_{x\in D_u}\text{earliest-non-tree}[u], \min_{v:child of u}\text{low}[u]\}.

The two definitions are clearly equivalent, but the second one leads immediately to a O(n) algorithm for computing \text{low}[u] for all u \in V, provided \text{earliest-non-tree}[v] is given for all v \in V.

So this approach takes O(m+n) time.