## Thursday, October 22, 2015

### Dynamic Graph Connectivity

The dynamic graph portion of MIT's advanced data structure course is focused on the problem of maintaining a structure that allows for querying connectivity between any two nodes. The "dynamic" part of the title stems from the problem of dynamic optimality that is addressed earlier in the course. Querying the structure causes the structure to rebalance itself such that subsequent queries will be faster. It's like an in-structure cache.

```* Dynamic graphs I
** Intro
*** Preferred paths (From dynamic optimality)
*** Heavy-light decomposition
**** Really useful technique
*** Maintain a forest of trees subject to O(log(n))-time operations:
**** makeTree: returns new vertex in new tree
**** link(v, w): make v new child of w
**** Cut(v): Delete the edge from v to it's parent
**** findRoot(v): return root of the tree containing v
**** path_aggregate(v): compute sum/min/max/etc of node/edge weights on a v-to-root path
***** log(N) time!
***** Trees probably not balanced
****** A path_aggregate tree traversal could have a spine of length N, but we can still get log(n) time!
***** We store the unbalanced tree that we want to represent as a balanced tree
*** Version we will cover only works because of splay tree properties
*** Preferred-path decomposition
**** Preferred child of node v =
***** None if last access in v's subtree is v
***** w if last access in v's subtree is in child w's subtree
**** Preferred paths are taken by repeatedly following preferred child pointers
**** We partition the nodes in the represented tree into paths
***** Every node belongs to some path
**** Store paths in splay trees
***** Not all, must store the edges that connect the paths together.
*** Auxiliary trees (also like Tango trees)
**** Represent each preferred path by a splay tree keyed on the depth
***** Couldn't do with tango trees because needed it to be a binary search tree, were given keys
***** Here we don't have "obvious" choice of keys, so we use depth.
**** Amortized O(log(N)) per operation
**** Root of each auxiliary tree stores the path parent
***** Path parent = path's top node's parent in the represented tree
***** Store at root of splay tree for easier access, rather than at the leaf that represents the path parent
**** Note:
***** What we want to represent is always "represented tree"
***** What we're representing is the auxiliary tree
**** Auxilary tree + path parent pointers = tree of auxiliary trees
***** goal: balanced
*** access(v):
**** Make root-to-v path preferred and make v the root of it's aux tree
***** This means that v is the root of tree of aux trees.
**** Where tango trees walked top to bottom and then fixed up metadata structures, we walk bottom to top
***** We "push the value" up the tree.
**** Steps:
***** Splay node v within it's aux tree
****** Now V is at the root and our left/right halves are ordered by depth around V
****** Problem: V now has a preferred child left in the structure, but logically that isn't true
***** Remove v's preferred child:
****** v.right.pathparent = v
****** v.right.parent = none
****** v.right = none
****** Now we need to *add* new preferred child pointers
***** Until v.pathparent = none: (ie until we're at the root of the aux. tree)
****** w = v.pathparent
****** splay w within aux tree
****** switch w's preferred child to v:
******* w.right.pathparent = v
******* w.right.parent = none
******* w.right = v
******* v.parent = w
******* v.pathparent = none
****** splay v within aux tree
******* Means rotate V essentially
*******
****** v.pathparent = w.pathparent
**** Note: When we splay/rotate we need to carry along the path parent pointer
**** v now has no right child
***** It's the deepest node on the preferred path because it has no preferred parent
*** findroot(v):
**** Steps:
***** access(v)
***** walk left to find root r
***** splay r so fast later
*** path_aggregate(v):
**** Steps:
***** access(v)
***** return v.subtree.(aggregate)
*** cut(v):
**** Steps:
***** access(v)
***** v.left.parent = none
***** v.right = none
**** Can visualize this as moving V up to the root of the tree and then "popping" it out.
**** Precondition: v and w in different aux trees
**** Steps:
***** access(v)
****** Since the access of a root node is very short, v's aux tree is just v now
***** access(w)
***** v.left = w
***** w.parent = v
****** Could say w.right = v, similar analysis
**** V is now the deepest node in w's preferred path
**** This looks like it shouldn't work due to unbalancing, but it works because splay trees will make balanced
*** O((log ** 2)(n)) amortized bound:
**** Most things O(1 + access time)
**** Claim: We can use splay analysis here
**** O(log(n)) amortized time for splay trees which implies:
***** m operations cost O(log(n)) * (m + total number of preferred child changes)
***** claim: O(m log(n))
****** To prove, we need heavy-light decomposition
*** Heavy-light decomposition
**** Another way to decompose a tree into paths
**** We use it for the represented tree, not the aux trees
**** size(v) = # of nodes in v's subtree
**** call edge (v, parent(v))
***** heavy if size(v) > 1/2 * size(parent(v))
***** light otherwise
**** At most one heavy child/edge from each node
***** From the fact that it must be larger than all other children summed to be heavy
***** Therefore heavy edges decompose world into paths
***** Therefore heavy paths decompose tree / nodes
**** Seems silly, what does this do? Consider trivial case, with all light edges
***** Then size of v is ast most 1/2 size of parent
***** If all edges light, then this becomes binary search with O(log(n)) recursions
**** light-depth(v) = number of light edges on a root-to-v path
***** It is at most log(n)
**** Is an analysis technique, we're still using preferred paths
*** Proof: O(m log(N)) preferred child changes
**** The number of changes is
***** <= number of light preferred eadge creations + number of heavy preferred eadge destructions + n-1 edges
**** Proof: access(v)
***** Makes preferred edge path along root-to-v path
****** At most log(n) of them can be light
***** Each heavy preferred edge destroyed => light preferred edge created
****** Every time we delete an edge, we must create a light edge.
****** Has light preffered edge creation running time, except former preferred child of v
****** So log(n) + 1 upper bound, O(log(N)) total
**** Proof: cut(v)
***** We create light preferred edges created, so O(log(n))
*** Proof: O(log(N)) amortized bound
**** W(V)
***** = the number of nodes in v's subtree in tree of auxiliary trees
***** = sum (for all w in v's subtree in v's auxtree) of (1 + size(number of aux trees hanging off))
****** hanging off via parent pointers
****** Becomes adding up number of nodes
****** This comes from splay tree analysis
**** potential phi = sum_v (W(v))
**** Access lemma:
***** Amortized cost of splay(v) <= 3 (log(w(root of w's aux tree)) - log(W(v)) + 1
***** https://en.wikipedia.org/wiki/Splay_tree#Analysis
***** Used to prove things about splay tools, is an analysis tool
***** Is independent of W
***** Proof sketch (time constraints):
****** Analyze each case seperately, argue that you end up with a telescoping sum so it doesn't matter.
**** Assuming the access lemma
***** amortized cost of access(v) = O(log(N)) + O(number of preferred child changes)
***** We get a telescoping sum again
***** We get O(log(n)) amortized
* Dynamic graphs II
** Intro
*** Fully vs partially dynamic
*** euler-tour trees
*** O(1) decremental connectivity in trees
*** O((log ** 2)(n)) fully dynamic connectivity
*** survey
** Dynamic connectivity
*** Maintain an undirected graph subject to edge insertion / deletion
*** Insert / delete edges or vertices with no edges
*** connectivity(v, w): is there a v->w path?
**** Could also ask: is the graph connected
***** Are same problem
*** Fully dynamic: can do insert and delete
*** Partially dynamic: can do insert or delete
**** Insertion only is incremental
**** Deletion only is decremental
** Dynamic connectivity results
*** Trees: O(log(n)) in general
**** You can do O(1) decremental
*** Plane graphs
**** Graphs with a planar embedding, vertices are on fixed faces
**** O(log(n))
*** General graphs, amortized
**** open: O(log(N)) update and query
**** O((log ** 2)) update with O(log (n) / loglog(N)) query
**** Also possible to get O(log(n) (loglog(n)) ** 3) update, O(log(n) / logloglog(n)) query
**** Incremental
***** O(alpha(m, n)) via union-find
**** Decremental
***** O(m log(N) + n polylog(n) + #queries) total
***** For dense graphs we have log(n) per operation
**** Worst case bounds:
***** Open: polylog update and query
***** Current: O(sqrt(n)) update, O(1) query in general
***** incremental
****** theta(x) updates imply theta(log(n) / log(x)) queries
******* STOC 1999, Alstrup
**** Lower bounds:
***** You need omega(log(n)) update *or* query for dynamic connectivity
****** At least one must be log
***** O(x log(n)) update => omega (log(n) / log(x)) query
***** O(x log(n)) query => omega (log(n) / log(x)) update
***** Patrascu & Demaine STOC 2004
***** Open problem:
****** Can you get o(log(n)) update and polylog query
***** Since we're storing paths in tree, we need log(n)
** Euler-tour Trees
*** Simpler dynamic trees, simpler than link-cut
*** Lets you work with subtrees of tree rather than paths
**** Gives you support for stronger aggregates
*** Euler tour:
**** Euler tour is basically a DFS over the tree
**** We visit each edge exactly twice, node visitation count based on height
*** Idea: Store node visits by euler tour in balanced BST
*** Each node stores pointers to first and last visits
**** Clarification: node in tree points to point in euler tour trees
*** findroot(v):
**** Take any visit
**** Walk up the tree to root
**** Walk down the tree on left spine to min of BST
**** This is the first visit of the root node
*** cut(v):
**** We need to isolate the subtree of v
**** The euler tour visits V, then subtree, then visits V again
**** Therefore V's subtree is a contigous interval, bounded by V on either end
**** Steps:
***** split BST at first and last visits to V
***** concatenate the intervals before and after the area we cut out
**** We want v to be a subtree of w
**** Don't care about child order, let's put at the end
**** Steps
***** Split w's BST before w's last visit
***** Concatentate:
****** Everything before the right endpoint of W's subtree interval(w's last visit)
****** w
****** v's BST
****** Everything after the last w
***** Basically a copy-paste into the interval
**** But can't do path aggregates
**** Can do subtree aggregates
***** Just a range query in BST between first and last visit
***** log(n) time
*** O(log(n)) time per operation
*** Insertion has the problem of where to choose as the root, can pick arbitrarily
**** Note: Changing root is just a cyclic shift of the interval of the euler tour
** O(1) decremental connectivity in trees
*** Bound: O(1) amortized time assuming all n-1 edges get deleted eventually
**** O(n) time, so we assume n operations
*** 1) O(log(n)) via link-cut or euler tour trees
*** 2) leaf trimming: cut below maximally deep nodes with > log(n) descendents
**** Top tree has O(n / log(n)) leaves / branching nodes
**** Use 1) on compressed top tree
**** Connectivity queries must handle moving results between leaf and top trees
**** Here we use the trick of indirection that we've used elsewhere in the class, see vEB lecture notes for this "indirection" technique
*** 3) Bottom tree in O(1)
**** Store bit vector of which edges don't exist
***** Deleting an edge only requires marking this edge as deleted
**** Preprocess mask of each nodes ancestor
***** Fits in a word
**** We'll xor the v and w masks, and check whether 0
**** If 0, then
*** 4) Path in O(1) amortized
**** Split the path into n / log(n) chunks, each of length log(n)
**** We can use a word-sized bit vector for each chunk
**** Two options:
***** Use 1) on n/log(n) chunk summaries
***** Query by masking and checking whether all sub-ranges between elements are 0
****** If 1 then edge deleted, subtree not connected
** General graphs: O(log ** 2 (n)) dynamic connectivity JACM 2001
*** Idea:
**** Spanning forest = spanning tree for every connected component
**** Hierarchally divide connected components
**** O(log(n)) levels of spanning forests
*** When we cut, don't know if we've removed element from spanning tree or if has other link from tree
*** Level of edge starts at log(n), only decreases toward 0
**** This is our charging mechanism
*** G_i = subgraph of edges level <= i
*** G_(log(n)) = G
*** Invariant 1: Every connected component of G_i has <= 2 ** i vertices
*** F_i is the spanning forest of G_i
**** Store using euler-tour tree data structure
**** F_log(n) is the desired spanning forest
*** Invariant 2: F_n is a subset of F_(n+1)
**** F_i = F_log(n) union G_i
**** This makes F_log(n) a minimum spanning forest
***** With weight of an edge as it's level
*** Insert(e = (v, w))
**** Add e to v&w's incidence lists
**** e.level = log(n)
**** if v and w are disconnected in F_(log(n)):
***** insert e in F_log(n)
****** euler tour tree insertion, cheap, reroot + join
***** deletion is the costly step, but we charge to insertions
*** connectivity(v, w)
**** Make F_log(n) = B-trees of branching factor theta(log(n))
***** We change the size of the non-chunk tree in our indirection model
**** findroot in theta(log(n) / loglog(n))
**** Update in O((log ** 2) (n) / loglog(n))
*** delete(e = (v, w))
**** Remove e from v & w's incidence lists
**** if e is not F_log(n), not in entire forest, done
**** if e is in F_log(n):
***** delete e from F_e.level to F_log(n)
****** Use cut which is log(n), so log ** 2(n)
***** Since a minimum spanning forest, e is the lowest level
***** Look for replacement edge to reconnect v & w
****** replacement edge level >= e.level
***** For i=e.level to log(n)
****** Let T_v & T_w be trees of F_i with v & w respectively
****** Relabel such that |t_v| <= |t_w|
****** Invariant 1 implies |T_v| + |T_w| <= 2 ** i
******* |T_v| <= 2 ** (i-1)
******* Can afford to push all of T_v's edges to level i-1
****** for each level-i edge e'=(x, y) with x an element of T_v:
******* if y is an element of T_w then add e' to F_i, F_(i+1), .... F_log(n)
******** Return, we've found replacement
******* else y is an element of t_v, set e'.level = i-1 and add e' to F_(i-1)
******** This is what we charge the cost to
**** Overall time: O((log ** 2)(n) + num level decreases aka num charges * log(n))
***** Each inserted edge is charged <= log(n) times
**** Need to augment euler-tour trees
***** In a forest F_i at level i, consider subtree of v element
***** Is there any node with a level i edge incident to it? Are there any level i edges?
****** If no, just skip entire subtree
***** We augment the trees to store which levels are contained in the subtrees
***** We can find the next level-i edge incident to an element of the tree in O(log(n)) time
** Other problems
*** k-connectivity
**** Are there k disjoint paths from k to w?
**** 2-edge: O(log ** 4 (n)), 2-vertex = O(log ** 5(n))
**** Open: polyloglog(n) solution for k=O(1)
*** Transitive closure
1. 