Monday, September 7, 2015

Higher-Dimension/Geometric and Kinetic Data Structures

While at ICFP this last week, I covered the second week of MIT's

My condensed notes have been typed up below. 

Orthogonal Range Searching

- Data = points
- Query = box in space
- ex: Which points are in box, are any points in the box, how many?
- Goal: O(log(n) + k) query in 2D, where n is the number of points and k is the size of the output

Range Trees

- 1D, Sort and use Binary Search
-- Use balanced binary search tree
-- The data is in the leaves

-- if a or b are not in the tree, when we query(a,b) we expect values between 
(a || pred(a)) and (b || successor(b)
-- There are only log(n) subtrees between endpoints, fast iteration
-- Time = O(log(n) + k), we see that k comes from the time to collect the data once it's found.

- 2D, Use subtree for x data. For each subtree, stores a reference to a 1D range tree for the y dimension.
-- Naive: For each subtree of range tree for X, store a different y subtree. Space increases rapidly.
-- Every point lives in O(log(n)) y trees. We have Theta(n log(n)) space, but efficient O(log**2(n)) search.
-- Query on x=O(log**2(n)) y subtrees.
- This generalizes to higher dimensions

Optimization - Layered Range Trees

- Reuse searches between the range trees for y, to save a logarithmic factor.
- Consider:
-- We have an array of all elements sorted by y
-- We only want one binary search in y, when we're at the one we encounter at the root.
-- Let us say that we keep the location of locations in arrays at the nodes of the trees.
-- Therefore, each x node points to an array of y nodes. We have an implicit tree.
-- A child of x' points to a completely different list of y's than x'.
- Idea: In current value in x's parent's y slot, store a reference to the predecessor to that value in x's y list. We can follow this link for a constant-time search in x's list for the value we've found in x's parent's list.
- Each x node traversal requires one constant time y step after the first (amortized log(n) for all y-trees)
- Saves a logarithmic factor in the last dimension only. O(log**(dimension-1)(n)) query time.

Dynamic Layered Range Trees / Weight-balanced trees
- Amortization tree argument: Cheap to update at the leaves, life is good if we mostly update at leaves.
- O(n log**(dim-1)(n)) space + O(create return data from end points)

BB[alpha] trees

-- for each node v: size(left(v)) >= alpha(size(v)) and size(right(v)) >= alpha(size(v))
-- if alpha = (1/2), the tree must be perfectly balanced. That's a bit much to ask. Choose a small alpha.
-- A weight balanced tree implies a height balanced tree
-- Height =< log_(1/n) (n)

- Updates
- When the node is not wight balanced(easy to se since we store subtree size), we rebuild the entire subtree from scratch to weight balance it.
- Charge the amortization to the theta(k) updates that unbalanced it. k = size of the subtree.
- O(log(n)) amortized update is rebuild is linear time

- For layered range trees: O(log**(d) (n)) amortized update, O(log**(d-1) (n)) query
- Faster than the naive implementation, supports dynamic updates.

Static orthogonal range searching. 

- Optimal space
- If you accept more space, O(log**(d-2)(n)) query
-- Conjectured to be optimal
-- Uses fractional cascading
- if d=1, above time doesn't hold. You can never beat a O(log(n)) query time.

Fractional Cascading:

- Warmup: Consider the problem of searching in k lists of size n for an element. You want to find the predecessor and successor in each list, not globally.
- Naive solution: k binary searches, O(k log(n))
- Our solution: O(k + log(n)).


-Add every other item in list_k in sorted order to list_k-1 to make list_k-1'
-- Tag the added elements as "forwarding"

- |L_i '| = | L_i | + 1/2 | L_(i+1) | = O(n)
- A search in L_i finds where you fit in L_i+1 as well.


- Do regular binary search in first list.
- Follow pointers in previous + next nodes that were promoted.
- Since you have every other node(1/2 factor) you have the key in either the previous node's pointee, the next node's, or the node between those two.
- To get ordering in first list(because things copied everywhere), you need more pointers
- Every item copies from one list to another has a pointer to the previous and next non-promoted items
- Every item in L_i has a pointer to the previous and next in (L_i)' - L_i. 

More general form:

- Do it on a graph rather than an array
- Each vertex of the graph has a list of elements
- Every edge (in a 1D ordered universe) has a range [a,b]
- Can only follow edge if range has what we query for
- Bounded in-degree, a requirement
- We want to promote some items to the objects referring to us
- Def: Locally bounded in-degree: the number of incoming edges whose labels contain x is constant.

Search(x): find (x) in k vertex sets

- Found by navigating graph from any vertex along edges
- After first search, query in neighboring trees is O(1) time
- The number of promoted items related in neighbors takes constant time
- # promoted items related to in-degree
- Size increase is log(n) factor

Geometry II

Orthogonal Range Searching with Fractional Cascading

- O(log**(d-2)) query
- Let's build it up

1) (-infinity, b_2) x (-infinity, b_3) <- y and z coordinates up to endpoints
O(search for b_3 in a list of z coordinates = log(n)) + O(k) time.
- Equivalent to stabbing vertical rays with horizontal ray
- Want to preprocess vertical lines to see if intersects with given y=c line.
- Make horizontal line at endpoints, called bricks that intersect with neighboring vertical lines.
- Can we extend the brick lines to make the collisions with each ray create a structure that we can apply fractional cascading to?

2) [a_1, b_1] x (-infinity, b_2) x (-infinity,  b_3] in O(log(n) * number of searches + k) with range trees
- Each node stores the structure in (1) on points in subtree.
- Always searching for the same (y, z) for each x,  so the look up is constant

3) [a_1, b_1] x [a_2, b_2] x (-infinity, b_3] <- Adds in an A_2 bound, the cost is a log(n) space increase.
- Kind of like a range tree on y coordinates
- node v stores key = max(left(v)) and the structure in (2) on points in right(v) and y-inverted (2) on points in left(v).
- y-inverted (2) is [a_1, b_1] x [a_2, infinity) x (-infinity,  b_3]

Query: Walk the tree
- if key(v) < a_2 < b_2 walk right
- if key(v) > a_2 > b_2 walk left
- if a_2 < key(v) < b_2
-- query (2) with [a_1, b_1] x (-infinity, b_2] x (-infinity, b_3]
-- query y-inverted (2) with [a_1, b_1] x [a_2, infinity) x (-infinity, b_3]

4) [a_1, b_1] x [a_2, b_2] x  [a_3, b_3] ditto on z using (3)
- One structure for a_3, one for b_3 (one is inverted tree)
- Total time is O(log(n) + k), the space is O(n log**3(n)).
- Really amazing, 3d orthagonal range queries in O(log(n) + k time). In general, it's O(log**(d-2)(n) + k) time.

Why can't we use this trick to get log(n) in all dimensions?
- Getting the one-sided interval is really the hard part. We'd starting having to do significantly more steps as it gets more complex.
- Space will explode, we gain a log(n) per dimension.
- log(n) for any dimension in general is impossible.

Kinetic Data Structures


-- Advance to time t
-- Change point x's trajectory
-- 1D example, 2d+ still open
-- Points on a straight line
-- Bounded degree algebraic type = a + bt + ct**2 + ..
-- Pseudo-algebraic: Any certificate of interest flips true/false at most O(1) times
-- Certificate: invariants that may become false due to the changing environment.


- Store true things now
- Store conditions that the data structure must maintain(certificates)
- Compute failure time for each certificate
- Put failure times into priority queue - O(log(n)) to find next failure
- Advance (t) removes failures before time(t)
-- For each failure, stop and fix.


- while t >= Q_min: {now = Q_min; event(Q.delete.min()) }
- event == fix ds / fix certificates
- Each even x_i has a reference to the certs that it fulfills
- Certs refer to their failure times
Time can be a proxy for other things, but a lot of people care about moving points.

Kinetic Predecessor + Successor:

- Balanced Binary Search Tree
- Tree of events
-- Too many to compare, comparisons on insertion would dominate
-- | event comparisons | == | event insertions |
-- Cute idea:
--- Only need to compare the rightmost element of the left subtree and rightmost element of left subtree
-- In general data stays sorted -> put certificates in tree
-- In-order traversal
-- Only O(n)
- failure = inf { t >= now | x_i(t) >= x_(i+1)(t) }
- The secret for performance is in the choice of certificates
-- event(x_i <= x_i+1) = event about to fail
-- swap(x_i and x_i+1) in BST of points.
-- x_i and x_(i+1) are equal at the time we change so this stays valid
-- We can update certs in O(1)

- Efficiency:
- Quadratic number of updates in worst case: example is a number of points passing a number of fixed points on a line. O(number of certs)
- This is the best that you can do.
- If we need to know the sorted order of points, we need to fix DS every time the order of the certs change.

Kinetic Heaps:

- The general solution isn't optimal in all cases
- Example: find-min and delete-min.
- In our previous example, the min only changes once.
- Store in min heap
- O(N) certificates still, but easier to maintain than before
- How do we do an event? event(x <= y)
-- O(1) to change the events once found
-- O(log(n)) to find.

No comments:

Post a Comment