Monday, September 21, 2015

Integer Data Structures

This previous week in my advanced data structure independent study I covered integer data structures applied to the predecessor problem.

The predecessor problem asks for the predecessor or successor element for a given key, even if there is no value in the data structure with that key. This problem is actually fairly difficult if one considers that key/value mappings such as hash tables are no longer applicable. Likewise, many problems can be decomposed into special cases of the predecessor problem. 

"Integer" data structures actually don't involve properties of the mathematical definition of integers at all. They are a model of computation that is closer to the actual machine than a pointer machine. Where a pointer machine only has references, an integer data structure will typically exploit properties of pointers and the ability to encode information in machine-word-sized bit vectors. 

The Van Emde Boas structure and Fusion Trees are introduced in these three lectures, as is the trick of "indirection" for keeping algorithmic bounds low by routing to chunks of information. The proof of the running times of these structures is quite mind-expanding, as it treats the algorithm as a communication between the data structure and the reading party and uses information-theory to prove a lower bound on communication. 

Fusion trees are very interesting, using "sketching" and bit vectors to add parallelism to the tree search. This entire section has been a collection of algorithmic "tricks" that have a lot of power for optimization.

My notes for the lectures are in the blog post body after the break.

* Integers I: Integers & van Emede Boas, The Predecessor Problem
** Models = word ram and cell
** Predecessor Problem operations:
*** Maintain set S of n elements, which live in some universe u
*** insert (x element of u)
*** delete (x element of S)
*** predecessor (x element of u) = the largest value smaller than x 
*** successor (x element u)
** Models: For integer data structures
*** Unifying concept is a "word" of memory
*** A word is a w-bit integer
*** This defines a universe of unsigned integers, 0 up to ( 2 ** w )- 1
*** A transdichotomous ram
**** Memory is an array, you can do random access into this array of S words
**** Can do any operation that reads / writes a constant number of words in memory in O(1)
**** Words are pointers
**** W must be at least as large as your space bound S, otherwise you cannot access all memory
**** if S >= n, then w >= log(n), n your problem size.
*** Word ram is specific version of transdichotomous ram
**** Operations are restricted to C-like operations:
***** +, -, *, /, modulo
***** binary and, or, not, xor, shift left, shift right
***** [] - Random access / array deref
***** <, > - Comparisons
**** Upside: Do things in W-bits in parallel
**** Natural evolution of the comparison model
*** Cell-probe model
**** Just count the number of memory reads and writes required
**** Computation is free
**** Not a realistic model, is used for lower bounds
*** Model Strength Hierarchy (strongest to weakest)
**** Cell probe
**** Transdichotomous RAM
**** Word RAM
**** Pointer Machine
**** Binary Search Tree
** Predecessor Problem Solutions:
*** Van Emde Boas
**** By Peter Van Emde Boas
***** Time O(log w) = O(log log u) per operation, O(u) space
***** Can get log(W) with high probability and O(n) space by combining with hashing
*** y-fast trees:
**** O(log(w)) with high probability, O(n) space
*** Fusion trees
**** O(log_w (n)) with high probability, O(n) space
***** Sometimes better than y-fast, worse when w is small 
***** Crossover point when log_w(n) = log(w) = O(sqrt(log(N))), high probability in O(N) space
***** Therefore this is the minimum
***** Actually optimal. Wasn't known for many years.
*** Cell probe lower Bound for Predecessor
**** O (n poly log(N)) space => O( min {log_w(n), (log(W) / log(log (w) / loglog(n) ))})
**** vEB is optimal for w = O(polylog(n)) and fusion trees are optimal for w = 2 ** (theta(logn * loglog(N)))
** Van Ende Boas
*** Idea want recurrence: T(u) = T(sqrt(u)) + O(1) = O(loglog(u)) = O(log(w))
*** Split universe into sqrt(u) clusters, each of size sqrt(u).
*** Hierarchal Coordinates
**** Word x = <c, i> where c is which cluster you are in and i is the offset into the cluster
***** c = x // sqrt(u)
***** i = x mod sqrt(u)
**** Constant time decomposition and composition
**** Easier if we work in binary, x is a bunch of bits
***** First half of the bits are c, second half are i
***** Assumes c is a power of 2
***** c = x / (w/2)
***** i = x & ((1 << w/2) - 1)
*** Recursive vEB structure of size u(word size w)
**** We have a bunch of clusters, each of size sqrt(u)
**** Summary structure of size sqrt(u), states whether a cluster has any items in it
***** Cluster name is in summary if item is not empty
**** The value c indexes summary, i indexes each cluster
**** V.cluster [i] = vEB of size (sqrt(u)), word size (w/2)
***** Note: u and machine word size doesn't change but we can choose to take a smaller u/w for the sub-structure
**** V.summary = vEB of size (sqrt(u)), word size (w/2)
**** V.min = explicitly stored minimum
***** Stored separately, we do not store recursively, doesn't live in any other structures
**** V.max = copy of max, is stored recursively
*** Pseudocode for operations
**** Successor
Successor<v, x=<c, i>)
- if x < V.min: return V.min
- if i < V.cluster[c].max: return <c, Successor(v.cluster[c], i)>
- else let c' = Successor (V.summary, c) in return <c', V.cluster[c'].min>

***** We only recurse on a structure of size sqrt(u).
***** Therefore the running time is from the recurrence T(u) = T(sqrt(u)) + O(1)
**** Insert
Insert<v, x=<c, i>)
- if V.min = None then V.min = V.max = x ; return
- if x < V.min then swap x and V.min 
- if x > V.max then V.max = x
- if V.cluster[c].min = None then Insert (V.summary, c)
- Insert(v.cluster[c], i)
***** Looks bad since we recurse twice in the case that V.cluster[c].min = None
****** In that case the recursive call will hit the first case, which terminates immediately
****** Therefore running time still holds
**** Delete
Delete<v, x=<c, i>)
- If x = V.min then ( c = V.summary.min; 
-- if c = None then ( V.min = None ; return ) // now empty
-- else x = V.min = <c, V.cluster[c].min> // unstore new min
- Delete(v.cluster[c], i)
- if V.cluster[c].min = none then Delete(v.summary, c)
- if V.summary.min = None; V.max = V.min
- else let c' = V.summary.max in V.max = <c', V.cluster[c'].max>
*** History - Building up modern interpretations
**** This isn't how Van Emde Boas presented this, this is how it ended up in CLRS
***** Was the result of work to make cache-oblivious data structures more approachable
**** First appeared like so:
***** Relevant papers: (vEB, Kaas, Zijlstra) in 1977 and vEB in 1975
***** Bit vector presentation:
****** O(N) to walk all bits, so make a tree
****** Each node in tree stores the OR of children
****** The height of the tree is log(u) or w. We want log(w)
****** How? Binary search on the height. Definition: Simple tree view
******* If path to the root from a 0 contains a 1 at some point, that's the first point that we encounter a successor/pred
******* Once it's a 1, will continue to be a 1, since we are OR-ing up the tree
******* We can find this first 0->1 transition by binary search
****** Idea: Any leaf-to-root path is monotone
******* Binary search for 0->1 transition
******* At first 0->1 transition we have two trees.
******* If we started in left tree, successor is min of right tree
******* If we started in right tree, predecessor is max of left tree
******* O(1) time
******* If we find the predecessor, how do we find the successor if that's what we want and vice versa?
******* Make linked list of items, follow one pointer
******* Find both pred/succ at cost of finding either one
******* Hard to maintain this structure dynamically
******* Get O(loglog(u)) predecessor / successor
******* Relies on ability to binary search on any root to node path
******** Don't have enough pointers to do that
******** Can store sequentially in array, using arithmetic with the traditional heap array structure
******** Original vEB paper is in a pointer machine though
********* Called stratified trees
********* Every node stores a pointer to the (2**i)-th ancestor, 0 <= i <= log(w)
********** Gives every node the ability to move up by a power of two, which lets us do binary search
********* Is O(u log(w)) space
********* Every node stores the min and max of subtree, lets us move back down
********* This is the pointer machine model
********* Duality: Top half of tree is summary and each of bottom halves of tree is cluster.
********* Need at least u space to specify input. Can show loglog(u) is optimal for this model
********** Word ram model lets us reduce space to n
********* This is slow update, fast query
********* Actual stratified trees from paper don't update the min, makes it a mess
***** We have original, and we have a query fast, update slow structure
****** O(log(W)) query, O(W) update
****** Do less updates!
****** Make structure with larger pool of theta(n/w) items, and chunks of theta(w) items.
****** Use:
******* Search in pool for chunk
******* Update chunk
******* Only need to update pool if the chunk needs to be split/merged because of size going out of bounds
****** Update time goes down by a factor of W
******* Must pay chunk update cost
******* Use binary search tree for the chunks. Smaller than a vEB, BST has O(log(w)) time
******* Search through pool is log(w), update cost in chunk is log(w).
******* We can charge the O(w) update cost to pool to the O(w) updates that we'd need to do to chunk to make split.
******* O(log(w)) time
***** We have two ways to get O(loglog(u)) query and O(u) space, how can we reduce space?
****** Don't store empty structures!
****** A cluster/chunk can be entirely empty, storing them gets you O(u) space
****** Not storing empty clusters gets you O(n) space.
****** Gets annoying since we want to store array for fast access, but an array gives us O(sqrt(u)) number of chunks
****** Don't use array, use hash table with dynamic perfect hashing, theta(1) with high probability
****** O(n) space with amortization. Charge each table entry to min of cluster.
******* If someone exists in hash table, means that chunk is non-empty
******* Charge space in summary to min of chunk
******* We only charge to each element once, so we have O(n) space.
******* Doesn't appear in print, but adding hashing gets us O(N) space with hashing.
***** Another way to get bounds, x-fast trees
******* Don't store 0's in simple tree view.
******* Interpretation = path from root to node has 1's for right and 0's for left.
******* Store the paths to ones in simple tree's leaves as binary strings in a dynamic perfect hash table
******** Different hash table for strings of different length
******* Store all of the prefixes in hash tables
******** Lets you do binary search for 0 to 1 transition
******** Checking if a node is 0 or 1 is the same as checking for containment in the hash table of that prefix length.
******* Need min/max pointers for subtrees
******* Predecessor/successor in O(log(n))
******* Uses O(nw) space. O(w) updates.
***** y-fast trees
****** x-fast trees + indirection(chunks + "pool" structure from above)
****** O(log(w)) per operation with high probability(hashing)
****** O(n) space
* Integers II: Fusion Trees, when the word size is large
** Summary
*** Sketch and why it's enough
**** Reducing a word to fewer bits while still being useful
*** Approximate sketch via multiplication
*** Parallel comparison
*** Most significant set bit
** Fusion tree goals
*** O(log_w (n)) predecessor / successor.
*** Van Emde Boas = log(w).
*** Can take min of the two, value <= sqrt(log(n))
*** Static
*** O(N) space
*** Word ram model
** AC ** 0 RAM Version
*** Any constant-depth circuit model with unbounded fan in/out.
*** Forbids multiplication, since it requires a log(n)-deep circuit, rather than O(1) hardware
** Dynamic versions of fusion tree
*** O(log_w (n) + loglog(n)) deterministic updates
*** O(log_w (n)) expected time using hashing
** Static Fusion trees
*** Idea: B-tree with branching factor w ** (small constant, we use 1/5 )
*** Height = theta(log_w (N))
*** Problem with comparing item to keys in each node, since each key is w-bits long and w ** (1/5), so must read w ** (1 + 1/5) bits
**** Can only read w bits in constant time
*** Fusion tree node data structure requirements:
**** Store k = O(w ** 1/5) keys x_0 < x_1 < x_2 .... x_k-1
**** O(1) time for predecessor/successor
**** |keys| ** O(1) preprocessing
*** Distinguishing k = O(w ** 1/5) keys
**** Do we need all of those bits to distinguish those keys? No
**** We can store the key by the path to the value. Use bitstring with 0 for left, 1 for right
**** Idea: Look at branching nodes
***** Only care about bits for branching nodes and then way to get from last branching node
***** We have k-1 branching nodes in this height = w tree of k keys
***** At most k-1 levels containing a branching node
***** These levels correspond to "important bits"
****** b_0 < b_1 < b_2 .. < b_r-1 are our bit indicies
****** r < k = O(w ** (1/5))
****** Total number of important bits = w ** (2/5), so we can look at it in constant time.
*** Perfect sketch of a word x
**** Extract bits b0 to b_r-1 from x
**** r-bit string whose i-th bit = bit b_i of x
**** Computable in O(1) in an AC ** 0 operation
***** Search: Given query q
**** Compute sketch (q), or the path to follow in our tree to get to the result of q
**** Problematic because our fusion tree node key sketches don't take into account the fact that q will diverge from those key sketches at some point.
**** Desketchifying
***** Example:
****** Take a tree with 4 keys, branching happens at nodes 1 and 3.
****** Take a query with the interesting bits equal to another node
******* This node is neither necessarily successor nor predecessor because in theory, could have diverged very far up the tree, be very far away
******* Bad
***** Say sketch(x_i) <= sketch (q) < sketch (x+1)
***** Look at the longest common prefix between q and x_i and between q and x_i+1
***** The longest common prefix is the lowest common ancestor
***** Tells us that we went *away* from all of the data at one point
***** We can find the tree to the right/left and take max/min to get the successor or predecessor
***** Find the node y where q fell off of the data-containing path
****** If bit |y + 1| of q is 1, nearest x_i is in in subtree of bits up to y concatenated with "0"
******* Nearest predecessor extreme is node upto y + "0" + 1's until the leaves are reached
******* Works well because we follow q up to the end of the data, and then find the rightmost element in sub-tree.
****** If bit |y+1| is 0, e = y 1 000000
*** Search given a query q
**** Compute sketch (q)
**** Find it amond sketch(x_i)
**** find y = lower common ancestor between q and x_i or x_i+1
**** Compute the nearest extreme, e
**** Find sketch(e) among sketch(x_i)
**** The predecessors and successors of sketch (e) among sketch(xi's) = pred/successor among x_i's
***** If q is nearest to e, than pred/succ of e is pred/succ of q if q was included, stepping over q.
** Approximate sketch(x)
*** Hard part is in making the n consecutive bits near eachother
**** Not really necessary
*** We have w ** (2/5) bits in perfect sketching, but we can go up to w.
*** Idea: Can spread out important bits in a predictable pattern of length O(w ** 4/5)
*** Idea: Mask the important bits, discard others
**** x' = x AND sum of i from 0 to n-1 of (2 ** b_i)
**** Second term has 1's in important bit slots
*** Second idea: Multiplication
**** Idea: mask important bits, multiple by a magic m, and mask
**** x' * m = (sum of i from 0 to n-1 of (x_(b_i) 2 ** b_i)) * (sum of j from 0 to n-1 of (2 ** m_j))
***** Assume M only has r bits set, the number of important bits
***** Don't know where they are.
***** Later Proof: There exists an m that does what we want
**** x' * m = (sum of i from 0 to n-1 of (x_(b_i) 2 ** (b_i + m_j))) 
***** The term in the summation only exists when x_(b_i) is 1
***** Survive in multiple places because m_j shifts by various amounts
***** Make sure that all m_j values are unique so bits never hit each other.
***** Want the x_(b_i)'s to exist in a small window that's smaller than w
**** Proof:
***** Claim: For any b_i values, can choose m_j values such that:
****** The b_i + m_j are all distinct for all i and j
******* None collide, the sums can be replaced by ORs
****** b_0 + m_0 < b_1 + m_1 < ... < b_(r-1) + m_(r_r-1)
******* Only care about places where i = j
****** The span of the bits (b_(r-1) + m_(r-1) - b_0 - m_0) = O(r ** 4) = O(w ** (4/5))
***** Proof Steps:
****** Choose m'_i < (r ** 3) such that b_i + m'j are distinct mod (r ** 3)
******* Pick m'_0 up to m'_(t-1) by induction
******* Pick m_(t').
******** It must avoid m'_i - b_i + b_k mod r ** 3 for all i, j, k
********* For i there are t choices, can be previous values
********* for j, k there are r choices
******** Pick t * (r ** 2) < (r ** 3), like perfect hashing
******** Can take polynomial time
****** m_i = m'_i + (w - b_i + i * (r ** 3)) rounded down to multiple of (r ** 3)
******* Multiplying by i * (r ** 3) spreads them out
******* Need a (- b_i) term to make sure that it works with b_i + m_j
******** W avoids negative numbers with -bi
******** Rounding fixes out that b_i isn't a multiple of r ** 3
******** We want to be congruent to m'_i mod (r ** 3)
******* ex:
| r4 | = | r3 | r3 | r3 | r3 | 
******* All m_i fall in the last r ** 3 of the number line.
******* Our values are aligned in inverse order of subscripts for b_i m_i, i from r-1 to 0
****** Mask: AND with (sum from 0 to r-1 (2 ** (b_i + m_i))) and then shift right by b_0 + m_0
****** Mask, multiply to spread out, and mask out the rest of the garbage (the things left by multiplictation where i != j)
*** Thus we concatenate or fuse all of these sketches into w ** (4/5) bits
**** We have w ** 1/5 of them, so we have O(w) time.
** Parallel comparison
*** Idea: You can compare two integers by subtracting. Lets do k subtractions for the price of one
*** Sketch Definition
**** Sketch(node) = | 1 bit * sketch (x_0 | .... | 1 bit * sketch(x_k-1) |
**** Sketch(q) ** k = | 0 bit * sketch (q) | .... | 0 bit * sketch(q) | = sketch(q) * | 00000000 | ... | 000000|
**** Difference (sketch(node) - sketch(q)) =
***** For each leading bit, either one bit (the one that we put in front of each sketch(node) cell) gets borrowed if value is bigger than the other, or doesn't
***** Mask with AND (| 100000 | .. | 100000), and then shift by the number of bits in sketch to get all of the values anded with one together
***** If bit i 1, then sketch(g) <= sketch(x_i)
***** Since the x_i values are increasing, they will transition from large to small at one point
***** This is where our key fits
***** Is the problem of finding the most significant 1 bit, but we can do it here more simply
***** Since monotone, number of bits minus the number of ones is the same as counting leading "1" position
***** Multiply by (| 000001 | ..... | 000001 |)
****** Duplicates the leading one values by copying and shifting for each (shift width + 1) length bit vector
****** All leading ones will land on the most leading zero, so the number of carries is the number of ones.
*** This is a special case of the most Significant Set Bit
**** MSSB is the index of the first set bit.
**** Present in CS, in a lot of instruction sets, but not as fast as the algorithm presented
**** Needed during desketchifying
***** Finding divergence between (q XOR x_i) and (q XOR x_(i+1))
**** Algorithm
***** Split word into sqrt(w) clusters of sqrt(w) bits
****** Similar to van Emde Boas
****** ex: x = | 0101 | 0000 | 1000 | 1101 |
***** Need summary vector
****** 1) Identifying nonempty clusters
******* Check first bits
******** x and F=| 1000 | 1000 | 1000 | 1000 |
******* Clear first bits
******** X_rest = x XOR first bits
******* Take F - X_rest
******** Leading ones get borrowed by presence of something
********* Leading ones are non-empty
******* Mask to get leading of this
******* Xor this with F to get where things are nonempty
******* Or it with whether first bit set
******** Whether any are set
****** 2) Use sketch with the leading ones to get the results together
******** No approximate, not enough space
******** Can do perfect sketch since we know where the B_i's are
********* b_i = sqrt(w) - 1 + i * sqrt(w)
********* m_j = w - (sqrt(w) - 1) - j * sqrt(w) + j
****** 3) Find first nonempty cluster
******* Use parallel comparison with the powers of 2
******* The result of this is the power of 2 that it's greater than
******* This is the most significant set bit
****** 4) Find first 1 bit in cluster
******* Use trick from previous step
****** 5) Value is c * sqrt(w) + d
******* c index from step 3, d index from step 4
* Integers III: Predecessor Lower Bound via Round Elimination
** Summary
*** Will prove the lower bound
*** A communication complexity perspective of the DS
*** Relatively simple because of round elimination
*** Uses information theory for round elimination lemma
** History
*** First Lower bound is by Ajtai in 1988
**** For every word size, there exists a problem size n such that the algo is omega(sqrt(log(w)))
*** Milterson 1994
**** Took 1988 proof, but found essence better
**** Proves same bound, but found complementary bound
***** All n, exists a w such that the time is theta(cubed root (log(w)))
**** Works with communication complexity
*** Breakthrough in 1995
**** Round elimination re-proof of previous proof in simple way
*** Beame & Fich 1999
**** For all w, exists an n such that the time is omega ( (log(w)) / loglog(w) )
**** For all w, exists an n such that the time is omega ( sqrt( log(n) / loglog(n) ) )
**** Found static data structure achieving O(min of previous times)
*** Xiao 1992
**** Same lower bounds as 1999, but very complex still
**** Beame and Fich independently discovered it
*** Sen 2003
**** Round elimination proof of same result as 1992, 1999
**** Not as messy as other proofs
*** Patrascu & Thorup 2006/2007
**** Give tight n vs w vs space tradeoff
**** Assume space is n * (2 ** a)
**** Time is Theta of min of
***** log_w (n)
****** From fusion trees
****** Optimal for log(w) is Omgea (log(n) loglog(n)), for large W
***** log ((w - log (n)) / a)
****** Roughly vEB
****** Optimal for w = O(polylog(n)) statically, for small W
***** ( log (w/a) ) / log ( (a/log_n) * log (w/a) )
***** ( log (w/a) ) / log ( log(w/a) / log (log(n)/a) )
****** Optimal for W between optimal sizes for fusion and vEB
** Lower bound proof: Lowered bound for easier problem (colored predecessor problem)
*** Implies lower bound for harder problem
*** Colored Predecessor
**** Each element is red or blue
**** Query reports color of predecessor
***** Easier, could use lookup table + perfect hashing once we found the predecessor
** Communication complexity perspective of colored predecessor problem
*** Alice knowns X, Bob knows y, goal find f(x, y)
**** Alice can send messages with <= a bits
**** Alice can send messages with <= b bits
*** X might be much larger than a, y might be larger than b
*** Communication happens in "rounds" of back and forth
*** How many rounds does it take to compute f(x, y)? Depends on x
*** Cell-probe model
*** Alice is the query algorithm, x is the query, Bob is the memory, y is the data structure
*** Rounds are memory reads
*** Alice's a = log(S) if you have S space
*** b = w, since the response to a word read is a word
*** f (x, y) is colored predecessor
** Round Elimination
*** Definitions
**** f ** (k): Variation on a problem f
***** f is colored predecessor here
**** In f ** k, alice has K inputs x_1 to x_k
**** Bob has input y and integer i element of {1, k} and already knows x_1 to x_(i-1)
**** Bob's goal is to compute f(x, y)
**** Only based on x_i, want f (x_i, y). Alice should send x_i.
**** Problem: Alice doesn't know i, only Bob knows i.
***** Alice must ask Bob for i, is needless. One message can accumulate.
**** Can we eliminate first message and iterate?
**** If Alice sends *no* messages, we know that she can only win with 50% probability, since we know nothing
***** Contradiction if we prove we can do better.
*** Round Elimination Lemma:
**** For any function F,
**** If there exists a protocol for f ** k, where Alice speaks first
***** With error probability S
***** It uses m messages
**** then there exists a protocol where Bob speaks first
***** with a higher error probability <= S + O(sqrt(a/k))
***** it uses m-1 messages.
*** Intuition
**** If i were chosen uniformly at random then expect a/k bits to be "about" x_i.
***** Maybe Alice gets lucky and sends x_i. Probability of this is 1/k
***** Alice doesn't have to send an x_n, can send an XOR or OR
***** Can only send a bits
**** Bob guesses those bits by flipping coins, can remove that message by removing those
***** Probability of correct guess is 1 / (2 ** (a/k))
***** Probability of incorrect guess is 1 - (1 / (2 ** (a/k)))
****** Sqrt of this is the error increase when Bob guesses
****** Approx a/k
*** Claim: Time has lower bound(omega) of min {log_a (w), log_b (n)}
**** for any colered predecessor data strucuture that's static and randomized(in terms of queries and data)
**** Refinement: for polynomial space, a = O(polylog(n))
***** Theta = min { log(w) / loglog(N), log_w(n) }
**** Corollary (assuming a = O(log(N))
***** Lower bound largest when log_a (W) = log_b (N), or when terms in the min equal
****** = log_w / loglog(n) = log_n / log_w
****** = log ** 2 (w) = log(n) loglog(n)
****** = log (w) = sqrt ( log(n) loglog(n) )
***** Also gives us loglog(W) is roughly equal (dominating terms) to loglog(N)
****** Therefore log (W) / loglog(N) is Omega ( log(W) / loglog(w) )
******* This is one of our Beane & Fich bounds
****** We also have log(N) / log(W) = log(N) / sqrt (log(n)loglog(N)) = omega(sqrt (log(N) / loglog(N)))
******* Second Beane & Fich bound
**** Proof of claim
***** let t = # of cell probes (rounds) for a colored predecessor
***** Goal: Do t round eliminations
****** Lemma eliminates half a round, do 2 * t to eliminated both Bob's and Alice's
****** No messages
****** So the error >= 1/2
***** In colored predecessor problem, there is an asymmetric amount of information.
****** Eliminating an Alice to Bob message different than a Bob to Alice message
***** Proof:
****** Step 1) Alice->Bob Message Elimination
******* Alice's input x has w' bits left
******* Break into k = theta (a * (t ** 2)) equal-size chunks
******* Break into chunks, x_1 to x_k
******** Claim, minimizes the error increase term O(sqrt(a/k)) so the best k
******** A's cancel with A in k equation above, so O(sqrt(1/ (t ** 3))) = O(1/t)
******* We need an f ** k problem, must only care about 1 x_i that only Bob knows which i is interesting
******* Rephrased, only one interesting bit can differ in a mearningful way
******** Recall: vEB trees with monotonic paths, it's the one meaningful bit that we care about.
******* Consider our chunks part of a tree with branching factor theta(a * (t ** 2))
******** Bob knows this picture, Alice doesn't
******** Goal is to find the x_i branch such that the predecessor is a child without any branches
********* Alice's query doesn't give any new information because the it will likely give the interesting bits for a branching node above the x_i that matters.
******** Problem comes down to computing predecessor of the given node
******** We want the sub-tree with the predecessor
******** Since each sub-tree has one node, that's good enough
******* Round elimination reduces w' to theta (w' / (a * (t ** 2)))
******* Kind of like vEB
******** vEB is binary searching on the level that matters
******** Here, one level matters. The last one.
****** Step 2) Bob-Alice Message Elimination
******* Looks like fusion trees
******* Bob has input n' integers, each of them is w' bits = y
******* Break input into theta(b * (t ** 2)) equal-sized chunks
******* Error increase is O(1/t)
******* Split up all inputs with a binary tree
******** Each sub-tree is a predecessor problem, we index along the path x_1 x_2 ... x_k
******** We have theta O(b * (t ** 2)) nodes
******** Each child-tree has O(n / (b * (t ** 2))) things to look through
******** Alice knows what the leading bits are, Bob doesn't. Bob is speaking first.
********* First message from Bob is useless, can remove
******* Reduce n' -> theta(n' / (b * (t ** 2)))
******* Reduce w' -> 1 - log(b * (t ** 2)) >= w/2 if w' = omega (log(w))
******* At some point can't evently divide, this gives us our lower bound
****** We have to stop when w' hits log(w) or n' hits 2.
****** If we succeed with removing all messages, we get a contradiction
******* Set errors up to become 1/3, but we know that guessing at random must give us at best 1/2 to pick the color
****** Will not eliminate all messages when t is larger than a value, so we need t (which is the number of cell probes) to have a lower bound
******* t = omega (min { log_(a * (t ** 2)) (N), log_(b * (t ** 2)) (N)})
******* 
**** Claim: a * (t ** 2) has order O(a ** 3)
****** Because a is at least log(N),
******* a is at least log of space to store the data, N
****** t <= log(N)
******* Because we have a predecessor DS that runs in log(N) time, which is BBSTs
**** Claim: b * (t ** 2) has order O(b ** 3)
***** b = w by the definition
***** t = O(log(w))
****** From vEB
**** Therefore the time is the running time time we had to prove, QED

No comments:

Post a Comment