Tuesday, September 29, 2015

Integer Data Structures + Hashing Notes

This past week I covered the fourth integer data structures lecture and the hashing lecture. The hashing lecture mostly pertained to hash tables, with a focus on collision management and on choosing suitable hash function families.


* Integer Data Structures IV: Integer Sorting and Priority Queues
** Intro
*** Priority queues equivalent to sorting
**** An O(f(n, w))-per-operation priority queue trivially gives O(n * f(n, w)) sorting: n inserts followed by n delete-mins (see the sketch below)
**** The converse is also true: a sorting algorithm in this time gives a priority queue with O(f(n, w)) time per operation
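As a quick illustration of the easy direction, here's a minimal Python sketch (my own, not from the lecture); heapq stands in for whatever O(f(n, w))-per-operation integer priority queue we actually have.
#+BEGIN_SRC python
import heapq

def sort_via_priority_queue(values):
    """n inserts followed by n delete-mins: O(n * f(n, w)) total if each
    priority queue operation costs O(f(n, w))."""
    pq = []
    for v in values:
        heapq.heappush(pq, v)                                  # n inserts
    return [heapq.heappop(pq) for _ in range(len(values))]     # n delete-mins

print(sort_via_priority_queue([5, 1, 4, 1, 3]))                # [1, 1, 3, 4, 5]
#+END_SRC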
*** Problem: Can we get linear-time sorting here?
*** Open problem: Can we get O(1) decrease-key?
** Integer Sorting bounds
*** We have n w-bit integers
*** Comparison sorting: O(n log n)
*** Counting sort O(n+u) = O(n) if w = log(n)
*** Radix sort: O(n * (w / log(n)))
**** Counting sort on each log(n)-bit digit (see the sketch below)
**** Number of digits is w / log(n)
**** Linear if w = O(log(n))
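A minimal sketch of radix sort built on a stable counting sort (my own code; digit_bits plays the role of log n, so w = O(log n) means O(1) passes):
#+BEGIN_SRC python
def counting_sort_by_digit(a, shift, digit_bits):
    """Stable counting sort of a by the digit (x >> shift) & mask."""
    mask = (1 << digit_bits) - 1
    buckets = [[] for _ in range(mask + 1)]
    for x in a:
        buckets[(x >> shift) & mask].append(x)
    return [x for bucket in buckets for x in bucket]

def radix_sort(a, w, digit_bits):
    """w / digit_bits passes of counting sort, least-significant digit first."""
    for shift in range(0, w, digit_bits):
        a = counting_sort_by_digit(a, shift, digit_bits)
    return a

print(radix_sort([9, 300, 7, 1024, 255], w=12, digit_bits=4))  # [7, 9, 255, 300, 1024]
#+END_SRC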
*** Priority queue is weaker than predecessor, so we can use vEB
**** O(n log(w)) time
**** Can achieve O(n * log(w / log(n))) with more care
*** Signature sort
**** O(n) if w is Omega(log ** (2 + epsilon) (n)) forall epsilon > 0
**** Word size must be large compared to n
**** Could use vEB to get O(nloglog(n)) for all w
**** Han got O(n loglog(n)) for all w, deterministically (AC^0 operations only), building on signature sort
***** Got rid of randomization used in vEB
**** Han & Thorup get O(n * sqrt (log(w/log(n)))) expected
***** Randomized, not deterministic
***** Is O(n sqrt(loglog(n))) forall w, since we can use above for large w.
**** Open: handling small w. Can we get O(n) for all w?
*** Packed sorting
*** Bitonic Sort, uses a sorting network
** Signature sort
*** Assume w >= log ** (2 + epsilon) (n) * loglog(n)
*** Steps
**** 1) Break each integer into ( log ** (epsilon) (n) ) equal-sized chunks.
***** Each chunk is at least log ** 2 (n) bits: divide the bound on w above by the log ** (epsilon) (n) chunks
**** 2) Replace each chunk by O(log(n))-bit hash
***** Think of the number as a sequence of digits, one digit per chunk
***** We have huge range for each digit.
****** 2 ** (log ** 2 (n)) possible values
***** Insight: Most digits don't appear
****** There are far fewer chunks in the input than possible chunk values, so only a small fraction of digit values actually occur
***** Let's hash them! This makes the universe of values smaller
****** Reduces to a polynomial universe
***** Note:
****** Want linear time, so we have to compute the hash value of each in constant time
******* Use the multiplication method: multiply by a single random value and keep a hash-width's worth of bits of the result
******* Use m large enough to avoid collisions
****** We do not preserve order. We've mucked that up with hashing.
******* Each digit is mangled, out of order. The sequence of digits is preserved though.
**** 3) Use packed sorting in O(n) time.
***** n b-bit integers with w = Omega(b log(n) * loglog(n))
****** Provided b is small
***** Applies here because each chunk is log(n)-bits.
***** b = Theta(log ** (1+epsilon) (n)) and w >= (log ** (2+epsilon) (n)) * loglog(n), so the condition w = Omega(b * log(n) * loglog(n)) holds
***** Wait! This doesn't give us the actual values sorted, only the hashes
**** 4) Build a compressed trie of sorted signatures
***** Aside: tries introduced later
****** A trie is a tree where each node's children are indexed by characters from a fixed alphabet (universe)
****** Comes from re-"trie"-ve
****** Will look like a B-tree: the hashed digits are large, so nodes have a huge branching factor
***** Why?
****** The digit values were scrambled by hashing, so packed sorting sorted by the wrong (hashed) order
****** This is equivalent to permuting the children in the interior nodes of the trie representation
******* What if we had a compressed trie representation of the original values?
******* We could do in-order traversal to find sorted order of leaves, could find sorted order of integers
******* But we don't have that trie, we have the trie where children permuted in interior nodes
******* If we put children in right order, we'd have the correct compressed trie
***** Compressed trie
****** Contract every non-branching node into parent
****** O(n) size in number of leaves
****** Same order as trie, order of leaves is preserved
***** Can compute the compressed trie in O(n) time (a sketch appears at the end of this Signature Sort section)
****** Amortization argument
****** Given leaves in sorted order, want trie above the leaves
****** Compressed trie for speed.
****** Steps: For x_1, x_2, .... x_n in order
******* Insert into compressed trie starting with empty trie
******* At time i, I've computed the left part of the trie. Need to add x_i
******* x_i must branch off the rightmost path of the tree at some depth
******* If there is already a node at the branch point, add x_i as its new rightmost child, just right of its predecessor
******* If the branch point lies on a contracted edge, a previously non-branching node now branches: split the edge with a new node and hang x_i off it
******** This is constant amortized time
********* Start at the rightmost leaf and walk up the rightmost path to find the branch point
********* Finding where x_i and x_(i-1) differ takes O(1): XOR them and locate the most significant set bit
********* If we climb k edges, the insertion costs O(k+1)
********* This is fine under amortization: the potential is the length of the rightmost path
********* Climbing k edges shortens the rightmost path by k, so we charge the walk to that decrease
********* O(n) total, since each insertion adds only O(1) to the rightmost path (essentially an in-order traversal)
***** The resulting trie has the correct shape, but the children within each node are still in permuted (hashed) order
**** 5) Recursively sort all edges by (node.ID, actualChunk, edgeIndex)
***** Actual chunk = the original (unhashed) value of the first chunk on the edge, not its hash
***** Index of edge
***** For each node, give me the sorted order of the actual chunks followed by the order they used to be in.
****** Gives you inverse of permutation
***** Size
****** Node.id is O(log(n))
****** Chunk is O(w / (log ** (epsilon) (n)))
****** Edge index is log(n) bits
******* Don't store null pointers in compressed trie
******* Use minimum reference size.
***** The recursion has constant depth, so constant overhead
****** We started with w-bit integers; each level of recursion shrinks the chunk part by a factor of log ** (epsilon) (n), plus O(log(n)) bits of IDs
****** After 1/epsilon + 1 recursions, b will have decreased to O((1/epsilon) log(n) + (w / (log ** (1+epsilon) (n))))
****** Once the integers are this small we hit the base case and can use packed sorting
****** 1/epsilon is a constant
****** Therefore the recursion costs only a constant factor
**** 6) For each node, permute the edges as given by edgeIndex -> sortedOrder in step 5
**** 7) In-order traversal of compressed trie, output leaves
***** Don't need null pointers because we never search in the trie
*** Crux of the idea: If we can decrease the number of bits to a low enough amount, then we can use packed sorting.
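To make steps 4 and 7 concrete, here's a minimal Python sketch (my own names and representation, not the lecture's) of building a compressed trie over already-sorted keys by inserting along the rightmost path. Keys are equal-length tuples of digits standing in for the hashed chunks, and the XOR/most-significant-bit trick is replaced by a plain longest-common-prefix scan, so this only illustrates the amortized structure, not the O(1)-per-step word tricks.
#+BEGIN_SRC python
from dataclasses import dataclass, field

@dataclass
class Node:
    depth: int                                    # number of digits consumed at this node
    children: list = field(default_factory=list)  # children, left to right
    key: tuple = None                             # set on leaves only

def lcp(a, b):
    """Length of the longest common prefix of two digit tuples."""
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    return i

def build_compressed_trie(sorted_keys):
    """Insert distinct, sorted keys one by one, only ever touching the rightmost path."""
    root = Node(depth=0)
    spine = [root]                                # the rightmost path, root -> deepest node
    prev = None
    for key in sorted_keys:
        leaf = Node(depth=len(key), key=key)
        if prev is None:
            root.children.append(leaf)
        else:
            d = lcp(prev, key)                    # depth where key branches off its predecessor
            while len(spine) > 1 and spine[-1].depth > d:
                spine.pop()                       # climb; charged to the shrinking rightmost path
            branch = spine[-1]
            if branch.depth < d:
                # the branch point falls on a contracted edge: split it with a new node
                mid = Node(depth=d)
                mid.children.append(branch.children.pop())
                branch.children.append(mid)
                branch = mid
                spine.append(mid)
            branch.children.append(leaf)
        spine.append(leaf)
        prev = key
    return root

def leaves_in_order(node, out=None):
    """Step 7: an in-order traversal of the trie yields the keys in sorted order."""
    if out is None:
        out = []
    if node.key is not None:
        out.append(node.key)
    for child in node.children:
        leaves_in_order(child, out)
    return out

trie = build_compressed_trie([(0, 1, 2), (0, 1, 5), (0, 3, 3), (2, 0, 0)])
print(leaves_in_order(trie))                      # the keys, back in sorted order
#+END_SRC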
** Packed Sorting
*** Requires w >= 2(b+1) * log(n) * loglog(n)
*** Steps
**** 0) Pack log(n)loglog(n) elements into the lower half of each word
**** 1) Merge pair of sorted words with k' <= log(n)loglog(n) elements into one sorted word with 2k' elements in O(log(k')) time
**** 2) Merge sort k = log(n)loglog(n) elements into one sorted word, in time:
***** T(k) = 2T(k/2) + O(log(k))
****** At the root we pay log(k); the k leaves pay O(1) each, O(k) total. The level costs are roughly geometric, so the leaves dominate
****** = O(k)
**** 3) Merge two sorted lists of r sorted words with k elements into one sorted list of 2r sorted words
***** Want one sorted list of sorted words, 2 * r * k items
***** O(r * log(k)) time
***** Idea: Consider pairs of words
****** Merging a pair of words takes O(log(k)) time and leaves their 2k items sorted
****** We can't simply output both resulting words and continue; we don't know how they compare to elements still in the lists
****** We can output the lower k items, since they're the overall k smallest
****** The word holding the larger k items goes back into the lists, and we continue
******* Put at the front of the list containing the maximum element in that word
****** In log(k) time, output k items. So O(r * log(k)) time overall
**** 4) Mergesort with 3) as merger and 2) as the base case
***** T(n) = 2T(n/2) + O(n/k log(k))
****** n/k is r
***** T(k) = O(k) from 2
***** Each level of the recursion costs O((n/k) log(k))
***** We stop recursing at subproblem size k, so there are only log(n/k) levels, not log(n)
***** We have n/k leaves, each with O(k) cost
***** The leaves total O(n) cost
***** The internal levels also total O(n)
***** We stop at size k because below that we switch to step 2 (merging inside a word) as the base case
***** Overall: O(n) time for packed sorting
*** How to merge two sorted words into one? Bitonic sorting
**** From sorting networks
**** Bitonic sequence = cyclic shift of nondecreasing + nonincreasing sequences
***** Ex /\ or \/
***** One max, one min.
**** Bitonic sorting network, parallel
***** Compare i and n/2 + i elements, put in right order
****** Can do in parallel
***** Split in half at n/2, recurse on halves in parallel
***** O(log(n)) rounds
*** How do I make the sorted sequences bitonic?
**** Reverse the second one, suddenly bitonic
*** How do I reverse a sequence in log(k) time in a word ram?
**** Recursively split and swap sequences
***** reverse(AB) = reverse(B) followed by reverse(A): swap the halves and recurse
*** How do I do constant time bitonic comparison?
**** Recall that we left a 0 between each element, replace that with 1 in the first list
**** Subtract the second word from the first; the second word's separator bits are still zero
**** Whether the 1 gets borrowed tells us which is larger
**** We can mask out the element values to just get the ones and zeroes
**** Shift this down by the item width and subtract, to spread each surviving bit across its field
***** Each field becomes all ones or all zeros, depending on which element was larger
***** This is a per-element bitmask: zeros in fields where the a element is smaller, ones where the b element is smaller
***** AND with the mask to collect one set of winners, AND with its complement for the others, and OR the results together (see the sketch below)
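Here's a minimal Python sketch of that constant-time parallel comparison (Python integers stand in for a machine word; pack and parallel_min_max are my names, not the lecture's). A single subtraction compares all k packed pairs at once, and spreading the surviving separator bits gives the per-field mask described above.
#+BEGIN_SRC python
def pack(values, b):
    """Pack values (each < 2**b) into one word, low element first, leaving a
    zero separator bit above each b-bit field."""
    word = 0
    for i, v in enumerate(values):
        word |= v << (i * (b + 1))
    return word

def unpack(word, k, b):
    return [(word >> (i * (b + 1))) & ((1 << b) - 1) for i in range(k)]

def parallel_min_max(A, B, k, b):
    """Elementwise min and max of two packed words in O(1) word operations."""
    F = b + 1
    seps = sum(1 << (i * F + b) for i in range(k))       # the separator bit of each field
    payload = sum(((1 << b) - 1) << (i * F) for i in range(k))
    D = (A | seps) - B                                   # field i keeps its separator iff a_i >= b_i
    keep = D & seps
    spread = keep - (keep >> b)                          # b ones in field i iff a_i >= b_i, else zeros
    mins = (B & spread) | (A & (payload & ~spread))
    maxs = (A & spread) | (B & (payload & ~spread))
    return mins, maxs

A = pack([1, 5, 2, 7], b=3)
B = pack([4, 3, 2, 0], b=3)
mins, maxs = parallel_min_max(A, B, k=4, b=3)
print(unpack(mins, 4, 3), unpack(maxs, 4, 3))            # [1, 3, 2, 0] [4, 5, 2, 7]
#+END_SRC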
** And thus, via the sorting <-> priority queue equivalence, linear-time sorting gives a priority queue with O(1) time per operation (for these word sizes)
* Hashing
** Intro: Hashing
*** Universal & k-wise
*** Simple tabulation hashing
*** Chaining & perfect hashing
*** Linear probing
*** Cuckoo hashing
** Hash function
*** h : {0, 1, ..., u-1} -> {0, 1, ..., m-1}
**** u = size of the universe
**** m = size of the table
*** Ideal hashing = totally random hash function
**** Each key lands in any particular slot with probability 1/m, independently of the others
**** Not perfect, has chance of collisions
**** Bad: Big
***** Need to write down theta(u * log(m)) bits of information
*** Universal hashing:
**** Choose h from H such that Pr {h(x) = h(y)} = O(1/m) for all x != y element of the Universe
**** h(x) = [(ax) mod p] mod m for 0 < a < p
***** p is a prime larger than u, table size is m
***** Generalizes to longer keys by viewing x as a vector and taking a dot product with a random vector a
***** Considered expensive because of needed division
**** Faster version: h(x) = [(a*x) >> (log(u) - log(m))], with the multiplication done mod 2 ** log(u) (just let the word overflow) and a a random odd multiplier (sketch below)
***** Works when m and u are powers of 2
***** Idea: multiply, then keep the high-order log(m) bits
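A minimal sketch of both families for 64-bit keys (my own code; the Mersenne prime 2^61 - 1 and the odd multiplier in the multiply-shift version are standard choices, not details given in the lecture):
#+BEGIN_SRC python
import random

def make_mod_prime_hash(m, p=(1 << 61) - 1):
    """h(x) = ((a*x) mod p) mod m with a random 0 < a < p; p is a prime > u."""
    a = random.randrange(1, p)
    return lambda x: ((a * x) % p) % m

def make_multiply_shift_hash(log_m, log_u=64):
    """h(x) = (a*x mod 2**log_u) >> (log_u - log_m) with a random odd a.
    Requires m = 2**log_m and u = 2**log_u to be powers of two."""
    a = random.randrange(1, 1 << log_u) | 1              # random odd multiplier
    mask = (1 << log_u) - 1
    return lambda x: ((a * x) & mask) >> (log_u - log_m)

h1 = make_mod_prime_hash(m=1024)
h2 = make_multiply_shift_hash(log_m=10)
print(h1(123456789), h2(123456789))
#+END_SRC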
*** k-wise independence
**** Implies universality
**** Harder to obtain
**** We want a family of hash functions H such that:
***** For distinct x_1 ... x_k
***** Pr { h(x_1) = t_1 & ... & h(x_k) = t_k } = O(1 / (m ** k)) for any targets t_1 ... t_k
***** Means the hash values of any k distinct keys are (nearly) independent and uniform
**** h(x) = ([sum of i from 0 to k-1 of (a_i (x ** i))] mod p) mod m
***** Universal
***** For k=2, not pairwise independent
***** a_i's are arbitrary numbers between 0 and p
***** Old, Wegman and Carter in 1981
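A minimal sketch of the polynomial family, evaluated with Horner's rule (the prime is an illustrative choice, not from the lecture):
#+BEGIN_SRC python
import random

def make_kwise_hash(k, m, p=(1 << 61) - 1):
    """h(x) = ((a_{k-1} x^{k-1} + ... + a_1 x + a_0) mod p) mod m with
    independent random coefficients a_i, giving (nearly) k-wise independence."""
    coeffs = [random.randrange(p) for _ in range(k)]     # a_0 ... a_{k-1}
    def h(x):
        acc = 0
        for a in reversed(coeffs):                       # Horner's rule
            acc = (acc * x + a) % p
        return acc % m
    return h

h = make_kwise_hash(k=4, m=1024)
print(h(42), h(43))
#+END_SRC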
**** More efficient ways to do it
***** Thorup & Zhang - SODA 2004
****** O(n ** epsilon) space, f(k) query, uniform, reasonably practical
***** Siegel - SICOMP 2004
****** O(n ** epsilon) space, O(1) query for k=theta(log(n))
*** Simple Tabulation Hashing
**** From 1981, but many new proofs of it
**** View x as a vector x_1, x_2... x_c of characters
**** Build a totally random hash table on each character
***** We have c of these hash tables
**** Each table has u ** (1/c) entries, for O(c * (u ** (1/c))) space in total
**** h(x) = Xor of each table applied to each character
**** O(c) time, 3-independent
***** Almost as good as log(n) independence, but still weaker.
***** Upside: Very simple
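A minimal sketch for 32-bit keys viewed as c = 4 bytes (table sizes and output width are illustrative):
#+BEGIN_SRC python
import random

class SimpleTabulation:
    """One table of random values per character position; hash = XOR of c lookups."""

    def __init__(self, c=4, char_bits=8, out_bits=32):
        self.c, self.char_bits = c, char_bits
        self.tables = [[random.getrandbits(out_bits) for _ in range(1 << char_bits)]
                       for _ in range(c)]

    def __call__(self, x):
        h = 0
        for i in range(self.c):
            char = (x >> (i * self.char_bits)) & ((1 << self.char_bits) - 1)
            h ^= self.tables[i][char]                    # one lookup per character
        return h

h = SimpleTabulation()
print(h(0xDEADBEEF))
#+END_SRC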
** Chaining
*** Basic:
**** On collision, make linked list
*** Expected length of chain at slot is sum of collision probabilities for that slot
*** If our hash function is universal, each collision probability is O(1/m), so the expected chain length is O(n/m)
*** This is expected O(1) for m = theta(n)
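A minimal chaining sketch (my own code; the multiply-shift hash here is only a universal placeholder, so chain lengths are O(1) in expectation as above, not with the stronger guarantees discussed later):
#+BEGIN_SRC python
import random

class ChainedTable:
    """Each of the m = 2**log_m slots holds a Python list as its chain."""

    def __init__(self, log_m):
        self.log_m = log_m
        self.a = random.getrandbits(64) | 1              # random odd multiply-shift multiplier
        self.slots = [[] for _ in range(1 << log_m)]

    def _h(self, x):
        return ((self.a * x) & ((1 << 64) - 1)) >> (64 - self.log_m)

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)                  # overwrite an existing key
                return
        chain.append((key, value))                       # otherwise extend the chain

    def get(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

t = ChainedTable(log_m=4)
t.insert(12345, "a"); t.insert(67890, "b")
print(t.get(12345), t.get(99999))                        # a None
#+END_SRC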
*** Expected bounds not good enough, let's study * high probability bounds *
**** With high probability means Pr >= 1 - (1/(n ** c)) for any c
*** Totally random => C_t = O( (log(n)) / loglog(n)) with high probability
**** Variance analysis
***** Var[C_t] = E[C_t ** 2] - (E[C_t]) ** 2
****** Statistics definition of variance, E is expected value
***** For a totally random h, E[C_t ** 2] = sum over pairs of keys of Pr{both hash to slot t} <= (n ** 2) * (1/m) * (1/m) = O(1) when m = Theta(n)
**** Chernoff bound: Pr { C_t > c * E[C_t] } is roughly at most 1 / (c ** c) (the mean is constant); take c = log(n)/loglog(n)
***** Then c ** c = 2 ** (Theta(log(n)/loglog(n)) * loglog(n)) = n ** Theta(1), so the probability is 1/poly(n)
*** With a "cache" of Omega(log(n)) items last searched for
**** If we're totally random, we get O(1) amortized bound
**** Not yet in any paper, in Patrascu's blog, February 2nd, 2011
**** Theta(log(n)) keys collide with a batch of log(n) operations with high probability
***** Here we're looking at log(n) different slots and summing the number of keys in each (O(1) per slot in expectation)
**** Then it's O(1) per operation amortized: charge the total cost across the batch of log(n) operations
** Perfect hashing
*** Called FKS hashing after the authors' names (Fredman, Komlós, Szemerédi - 1984)
*** Store the chain at slot t (of length C_t) as its own hash table of size Theta(C_t ** 2)
*** Alright since we expect chains to be small
*** Space = sum over slots of C_t ** 2; in expectation this is O(n + (n ** 2)/m) = O(n) for m = Theta(n)
*** Why C_t ** 2?
**** Birthday-paradox calculation: each of the <= (C_t ** 2)/2 pairs collides with probability 1/(C_t ** 2), so the expected number of collisions is <= 1/2
**** Pr { # collisions = 0 } >= 1/2
*** Queries O(1) deterministic, only updates randomized
*** Dynamic
**** Two level hashing
**** Insert into chain, if get collision in small hash table rebuild
**** If a chain doubles in length, rebuild its table at four times the size (keeping it Theta(C_t ** 2))
**** Keep table right size for chain
**** O(1) amortized expected for each operation
**** With more effort, with high probability
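A minimal static sketch of the two-level scheme (my own code; for brevity it does not re-draw the first-level function when the total second-level space comes out too large, which the real construction does):
#+BEGIN_SRC python
import random

P = (1 << 61) - 1                                        # a prime larger than the key universe

def universal_hash(m):
    a = random.randrange(1, P)
    return lambda x: ((a * x) % P) % m

class FKSTable:
    """First level: universal hash into m = n chains.  Second level: each chain of
    length C gets its own size-C**2 table, re-drawn until it is collision-free."""

    def __init__(self, keys):
        n = len(keys)
        self.m = max(1, n)
        self.h = universal_hash(self.m)
        buckets = [[] for _ in range(self.m)]
        for k in keys:
            buckets[self.h(k)].append(k)
        self.second = []
        for bucket in buckets:
            size = len(bucket) ** 2
            while True:                                  # each draw succeeds with prob >= 1/2
                g = universal_hash(size) if size else (lambda x: 0)
                slots = [None] * size
                ok = True
                for k in bucket:
                    i = g(k)
                    if slots[i] is not None:
                        ok = False
                        break
                    slots[i] = k
                if ok:
                    self.second.append((g, slots))
                    break

    def contains(self, key):
        g, slots = self.second[self.h(key)]
        return bool(slots) and slots[g(key)] == key

t = FKSTable([3, 1, 4, 15, 92, 65])
print(t.contains(15), t.contains(5))                     # True False
#+END_SRC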
** Chaining without totally random hashing
*** Universal, k-wise independent, etc.
*** Perfect hashing is fine
*** Chaining bounds hold if you have a O(log(n) / loglog(n))-wise hash function
**** Hard
*** Can trade space for simplicity: use simple tabulation hashing
**** From Patrascu, 2011
**** Very simple to implement, and the chain-length bounds still hold
** Linear probing
*** Common wisdom said it was bad; in fact it works very well in practice
*** Patrascu's experiments: a linear probing operation costs only about 10% more than a single memory access
*** In Machine model:
**** m = (1+Epsilon) n
**** For a totally random hash function h => O(1 / (Epsilon ** 2)) expected operation
**** With epsilon = 1 the table uses linear space and operations take O(1) expected
*** With only a universal hash function it can be very bad
*** O(log(n))-wise independence suffices
*** Paper proved that 5-wise was enough
**** Pagh, Pagh, Ruzic - STOC 2007
**** For some 4-wise independent family, Patrascu showed Omega(log(n)) expected time
***** Shows 5-wise is tight.
***** Simple tabulation is O(1/(Epsilon ** 2))
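Before the proof, a minimal sketch of the table itself (the multiply-shift hash is a placeholder and weaker than the 5-wise independence or simple tabulation the results above call for; it also assumes the load factor stays below 1):
#+BEGIN_SRC python
import random

class LinearProbing:
    def __init__(self, log_m):
        self.log_m = log_m
        self.m = 1 << log_m
        self.a = random.getrandbits(64) | 1              # placeholder multiply-shift multiplier
        self.table = [None] * self.m

    def _h(self, x):
        return ((self.a * x) & ((1 << 64) - 1)) >> (64 - self.log_m)

    def insert(self, key):
        i = self._h(key)
        while self.table[i] is not None and self.table[i] != key:
            i = (i + 1) % self.m                         # scan forward to the next free slot
        self.table[i] = key

    def contains(self, key):
        i = self._h(key)
        while self.table[i] is not None:
            if self.table[i] == key:
                return True
            i = (i + 1) % self.m
        return False

t = LinearProbing(log_m=4)
for key in [3, 1, 4, 15, 92, 65]:
    t.insert(key)
print(t.contains(92), t.contains(5))                     # True False
#+END_SRC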
*** Proof: Totally random implies O(1) expected with linear probing
**** Assume m = 3n
**** Make binary tree over this
***** Purely a proof overlay, only an array at runtime
**** Call a node "dangerous" if density is at least 2/3.
**** Dangerous if number of keys that hash to an interval is at least (2/3) * length of the interval
***** Chernoff: Pr { X >= 2*mu } <= (e ** mu) / (2 ** (2*mu)) = (e/4) ** mu
***** Here mu = (1/3) * (2 ** h) for a height-h node, since its interval has 2 ** h slots and m = 3n
**** Examine a run in the table whose length is between 2 ** l and 2 ** (l+1)
**** Spanned by 8 - 17 nodes of height h = l - 3
**** Want to examine first 4 nodes of the run
**** The first of the four may only partially overlap the run (the run starts inside it); the other three lie entirely within it
**** Together they cover at least 3 * (2 ** h) slots of the run, all of which are occupied
**** In linear probing a key only moves forward within its run, so the keys in those slots must have hashed into the intervals of these four nodes
**** At least one of the four nodes must be dangerous
***** Proof: Suppose none of the four is dangerous
***** Then fewer than 4 * (2/3) * (2 ** h) = (8/3) * (2 ** h) keys hash into them
***** But the 3 * (2 ** h) occupied slots need at least 3 * (2 ** h) keys to have hashed there, and 3 > 8/3, a contradiction
**** If we have a run, we can charge it to a dangerous node!
**** Pr { the run has length in [2 ** l, 2 ** (l+1)) } <= 17 * Pr { a given node at height l-3 is dangerous }
**** Expected run length = O( sum over l of 2 ** l * Pr { run length about 2 ** l } ) = O( sum over l of 2 ** l * (e/4) ** ((2 ** (l-3)) / 3) ) = O(1)
**** The exponentially small probability easily beats the 2 ** l factor, so small l dominates the sum
**** With a cache of the last Theta(log ** (1+epsilon) (n)) queries, the amortized cost is Theta(1)
**** Theta(1) with high probability
*** If you use a merely universal hash function this can fail badly; use a good hash function
*** Linear probing good
**** The alternatives win only when you need very small space or are stuck with a poor hash function
*** Use simple tabulation, use a good hash function, use linear probing!
*** Its cache performance is amazing; that's why we prefer it over the alternatives
** Cuckoo Hashing
*** With chaining, want only two probes
**** One for large table, one for chain
*** Can take this idea of two table accesses and just hash into two tables
*** Hash into two different spots in two different tables
**** Different hash functions
*** Query A[g(x)] & B[h(x)]
**** Can do in parallel
*** Insert(x): put in A or B slot
**** If we kick some key y out of x's slot in A, move y to B[h(y)]
**** Each key lives in exactly one of its two slots; think of slots as vertices of a bipartite graph with keys as edges
**** The displaced key may in turn displace another key, and so on
*** Might get cycle or failure situation
**** Can fail catastrophically.
**** Running into a cycle counts as a failure; on failure, rebuild with fresh hash functions
*** (2 + Epsilon) * n space
*** 2 deterministic probes / query
*** Fully random or O(log(n)) wise indep
*** O(1) amortized expected update & O(1/n) build failure prob
*** 6-wise independence is not enough to get constant expected
*** Simple tabulation hashing => the build fails with probability Theta(1 / (n ** (1/3)))
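A minimal sketch with two tables and a rebuild on suspected cycles (max_kicks and the placeholder multiply-shift hashes are my choices; a real implementation would use the stronger hash families above and grow the tables on repeated failure):
#+BEGIN_SRC python
import random

class Cuckoo:
    """Every key sits in A[g(key)] or B[h(key)], so a lookup makes exactly two probes."""

    def __init__(self, log_m=10):
        self.log_m, self.m = log_m, 1 << log_m
        self.g, self.h = self._new_hash(), self._new_hash()
        self.A, self.B = [None] * self.m, [None] * self.m

    def _new_hash(self):
        a = random.getrandbits(64) | 1                   # placeholder multiply-shift hash
        return lambda x, a=a: ((a * x) & ((1 << 64) - 1)) >> (64 - self.log_m)

    def contains(self, x):
        return self.A[self.g(x)] == x or self.B[self.h(x)] == x

    def _try_insert(self, x, max_kicks=64):
        """Bounce x between the tables; return False if it looks like a cycle."""
        for _ in range(max_kicks):
            slot = self.g(x)
            x, self.A[slot] = self.A[slot], x            # place x in A, pick up the evicted key
            if x is None:
                return True
            slot = self.h(x)
            x, self.B[slot] = self.B[slot], x            # place the evicted key in B
            if x is None:
                return True
        self._pending = x                                # the key still in hand after giving up
        return False

    def insert(self, x):
        if self.contains(x) or self._try_insert(x):
            return
        # Likely a cycle: choose fresh hash functions and reinsert everything.
        keys = [k for k in self.A + self.B if k is not None] + [self._pending]
        while True:
            self.g, self.h = self._new_hash(), self._new_hash()
            self.A, self.B = [None] * self.m, [None] * self.m
            if all(self._try_insert(k) for k in keys):
                return

c = Cuckoo()
for key in [3, 1, 4, 15, 92, 65]:
    c.insert(key)
print(c.contains(92), c.contains(5))                     # True False
#+END_SRC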
*** Proof of O(1) expected for perfectly random
**** This is an information-theoretic argument
**** Pr { insert follows a path of length k } <= 1 / (2 ** k)
**** Think of how to encode hash functions
***** Writing down the hash functions g and h (restricted to the n keys) takes n*log(m) bits each
***** If you follow a path of length k, I can find a way to encode the functions with fewer bits
****** Saves k bits
***** An event that lets us shave k bits off the encoding happens with probability at most 1 / (2 ** k)
**** How do we encode a node on the path?
**** Goal: Save one bit per node on path
***** Write down their hash values
****** k log(n) bits
***** Write down the values that corresponded to each edge
****** These are the keys of the things that got kicked out
***** We can then write down keys, not slots
**** We can run this in reverse: the existence of such a short encoding bounds the probability of the bad event
***** Concretely, a length-k path lets us write down g and h in 2n*log(m) - k + O(log(k)) bits instead of 2n*log(m)
