* Integer Data Structures IV: Integer Sorting and Priority Queues ** Intro *** Priority queues equivalent to sorting **** O(f(n, w)) priority queue gives us the sorting algorithm trivially, O(n * f(n, w)) sorting with n removals **** The inverse is also true, a sorting algorithm in this time gives us a priority queue in this time *** Problem: Can we get linear-time sorting here? *** Open problem: Can we get O(1) decrease-key? ** Integer Sorting bounds *** We have n w-bit integers *** Comparison sorting: O(n log n) *** Counting sort O(n+u) = O(n) if w = log(n) *** Radix sort: O(n * (w / log(n))) **** Counting sort for each digit in the radix **** Number of chunks is w / log(N) **** Linear is w = O(log(n)) *** Priority queue is weaker than predecessor, so we can use vEB **** O(n log(w)) time **** Can achieve O(n * log(w / log(n))) with more care *** Signature sort **** O(n) if w is Omega(log ** (2 + epsilon) (n)) forall epsilon > 0 **** Word size must be large compared to n **** Could use vEB to get O(nloglog(n)) for all w **** Han got O(nloglog(n)) with signature sort in O(n loglog(n)) forall w, deterministically in AC ***** Got rid of randomization used in vEB **** Han & Thorup get O(n * sqrt (log(w/log(n)))) expected ***** Randomized, not deterministics ***** Is O(n sqrt(loglog(n))) forall w, since we can use above for large w. **** Open: Working with small w. Can we get O(N) forall w? *** Packed sorting *** Bitonic Sort, uses a sorting network ** Signature sort *** Assume w >= log ** (2 + epsilon) (n) * loglog(n) *** Steps **** 1) Break each integer into ( log ** (epsilon) (n) ) equal-sized chunks. ***** Each chunk is at least log ** 2 (n) because of division from above w limit **** 2) Replace each chunk by O(log(n))-bit hash ***** Num is in chunks of digits. Think of digits. ***** We have huge range for each digit. ****** 2 ** (log ** 2(n))) possible values ***** Insight: Most digits don't appear ****** We have not that many chunks compared to the value range of each chunk, it's the number of chunks that limits our real range of values in the problem domain. ***** Let's hash them! This makes the universe of values smaller ****** Reduces to a polynomial universe ***** Note: ****** Want linear time, so we have to compute the hash value of each in constant time ******* Use multiplication method. Multiply by single value m, take modulus hash width. ******* Use m large enough to avoid collisions ****** We do not preserve order. We've mucked that up with hashing. ******* Each digit is mangled, out of order. The sequence of digits is preserved though. **** 3) Use packed sorting in O(n) time. ***** n b-bit integers with w = Omega(b log(n) * loglog(n)) ****** Provided b is small ***** Applies here because each chunk is log(n)-bits. ***** b = theta (log(1+epislon) (n)) and w >= (log ** (2+e) (n)) * loglog(n) ***** Wait! This doesn't give us the actual values sorted, only the hashes **** 4) Build a compressed trie of sorted signatures ***** Aside: tries introduced later ****** trie is a tree where children of tree come from fixed universe ****** Comes from re-"trie"-ve ****** Will be like a B-tree since hashes large, we have huge branching factor ***** Why? ****** We've messed up the digits via packed sorting ****** This is equivalent to permuting the children in the interior nodes of the trie representation ****** What if we had a compressed trie representation of origional values? ******* We could do in-order traversal to find sorted order of leaves, could find sorted order of integers ******* But we don't have that trie, we have the trie where children permuted in interior nodes ******* If we put children in right order, we'd have the correct compressed trie ***** Compressed trie ****** Contract every non-branching node into parent ****** O(n) size in number of leaves ****** Same order as trie, order of leaves is preserved ***** Can compute compressed trie in O(n) time ****** Amortization argument ****** Given leaves in sorted order, want trie above the leaves ****** Compressed trie for speed. ****** Steps: For x_1, x_2, .... x_n in order ******* Insert into compressed trie starting with empty trie ******* At time i, I've computed the left part of the trie. Need to add x_i ******* x_i must come off of rightmost node of tree at some height ******* If node not contracted, add x_i on right in front of predecessor ******* If node contracted, a node suddenly is branching but wasn't. We can make a new node and add x_i to it. ******** This is constant amortized time ********* Start at rightmost leaf and walk up tree to find right point, O(n) time ********* O(N) since we can use XOR to find most significant bit between x_i and x_(i-1) to find where differ ********* if K edges, can do in O(K+1) ********* Alright because amortization measure function is length of rightmost path. ********* We go up, so k gets smaller. We charge to length of rightmost path. ********* O(n) amortized, since basically in-order traversal ***** Trie is correct after, but order of children **** 5) Recursively sort all edges by (node.ID, actualChunk, edgeIndex) ***** Actual chunk = first chunk, actual chunk not hash ***** Index of edge ***** For each node, give me the sorted order of the actual chunks followed by the order they used to be in. ****** Gives you inverse of permutation ***** Size ****** Node.id is O(log(n)) ****** Chunk is O(w / (log ** (epsilon) (n))) ****** Edge index is log(n) bits ******* Don't store null pointers in compressed trie ******* Use minimum reference size. ***** Recursion cost is O(1) ****** We started with integers of size W, now we have integers with size O(log ** (elpsilon)(n)) ****** After 1/e + 1 recursions, b will have decreased to O((1/epsilon) log(n) + (w / (log ** (1+epsilon) (n)))) ****** Once we have our integers this small, this is our base case, we can used packed sorting. ****** 1/e is constant ****** Therefore recursion has constant cost **** 6) For each node, permute the edges as given by edgeIndex -> sortedOrder in step 5 **** 7) In-order traversal of compressed trie, output leaves ***** Don't need null pointers because we never search in the trie *** Crux of the idea: If we can decrease the number of bits to a low enough amount, then we can use packed sorting. ** Packed Sorting *** w >= 2(b+1) * log(N) loglog(n) *** Steps **** 0) Packet log(n)loglog(n) elements into lower half of each word **** 1) Merge pair of sorted words with k' <= log(n)loglog(n) elements into one sorted word with 2k' elements in O(log(k')) time **** 2) Mergesort k=log(n)loglog(n) elements into one sorted word in ***** T(k) = 2T(k/2) + O(log(k)) ****** At root we pay log(k), at leaves we pay K. Roughly geometric, so the leaves dominate ****** = O(k) **** 3) Merge two sorted lists of r sorted words with k elements into one sorted list of 2r sorted words ***** Want one sorted list of sorted words, 2 * r * k items ***** O(r * log(k)) time ***** Idea: Consider pairs of words ****** When we merge two words, log(k) time they're now sorted. ****** Can't simply output all of these sorted words and continue, don't know that sorted relative to other pairs ****** Can output the lower k items, since they're the overall k smallest. ****** Must put word with higher k items back into the lists, continue ******* Put at the front of the list containing the maximum element in that word ****** In log(k) time, output k items. So O(r * log(k)) time overall **** 4) Mergesort with 3) as merger and 2) as the base case ***** T(n) = 2T(n/2) + O(n/k log(k)) ****** n/k is r ***** T(k) = O(k) from 2 ***** Each level will sum to n/k log(k) ***** We stop at level K, so there aren't log(n) levels ***** We have n/k leaves, each with O(k) cost. ***** The leaves have O(n) cost ***** The top have O(N) cost ***** We stop at level K because any less than that we have to use a different merge algorithm ***** We get O(N) sorting for packed sorting with this. *** How to merge two sorted words into one? Bitonic sorting **** From sorting networks **** Bitonic sequence = cyclic shift of nondecreasing + nonincreasing sequences ***** Ex /\ or \/ ***** One max, one min. **** Bitonic sorting network, parallel ***** Compare i and n/2 + i elements, put in right order ****** Can do in parallel ***** Split in half at n/2, recurse on halves in parallel ***** O(log(n)) rounds *** How do I make the sorted sequences bitonic? **** Reverse the second one, suddenly bitonic *** How do I reverse a sequence in log(k) time in a word ram? **** Recursively split and swap sequences ***** (AB) reverse is B-reverse then A-reverse *** How do I do constant time bitonic comparison? **** Recall that we left a 0 between each element, replace that with 1 in the first list **** Subtract first from second, the second still has zeroes. **** Whether the 1 gets borrowed tells us which is larger **** We can mask out the element values to just get the ones and zeroes **** Shift this by the size of the items and subtract ***** This gives us 0110 or 0000, based on which is larger. ***** This gives us a bitmask. We have zeroes when the a is smaller, ones when the b is smaller. ***** Can mask and AND to get the smaller values near eachother, negate and do the same to get larges near eachother. ** And thus we have a O(1) priority queue * Hashing ** Intro: Hashing *** Universal & k-wise *** Simple tabulation hashing *** Chaining & perfect hasing *** Linear probing *** Cuckoo hasing ** Hash function *** h = {0, 1, ... u-1} -> { 0, 1, 2, ... M } **** Universe u **** M = size of table *** Ideal hashing = totally random hash function **** Which slot of h you end up in is (1/M) **** Not perfect, has chance of collisions **** Bad: Big ***** Need to write down theta(u * log(m)) bits of information *** Universal hashing: **** Choose h from H such that Pr {h(x) = h(y)} = O(1/m) for all x != y element of the Universe **** h(x) = [(ax) mod p] mod m for O < a < p ***** p is a prime larger than u, table size is m ***** Vector dot product ***** Considered expensive because of needed division **** Faster version: h(x) = [(ax) >> (log(u) - log(m))] ***** Works when m and u are powers of 2 ***** Idea: multiply and take higher order bits *** k-wise independence **** Implies universality **** Harder to obtain **** We want a family of hash functions H such that: ***** For distinct x_1 ... x_k ***** Pr {h(x_1) = t_1 & .... & h_k(x_k) = t_k } = O(1 / (m ** k)) ***** Means that every k elements are independent, distinct **** h(x) = ([sum of i from 0 to k-1 of (a_i (x ** i))] mod p) mod m ***** Universal ***** For k=2, not pairwise independent ***** a_i's are arbitrary numbers between 0 and p ***** Old, Wegman and Carter in 1981 **** More efficient ways to do it ***** Thorup & Zhang - SODA 2004 ****** O(n ** epsilon) space, f(k) query, uniform, reasonably practical ***** Siegel - SICOMP 2004 ****** O(n ** epsilon) space, O(1) query for k=theta(log(n)) *** Simple Tabular Hashing **** From 1981, but many new proofs of it **** View x as a vector x_1, x_2... x_c of characters **** Build a totally random hash table on each character ***** We have c of these hash tables **** Each of size O(c * (u ** (1/c))) **** h(x) = Xor of each table applied to each character **** O(c) time, 3-independent ***** Almost as good as log(n) independence, but still weaker. ***** Upside: Very simple ** Chaining *** Basic: **** On collision, make linked list *** Expected length of chain at slot is sum of collision probabilities for that slot *** If our hash function is uniform, this is O(1/m), so the sum is O(n/m) *** This is expected O(1) for m = theta(n) *** Expected bounds not good enough, let's study * high probability bounds * **** With high probability means Pr >= 1 - (1/(n ** c)) for any c *** Totally random => C_t = O( (log(n)) / loglog(n)) with high probability **** Variance analysis ***** Var[C_t] = E[C_t ** 2] - E[C_t) ** 2 ****** Statistics definition of variance, E is expected value ***** For totally random is (1/m) (sum that two keys hash to same slot) = (m ** 2) * (1/m) * (1/m) = O(1) **** Pr { C_t > c * mean } with C = log(n)/loglog(n) from above ***** Simplifies to 1 / (2 * (log(n) / loglog(n))) = 1/n *** With a "cache" of Omega(log(n)) items last searched for **** If we're totally random, we get O(1) amortized bound **** Not yet in any paper, in Patrascu's blog, February 2nd, 2011 **** Theta(log(n)) keys collide with a batch of log(n) operations with high probability ***** Here we're looking at log(n) different slots, summing the number of values in each(O(1)) per). **** Then it's O(1) per operation amortized. Charge each ** Perfect hashing *** Called FKS hashing from authors' names (Fredman, Komlos, Szemerdi - 1984) *** Store chain at t(C_t) as hash table of size theta(C_t ** 2) *** Alright since we expect chains to be small *** Space = sums of squares of collision probability = O(m) expected *** Why C_t ** 2? **** The birthday paradox, each has 1/c_t ** 2 prob of happening. Less then or equal to 1/1 **** Pr { # collisions = 0 } >= 1/2 *** Queries O(1) deterministic, only updates randomized *** Dynamic **** Two level hashing **** Insert into chain, if get collision in small hash table rebuild **** If chain grows by factor 2, rebuild by factor 4 **** Keep table right size for chain **** O(1) amortized expected for each operation **** With more effort, with high probability ** Chaining without totally random hashing *** Univseral, k-wise indep, etc *** Perfect hashing is fine *** Chaining bounds hold if you have a O(log(n) / loglog(n))-wise hash function **** Hard *** Can trade space for simplicity, use simple table tabulation hashing **** From Patrascu, 2011 **** Very simple to implement, with expected chain lengths ** Linear probing *** Common wisdom false, works well in practice *** Patrascu *** Linear probing is 10% more time than a memory access *** In Machine model: **** m = (1+Epsilon) n **** For a totally random hash function h => O(1 / (Epsilon ** 2)) expected operation **** Set epsilon to 1, space for n. *** For only universal hash, very very bad *** O(log(n))-wise independence *** Paper proved that 5-wise was enough **** Pagh, Pagh, Ruzic - Stoc 2008 **** Some 4-wise independent, Patrascu => Omega(log(n)). ***** Shows 5-wise is tight. ***** Simple tabulation is O(1/(Epsilon ** 2)) *** Proof: Totally random implies O(1) expected with linear probing **** Assume m = 3n **** Make binary tree over this ***** Purely a proof overlay, only an array at runtime **** Call a node "dangerous" if density is at least 2/3. **** Dangerous if number of keys that hash to an interval is at least (2/3) * length of the interval ***** Pr { >= 2u } (e ** mu) / (2 ** (2 mu)) = (e / 4) ** mu ***** u = (1/3) 2 ** H for a height H node **** Examine a run in table of length either (2 ** l) or 2 ** (l + 1) **** Spanned by 8 - 17 nodes of height h = l - 3 **** Want to examine first 4 nodes of the run **** Know first three are entirely filled, continues on, first is partially after starting point **** Span 3 * (2 ** h) slots of run **** These are the codomain of h **** At least one node must be dangerous ***** Proof: Assume none of 4 are dangerous ***** At least 4 * (2/3) * (2 ** h) keys hash via h to them ***** So probability that all aren't dangerous is small. **** If we have a run, we can charge it to a dangerous node! **** Pr {len of run > len of power of two interval} <= 17 * Probability { node at height l-3 is dangerous} **** Expected length of run = theta ( sum of (((2 ** l) * Pr {}) / (1 / (2 ** (2 ** l)))) for all l ) **** Double exponential denominator dominates **** Cache last theta(log ** (1 + epsilon) (n) queries) = theta(1) **** theta(1) with high probability *** If you use universal hash function, this will fail! Use good hash function *** Linear probing good **** Alternatives are good when you're using very small space, poor hash function. *** Use simple tabulation, use a good hash function, use linear probing! *** Cache performance amazing, that's why we prefer over all else ** Cuckoo Hashing *** With chaining, want only two probes **** One for large table, one for chain *** Can take this idea of two table accesses and just hash into two tables *** Hash into two different spots in two different tables **** Different hash functions *** Query A[g(x)] & B[h(x)] **** Can do in parallel *** Insert(x): put in A or B slot **** If kicked out y from A slot, move to B[h(y)] **** Key lives in one spot in A or B, in bitartite graph. **** Kicks out the person we removed, might need to kick out someone else. *** Might get cycle or failure situation **** Can fail catastrophically. **** Cycles are failure *** (2 + Epsilon) * n space *** 2 deterministic probes / query *** Fully random or O(log(n)) wise indep *** O(1) amortized expected update & O(1/n) build failure prob *** 6-wise independence is not enough to get constant expected *** Simple tabulation hashing => fail to build with theta(1 / (n ** (1/3))) *** Proof of O(1) expected for perfectly random **** This is an information-theoretic argument **** Pr {insert follows path of length k} <= 1 / (2k) **** Think of how to encode hash functions ***** Hash functions g and h, takes nlog(m) bits to write each down. ***** If you follow a path of length k, I can find a way to encode the functions with fewer bits ****** Saves (-k) bits ***** Happens with probability 1 / (2k) **** How do we encode a node on the path? **** Goal: Save one bit per node on path ***** Write down their hash values ****** k log(n) bits ***** Write down the values that corresponded to each edge ****** These are the keys of the things that got kicked out ***** We can then write down keys, not slots **** We can run these proofs in reverse, prove it from the existence of the encoding in at least a number of bits ***** Can use the knowledge that we assume that writing down g and h takes 2nlog(m) - k + O(log(k)) bits

## Tuesday, September 29, 2015

### Integer Data Structures + Hashing Notes

The last week I covered the fourth integer data structure lecture and the hashing lecture. The hashing lecture mostly pertained to hash tables, with a focus on collision management and hash suitability.

Subscribe to:
Post Comments (Atom)

## No comments:

## Post a Comment