Faster Update Time for Turnstile Streaming Algorithms

Faster Update Time for Turnstile Streaming Algorithms –

A Google TechTalk, presented by Josh Alman, 2020/01/09

ABSTRACT: In this talk, I’ll present a new algorithm for maintaining linear sketches in turnstile streams with faster update time. In the turnstile streaming model, the data structure wants to maintain a vector v of n integers, under updates which increment or decrement entries of the vector. Algorithms in this model try to quickly maintain a sketch of v, of size much smaller than n, which can still be used to recover interesting properties of v. Many of the most successful algorithms work by maintaining log n different linear sketches, where each sketch partitions the coordinates into k[less than]log^{o(1)} n buckets using a c-wise independent hash function for constant c, and maintains the sum of coordinates for each bucket. Our new approach improves the update time for maintaining any sketch of this form.

As an application, we show that log n Count sketches or CountMin sketches with a constant number of columns (i.e., buckets) can be implicitly maintained in worst-case O(log^{0.582} n) update time using O(log n) words of space, on a standard word RAM with word-size w=Theta(log n). The exponent 0.582 = 2 * omega/3-1, where omega is the current matrix multiplication exponent. Due to the numerous applications of linear sketches, our algorithm improves the update time for many streaming problems in turnstile streams, in the high success probability setting, without using more space, including l_2 norm estimation, l_2 heavy hitters, point query with l_1 or l_2 error, etc. Our update algorithm is adaptive, and it circumvents the non-adaptive cell-probe lower bounds for turnstile streaming algorithms by Larsen, Nelson and Nguyễn (STOC’15).

Based on joint work with Huacheng Yu to appear in SODA 2020.

About the speaker: Josh Alman is a postdoc at Harvard

Seen on Youtube!