Relaxed concurrent data structures

A central computing trend over the last decade has been the need to process increasingly larger amounts of data as efficiently as possible. This development is challenging both software and hardware design, and is altering the way data structures and algorithms are constructed, implemented, and deployed.

In this course, I will present some examples of such new data structure design ideas and implementations. In particular, I will discuss some inherent limitations of parallelizing classic data structures, and then focus on approaches to circumvent these limitations. The first approach is to relax the software semantics, to allow for approximation, randomization, or both. The second is to modify the underlying hardware architecture to unlock more parallelism. Time permitting, I will also cover results showing that both approaches can improve real-world performance, and touch upon some of the major open questions in the area.

Lock-Free Algorithms for Kotlin Coroutines

Kotlin is a modern and pragmatic language, developed by JetBrains, that compiles to Java Virtual Machine (JVM). Kotlin 1.1 introduced coroutines for writing scalable asynchronous code. Basically, Kotlin coroutines are very light-weight threads, and millions of concurrent coroutines can be easily running, each coroutine occupying just a few dozen bytes of the heap when suspended. Because of the inherent scalability requirement, many concurrent data structures used in the coroutines, such as communication channels between coroutines, are expected to be lock-free and disjoint-access parallel, yet they have to be practical and readily implementable in the confines of JVM. This creates unique challenges for the design of the corresponding support libraries.
In this lecture we shall walk through the basics of lock-free doubly-linked lists by Sundell & Tsigas and multi-word compare-and-swap (CASN) by Harris - two practical lock-free constructions that serve as a backbone for many lock-free algorithms in Kotlin coroutines support libraries. We'll discuss challenges of designing linearisable implementations for complex data structures and how to extend CASN to support limited, but practically needed combinations of several operations on different data structures in the form of atomic transactions, without going into full-blow implementation of general-purpose software transactional memory.
Roman Elizarov
ITMO University, JetBrains
Rachid Guerraoui

Wait-free computing "for dummies"

With the advent of multiprocessors, it becomes crucial to master the underlying algorithmics of concurrency. The objective of this lecture is to study the foundations of concurrent algorithms and in particular the techniques that enable the construction of the most robust such algorithms, those called wait-free. The lecture will cover registers, counters and snapshots objects, as well as the fundamental results on the FLP impossibility and the universality of consensus.

Lock-free concurrent data structures

Lock-free algorithms are shared-memory multiprocessor algorithms that can avoid hazardous race conditions and guarantee correctness without using mutual exclusion locks. They are in general more resilient to asynchronous conditions than lock-based algorithms, since they guarantee progress even if some of the threads participating in the algorithm are delayed for a long duration or even fail-stop. On the other hand, lock-free algorithms are less resilient to asynchrony than wait-free ones, since they only guarantee global progress in spite of thread failures, whereas the latter guarantee that any non-failed thread can make progress. On the up side, lock-free algorithms are often easier to devise and more efficient than their wait-free counterparts.

In this mini-course, we will study well-known lock-free algorithms for several concurrent data-structures. In addition to being interesting in their own right, these will serve to convey algorithmic techniques, such as elimination, that can be used for devising high-throughput lock-free implementations. If time permits, we will also discuss the notion of helping and describe open problems that relate to it.
Maurice Herlihy
Brown University

Transactional Memory

Computer architecture is undergoing a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed.
As a result, system designers and software engineers can no longer rely on increasing clock speed to hide software bloat. Instead, they must somehow learn to make effective use of increasing parallelism. This adaptation will not be easy. Conventional synchronization techniques based on locks and conditions are unlikely to be effective in such a demanding environment. Coarsegrained locks, which protect relatively large amounts of data, do not scale, and fine-grained locks introduce substantial software engineering problems.
Transactional memory is a computational model in which threads synchronize by optimistic, lock-free transactions. This synchronization model promises to alleviate many (perhaps not all) of the problems associated with locking, and there is a growing community of researchers working on both software and hardware support for this approach. This course will give a survey of the area, with a discussion of existing algorithms and open research questions.

Recommenders and distributed machine learning: algorithms and systems

Collaborative filtering is one of the most popular approaches to build recommenders that provide users with content that matches their interest. Interestingly, the very notion of users can be general and span actual humans or software applications. In this seminar, we will introduce the concept of recommendation and survey various algorithmic techniques to implement it. Beyond the algorithms, the actual implementation (centtralized, hybrid, distributed) is of the utmost importance for it is key to scalability and efficiency. We will also describe various systems that implement collaborative filtering-based recommenders, their properties and performance.
Erez Petrank

Memory management for concurrent data structures

In part 1 of this class, we will start with a crash course on garbage collection, including the basic classical algorithms. We will then focus our attention on concurrent garbage collection, which is intended to provide the running programs with better progress guarantees. Finally, we will explain why lock-freedom is foiled when using any memory manager known to us today. Lock-free garbage collection is an open problem.

In part 2 of the class, we will discuss specific memory managers that are meant to provide support for lock-free data structures without foiling their lock-free guarantees. We will start with the simple algorithms of the 90's and then survey recently published algorithms. We will go over properties that these algorithms must satisfy, explain the difficulties, and explain what the state-of-the-art memory managers provide.

Distributed Constructions: a guided tour

The notion of a universal construction is central in computing science: the wheel has not to be reinvented for each new problem. In the context of n-process asynchronous distributed systems, a universal construction is an algorithm that is able to build any object defined by a sequential specification despite the occurrence of up to n-1 process crash failures. The aim of this tutorial is to present a guided tour of such universal constructions. Its spirit is not to be a catalog of the numerous constructions proposed so far, but a (as simple as possible) presentation of the basic concepts and mechanisms that constitute the basis these constructions rest on.
Liuba Shrira
Brandeis University

Implementation techniques for libraries of transactional concurrent data types

A concurrent system's performance can often be improved if we understand the semantics of its data types. Type-specific concurrency control is particularly helpful in software transactional memory (STM) systems. This course will describe how to exploit type-specific knowledge to avoid false conflicts under both optimistic and pessimistic synchronization schemes. We will analyze the costs of tracking conflicts among abstract operations, and describe the kinds of mechanisms needed to do so.

Locking, from traditional to modern

In this mini-course we will cover a progression of locking techniques, starting with simple traditional Test-and-Test-and-Set locks through a variety of increasingly more sophisticated locking techniques. We will survey the various caching and scalability properties expected from a modern locking scheme and how new classes of locks such as time-out locks and cohort locks meet these requirements. We will culminate with client-server style locks such as Owiki and flat-combining.