Databases and Distributed Systems

Databases and Distributed Systems at the Science Hub are teams using innovative approaches to designing, building, or operating large-scale database services or distributed cloud services.

DSAIL: Data Systems and AI Lab

Over the past decade, AI has made substantial methodological advances in learning the complex relationships that have evolved among data, enabling applications from personal digital assistants to autonomous vehicles. An open question, however, is: How far can AI technology be pushed into other application domains? The Data Systems and AI Lab (DSAIL) explores this frontier by going beyond the use of AI for automating simple perceptual tasks to investigating opportunities to enhance and optimize large-scale data systems and enterprise applications with learned components synthesized using AI. This will include applying AI both to the construction of traditional data structures such as indexes and database methods like query optimization, schema design, and logical and physical database design; and to algorithms like system load balancing and scheduling.

This project is led by MIT faculty Sam Madden with collaborators at DSAIL. Funded by Amazon Web Services.

Pensieve: A Modular Approach for Security Verification of Microarchitectural Designs

Speculative execution attacks have become one of the most critical security threats in the computer architecture community. These attacks exploit the side effects of transient instructions, which due to mis-speculation, are speculatively executed but squashed later, and leak secrets via various micro-architectural structures. It is challenging to verify microarchitectural designs due to the informal nature of the defense development process. This project aims to design a novel modeling approach for microarchitectural designs and develop a formal-verification framework. Our modeling approach allows hardware designers to flexibly express their defense mechanisms in a modular way. The verification framework will be able to iteratively find the assumptions on the baseline architecture and formally prove the security of the design.

This work is a collaboration between MIT faculty Mengjia Yan and Adam Chlipala, and Amazon researchers Daniel Guo and Hira Syeda. Funded by Amazon.

Verifying vMVCC, a high-performance database using multi-version concurrency control

Applications routinely rely on databases not just for storing data durably on disk, but also for ensuring that transactions execute atomically despite concurrency and crashes. Achieving high performance in a database system for many concurrent transactions is challenging. Our goal is to use formal methods to develop a high-performing database system, vMVCC, with a machine- checked proof of correctness that will handle every possible scenario. Our project develops vMVCC to be a practical database system that can run the TPC-C benchmark with performance competitive to that of state-of-the-art research systems.

This work is a collaboration between MIT faculty Frans Kaashoek and Nickolai Zeldovich, and Amazon researchers Doug Terry, Emina Torlak and Xi Wang. Funded by Amazon.