This work focuses on the bottleneck of file-mapping in NVM filesystems.
Intro As storage devices have developed, many new technologies have appeared, including NAND SSD, Optane 3D XPoint, STT-RAM… Personal machines already have multi-level storage: SRAM in the CPU's L1/L2 caches, DRAM on DIMMs, an SSD to speed up the operating system, and an HDD to store large files such as media and movies.
Paper notes on Assise, a distributed FS on NVM. Assise keeps data and metadata on the clients (the opposite of disaggregation) to enable fast recovery and high performance.
Paper notes on XStore[1] from SJTU, which uses a learned cache (aka learned index[2]) for a distributed DRAM KVS.
Paper notes on Persimmon[1] from UCB & MSR, which uses a state machine scheme and PM to add persistence to in-memory storage systems.
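This post mainly discusses an OSDI '20 paper from Twitter and CMU on the workload characteristics of large-scale industrial key-value stores[1]. Before that, in 2012, Facebook published a similar workload characterization study[3]. This post covers both works.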
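This post briefly lists some work on building distributed KVSs with RDMA. Building on the characteristics of RDMA, these works apply targeted optimizations to RPC, transactions, and other aspects to improve performance.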
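TH-DPMS: Design and Implementation of an RDMA-enabled Distributed Persistent Memory Storage System, new work from Prof. Jiwu Shu's group, published in ToS 2020. It is a distributed storage system on RDMA+PM: building on some SOTA methods, it constructs a distributed PM system, with an FS and a KVS on top of it.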
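Abs Clover is a distributed key-value store built on an RDMA+NVM backend. Its MDS+DS (MS+DN) design separates metadata from data, decoupling compute from storage and improving scalability.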
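(9.29 update: Octopus) There are many single-node storage systems on NVM, such as BPFS, NOVA[1], Ziggurat, FlatStore, SLM-DB, HiKV, and so on (see also: the Non-Volatile Main Memories File Systems series - nan01ab). So how should one build a distributed NVM system in an RDMA (Remote Direct Memory Access) environment over networks such as InfiniBand? Popular systems like Ceph and Gluster can also run on RDMA+PM, but the results are not very good; the rest of the post briefly introduces some related work.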