About offloading memory data movement to DMA or DSA engines.
Prefetching hides memory access latency (CPU stalls). Three questions matter: what to prefetch, when to prefetch, and where to place the prefetched data.
In this article, we will list several papers on local NVM/PM fault tolerance.
QoS (LB) on persistent memory systems to avoid interference.
Problem: Because of how RDMA NICs (RNICs) are implemented, they have no remote persistent flush primitive. Data from a client's one-sided write lands in the RNIC's volatile cache first, and the RNIC sends the ACK back before the data is written to PM. As a result, a power loss can easily break remote data persistence.
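The failure window can be illustrated with a toy simulation. This is only a sketch: `ToyRNIC`, `one_sided_write`, `drain`, and `power_loss` are hypothetical names invented for illustration, not a real RDMA API, and `drain` stands in for whatever mechanism eventually moves the data into PM.

```python
class ToyRNIC:
    """Toy model of a server-side RNIC sitting in front of persistent memory."""

    def __init__(self):
        self.volatile_cache = {}  # lost on power failure
        self.pm = {}              # survives power failure

    def one_sided_write(self, addr, value):
        # Data lands in the volatile cache; the ACK is returned immediately,
        # before anything has reached PM.
        self.volatile_cache[addr] = value
        return "ACK"

    def drain(self):
        # Hypothetical step: cached data finally reaches PM.
        self.pm.update(self.volatile_cache)
        self.volatile_cache.clear()

    def power_loss(self):
        # Only the volatile cache is wiped.
        self.volatile_cache.clear()


# Case 1: power is lost right after the ACK, so the write never reaches PM.
rnic = ToyRNIC()
ack = rnic.one_sided_write(0x10, b"v1")
rnic.power_loss()
lost = 0x10 not in rnic.pm

# Case 2: the data is drained to PM before the power loss, so it survives.
rnic2 = ToyRNIC()
rnic2.one_sided_write(0x10, b"v1")
rnic2.drain()
rnic2.power_loss()
survived = rnic2.pm.get(0x10) == b"v1"
```

In the first case the client holds an ACK for data that no longer exists anywhere, which is exactly the gap described above.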
LogECMem uses a hybrid of in-place updates and parity logging (PL) for parity updates.
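To make the two update styles concrete, here is a minimal sketch using a single XOR parity chunk. It is not LogECMem's actual design; `ToyStripe` and its methods are hypothetical names, and a real erasure code would scale each delta by a coding coefficient instead of using plain XOR.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class ToyStripe:
    """One stripe: several data chunks protected by a single XOR parity chunk."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.parity = bytes(len(self.chunks[0]))
        for chunk in self.chunks:
            self.parity = xor_bytes(self.parity, chunk)
        self.parity_log = []  # pending parity deltas (parity logging)

    def update_in_place(self, i, new):
        # In-place update: the parity chunk is rewritten immediately.
        delta = xor_bytes(self.chunks[i], new)
        self.chunks[i] = new
        self.parity = xor_bytes(self.parity, delta)

    def update_with_log(self, i, new):
        # Parity logging: append the delta and leave the parity chunk untouched.
        delta = xor_bytes(self.chunks[i], new)
        self.chunks[i] = new
        self.parity_log.append(delta)

    def merge_log(self):
        # Fold all logged deltas into the parity chunk, e.g. before a repair.
        for delta in self.parity_log:
            self.parity = xor_bytes(self.parity, delta)
        self.parity_log.clear()


a = ToyStripe([b"\x01\x02", b"\x10\x20"])
a.update_in_place(0, b"\xff\x00")

b = ToyStripe([b"\x01\x02", b"\x10\x20"])
b.update_with_log(0, b"\xff\x00")
stale_parity = b.parity  # still reflects the old data until the log is merged
b.merge_log()
```

Both paths end with the same parity; logging trades a cheap append on the update path for extra merge work later.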
Learned index + PM. APEX: A High-Performance Learned Index on Persistent Memory [1]
Industry work on using a DRAM+PM architecture as a cache (from Facebook and Twitter).
RDMA+KVS. Unlike local hashing, insertion, deletion, and update are expensive in RDMA environments, so a carefully designed index built on one-sided RDMA operations is crucial.
An experiment-driven study from SJTU-IPADS presents several methods for achieving better performance in NVM+RDMA systems.