Paper summary
Details
Abstract
Replication-based fault-tolerance techniques incur:
- Excessive memory overhead
- Costly synchronization
Solution: XOR-based erasure coding (replicate first, then XOR)
- Index:
  - Async replication: reduces synchronization
  - Versioning: enables recovery
- KV data:
  - XOR: saves memory space
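For reference, the XOR idea summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`xor_blocks`, `encode_parity`, `recover_block`, `xor_overhead`), the fixed block size, and the single-parity layout are all assumptions made for illustration. It shows that one parity block over k equal-length data blocks is enough to rebuild any single lost block, at a storage cost of (k + 1) / k instead of the r-fold cost of keeping r full replicas.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a non-empty list of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("blocks must have equal length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode_parity(data_blocks):
    """Compute a single parity block over all data blocks (hypothetical layout)."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors plus the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def xor_overhead(k):
    """Storage used per byte of user data with k data blocks and 1 parity block."""
    return (k + 1) / k


if __name__ == "__main__":
    # Three equal-length KV blocks; the contents are made up for the example.
    blocks = [b"key1:val", b"key2:val", b"key3:val"]
    parity = encode_parity(blocks)

    # Simulate losing the second block and rebuilding it.
    rebuilt = recover_block([blocks[0], blocks[2]], parity)
    print(rebuilt == blocks[1])
    print(xor_overhead(3))
```

Note that a single XOR parity tolerates only one lost block per group, whereas r replicas tolerate r - 1 losses, so the memory saving comes with a weaker failure model unless more parity is added.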
Problems:
- Novelty is not clear.
  - How does fault tolerance in DM differ from fault tolerance in traditional distributed systems? What are the unique challenges?
  - What are the existing solutions for ensuring fault tolerance in DM? Is replication the only one?
  - Is erasure coding just another kind of replication with finer granularity?
- Writing is not clear.
  - What is the relation between consistency and fault tolerance? Are we comparing different fault-tolerance techniques under the same level of consistency guarantee?
  - According to the CAP theorem, are you trading availability for consistency and fault tolerance (i.e., CP databases)?
For the background:
Misc
- Why should we care about memory efficiency? Isn't it the goal of DM?