top of page

 FOREST

Sync Without Comparing Datasets
By Transmitting a Range of Missing Leaf Numbers.

FOREST

GraphSync and Negentropy both aim to synchronize data between nodes, but they diverge notably in their approach and trade-offs. In Negentropy (used by strfry & nostrDB), both parties must first exchange and compare fingerprints of their respective data ranges, a step that adds computational overhead. This key difference makes GraphSync (used by IPFS/IPLD) potentially quicker for filling in known gaps across a dataset because it only requires both parties to list what they are missing rather than list what they already have.

 

Although in GraphSync, if you're aiming to fetch specific missing leaves that are scattered under different parent hashes within a tree, then the request can become more complex. Normally, a single selector would start at one parent hash and traverse down, but when the leaves you need are under multiple parents, then you may have to specify each of those parent hashes in your request. Each additional parent hash you need to specify increases the complexity of the selector and the size of your request. The larger your request, the more bandwidth overhead and latency both parties incur.

FOREST is a node syncing method that achieves smaller request size than Negentropy and GraphSync, aiming to reduce bandwidth overhead for nodes aiming to sync nostr data. The burden of sharing fingerprints of existing data sets (Negentropy) and listing many parent hashes (GraphSync) can be avoided with FOREST. Scionic Merkle DAGs label each leaf with a number. In FOREST, nodes simply list a range of leaf numbers (e.g. 1-100) in a request and the Merkle root of the corresponding tree. This strategy dramatically shrinks the request size compared to Negentropy and GraphSync. The size of a request is practically as small as possible given that missing leaves are now represented as single integers.

bottom of page