A scalable data structure is a data structure distributed across multiple nodes. You can add nodes to get more capacity, more throughput, and better availability. Erlang/OTP is ideal for building such structures, but is missing one critical piece.
I've been working on a distributed hash table (DHT) based on the idea of linear hashing. (The inventor of linear hashing, Witold Litwin, is also working along these lines.) Each node can serve a put or get request, and also owns a set of hash buckets. A mapping of hash buckets to nodes is used to dispatch a request to the right node.
When a node fails, or when a node is added, a new mapping of hash buckets to nodes is required. The question is how to create this new mapping. It is difficult or maybe impossible to guarantee that each node will derive the identical map, especially since map creation isn't perfectly synchronous across the set of nodes, and there may be changes in the node population (e.g. another node failure) during map creation.
The alternative is for one node to create the map and distribute it to other nodes. But which node? What is needed is for the set of participating nodes to agree on a node to carry out this task. This is the "leader election" problem. When a leader is needed, it is needed right away. There must be exactly one leader selected, and all participants must know who the leader is.
Leader election is a suprisingly difficult problem. Erlang/OTP does not include a leader election module. However, I ran across an implementation of leader election that would be right at home in OTP:
gen_leader.
gen_leader includes a demo application, gdict. Updates go to the leader and are distributed to all nodes, while reads can be carried out locally. I had some trouble getting gdict to work. One of the gen_leader authors, Ulf Wiger, was kind enough to set me straight. My test program appears below. Some comments on the code:
- I first wrote the test to run from one node, z@zack, and access a two-node hash table, on nodes a@zack and b@zack. I also specified that nodes a and b were both candidate leaders (second argument to gdict:new) and workers (not eligible to be leader). All that was wrong.
- The current version of my test runs on node z, makes a, b, and z candidates, has no workers, and runs gdict:new on z.
- It is necessary to have the nodes see each other, which I accomplished using the hack of the two rpc:call invocations. I still don't completely understand how the nodes are supposed to discover each other, but I haven't gotten very far in reading about OTP.
I haven't done any work with gen_leader other than to watch gdict work. But assuming it works as advertised, (and the authors seem to have been fanatical about testing), it is a crucial piece of software in implementing a scalable data structure in Erlang.
The gdict test code:
-module(test).
-export([main/0]).
-define(DUMP(X), io:format("~p:~p - ~p = ~p~n", [?MODULE, ?LINE, ??X, X])).
main() ->
rpc:call(a@zack, hello, hello, []),
rpc:call(b@zack, hello, hello, []),
?DUMP(node()),
?DUMP(nodes()),
Nodes = [node() | nodes()],
{ok, D} = gdict:new(test, Nodes, []),
?DUMP(D),
?DUMP(gdict:append(a, 1, D)),
?DUMP(gdict:append(b, 2, D)),
?DUMP(gdict:append(c, 3, D)),
?DUMP(gdict:find(a, D)),
?DUMP(gdict:find(b, D)),
?DUMP(gdict:find(c, D)).