Saturday, September 29, 2007

Getting started with Erlang and OTP

At work I've been helping to build a distributed system in Java for the past four years. The system has no centralized anything. We built a lot of things from scratch: a messaging layer, thread pools, node monitoring and management, leader election. And while all this was going on, we built our application. It all works, and the system is very reliable and highly available. But it took a while to get there, and we probably spent at least half our time on the distributed stuff that supports the application. I recently came across Erlang, and discovered that it solved nearly all the system-level problems we faced. If we had known about it when we started the company, it might have saved us a lot of time.

I've been going through the Armstrong book, and tried building a little multi-node program using the OTP gen_server module. I ran into a few problems, and thought I'd document the gotchas as a public service. The program is a "Hello World" server. You send a hello message to one of the server processes, and the server echoes your message and includes a count of messages processed so far.

The main problem was figuring out how to use gen_server in a multi-node environment. The various forms of gen_server:start don't appear to have any option for starting a gen_server remotely. The OTP Introduction (chapter 16) doesn't discuss this point, and the my_bank example shows everything running on the same node. Also, the my_bank example identifies the server process by name (?MODULE), so it wouldn't work for multiple banks.
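To make the naming issue concrete, here is a minimal sketch of a gen_server started without a registered name, so that several instances can coexist and each is addressed by its PID. The module name counter_demo and the bump request are hypothetical, chosen only for illustration; they are not from the book or from the program in this post.

```erlang
-module(counter_demo).
-behaviour(gen_server).

%% Hypothetical demo module: an unnamed counter server.
-export([start/0, bump/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Start an unnamed server. No {local, Name} argument is passed,
%% so the caller has to keep the returned pid to reach this instance.
start() ->
    {ok, Pid} = gen_server:start(?MODULE, [], []),
    Pid.

%% Address one specific instance by pid instead of by registered name.
bump(Pid) ->
    gen_server:call(Pid, bump).

%% The state is simply the number of bump requests handled so far.
init([]) ->
    {ok, 0}.

handle_call(bump, _From, Count) ->
    {reply, Count + 1, Count + 1}.

handle_cast(_Msg, Count) ->
    {noreply, Count}.

handle_info(_Info, Count) ->
    {noreply, Count}.

terminate(_Reason, _Count) ->
    ok.

code_change(_OldVsn, Count, _Extra) ->
    {ok, Count}.
```

Calling counter_demo:start() twice gives two independent counters. By contrast, starting a second server under the same registered name on the same node returns {error, {already_started, Pid}} rather than a new process.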

Here is my test module:



-module(test).
-export([main/0]).
-include("definitions.hrl").

main() ->
    A = hello:start(a@zack),
    B = hello:start(b@zack),
    ?DUMP(main, hello:hello(A, world)),
    ?DUMP(main, hello:hello(B, world)),
    ?DUMP(main, hello:hello(A, world)),
    ?DUMP(main, hello:hello(A, world)),
    ?DUMP(main, hello:hello(B, world)).

Here is the output:

test:9 - main: "hello : hello ( A , world )" = {hello,world,1}
test:10 - main: "hello : hello ( B , world )" = {hello,world,1}
test:11 - main: "hello : hello ( A , world )" = {hello,world,2}
test:12 - main: "hello : hello ( A , world )" = {hello,world,3}
test:13 - main: "hello : hello ( B , world )" = {hello,world,2}

I started two nodes, in two different shells, as follows:

erl -noshell -sname a
erl -noshell -sname b

(Running the shell in the background with "erl ... &" doesn't seem to work. I'm guessing that the OS process blocks when it needs to write to the console. I don't really get this part; it's kind of irritating.)

The hostname is zack, which is why main() refers to nodes a@zack and b@zack. ?DUMP is a debugging macro from definitions.hrl. hello:hello is the hello function in the hello module. I pass the PID of the server I want to send the request to. The payload is world (so the message is {hello, world}).

The entire code of hello.erl is at the end of this posting, but here is the important part:

start(Node) ->
    {ok, Hello} =
        rpc:call(Node, gen_server, start,
                 [{local, ?MODULE}, ?MODULE, [], []]),
    Hello.

A direct call to gen_server:start would start the gen_server locally, i.e., on the node running the test code (this is how the my_bank example in chapter 16 is written). Wrapping the call as spawn(fun() -> gen_server:start ...) doesn't work either, because then there are two processes: the one started by gen_server and the one created by spawn. The latter's PID is what gets returned to the caller (test:main), so test:main can't contact the gen_server. The rpc:call starts the server on Node (a@zack or b@zack, supplied by test:main) and returns the server's PID to test:main.
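The difference between the two approaches can be seen in a small sketch. The module start_demo and its functions are hypothetical names used only for this illustration, and the server is left unnamed to keep the example short. Passing node() as the argument lets it run on a single node.

```erlang
-module(start_demo).
-behaviour(gen_server).

%% Hypothetical demo module contrasting spawn with rpc:call.
-export([start_with_spawn/1, start_with_rpc/1, ping/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Returns the pid of the wrapper process created by spawn.
%% The wrapper starts a gen_server, discards the {ok, Pid} result,
%% and exits, so the caller never learns the gen_server's pid.
start_with_spawn(Node) ->
    spawn(Node, fun() -> gen_server:start(?MODULE, [], []) end).

%% rpc:call hands back the return value of gen_server:start/3,
%% which is {ok, Pid}, so the caller gets the gen_server's own pid.
start_with_rpc(Node) ->
    {ok, Pid} = rpc:call(Node, gen_server, start, [?MODULE, [], []]),
    Pid.

ping(Pid) ->
    gen_server:call(Pid, ping).

init([]) ->
    {ok, no_state}.

handle_call(ping, _From, State) ->
    {reply, pong, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With this sketch, start_demo:ping(start_demo:start_with_rpc(node())) reaches the server, while the PID returned by start_with_spawn belongs to a process that has already finished its work.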

I ran into one other little problem. The simple Makefile provided in chapter 6 doesn't recompile everything if an hrl file changes. So instead of supplying a rule for .erl.beam, I did this:

HEADERS = definitions.hrl

%.beam: %.erl ${HEADERS}
	erlc -W $<

Here is hello.erl:



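-module(hello).
-behaviour(gen_server).

%% gen_server API
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% hello API
-export([start/1, stop/1, hello/2]).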


%% gen_server

init([]) ->
    {ok, 0}.

handle_call(stop, _From, RequestCount) ->
    {stop, normal, stopped, RequestCount};
handle_call({hello, Who}, _From, RequestCount) ->
    NewRequestCount = RequestCount + 1,
    {reply, {hello, Who, NewRequestCount}, NewRequestCount}.

handle_cast(_Request, RequestCount) ->
    {noreply, RequestCount}.

handle_info(_Info, RequestCount) ->
    {noreply, RequestCount}.

terminate(_Reason, _RequestCount) ->
    ok.

code_change(_OldVersion, RequestCount, _Extra) ->
    {ok, RequestCount}.

%% hello

start(Node) ->
    {ok, Hello} = rpc:call(Node, gen_server, start,
                           [{local, ?MODULE}, ?MODULE, [], []]),
    Hello.

stop(P) ->
    gen_server:call(P, stop).

hello(P, Who) ->
    gen_server:call(P, {hello, Who}).

And here is definitions.hrl:

-define(DUMP(Label, X),
        io:format("~p:~p - ~p: ~p = ~p~n",
                  [?MODULE, ?LINE, Label, ??X, X])).
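As a small illustration of how the macro behaves, here is a sketch of a standalone module that inlines the same definition and applies it to an arbitrary expression. The module name dump_demo and the function run are hypothetical. The ??X form expands to the text of the argument's tokens, which is why the output earlier shows the call as "hello : hello ( A , world )" with spaces between tokens, while the plain X is evaluated as an ordinary expression.

```erlang
-module(dump_demo).
-export([run/0]).

%% Same macro as in definitions.hrl, inlined here so that this
%% hypothetical module compiles on its own.
-define(DUMP(Label, X),
        io:format("~p:~p - ~p: ~p = ~p~n",
                  [?MODULE, ?LINE, Label, ??X, X])).

%% Prints the module name, the line of the macro call, the label,
%% the source text of the expression, and the expression's value.
%% io:format/2 returns ok, so run/0 returns ok as well.
run() ->
    ?DUMP(run, lists:sum([1, 2, 3])).
```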