ExMessenger Exercise: Understanding Nodes in Elixir
I was working through this old 2014 blog post by Drew Kerrigan where he builds a bare-bones, command-line-based chat application, with a client that sends messages and commands to a server.
The original code targets pre-1.0 Elixir, so as an exercise I refactored it and merged the server (ex_messenger) and client (ex_messenger_client) projects into a single Elixir Umbrella project. You can find my code on Github here, if you want to clone it.
If you have multiple applications that work together and share the same dependencies, you can use the Umbrella convention to keep them all in the same codebase. If you run mix compile from the umbrella root, it compiles all the apps (each of which is also an independent Mix project). It’s just a way to keep related apps in the same place instead of spread over multiple repositories.
Nodes 101
Before we check out the exercise, there is one more concept I need to clear up. In the previous article I explained how you can start processes and exchange messages, and how you can use the OTP GenServer and Supervisor to create more robust and fault-tolerant processes.
But this is just the beginning of the story. You’ve probably heard how Erlang is great for distributed computing as well. Each Erlang VM (or BEAM) is network-enabled.
Again, this is one more concept I am still just beginning to learn properly, so you will want to read the guide on distributed tasks and configuration on Elixir’s website, which does an excellent job of explaining how all this works.
But just to get started you can simply start 2 IEx sessions. From one terminal you can do:
iex --sname fabio --cookie chat
Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]
Interactive Elixir (1.1.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(fabio@Hal9000u)1>
And from a different terminal you can do:
iex --sname akita --cookie chat
Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]
Interactive Elixir (1.1.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(akita@Hal9000u)1>
Notice how the IEx shell shows a different Node name for each instance: “fabio@Hal9000u” and “akita@Hal9000u”. It’s the sname concatenated with your machine name. From one instance you can ping the other, for example:
iex(akita@Hal9000u)2> Node.ping(:"fabio@Hal9000u")
:pong
If the name is correct and the other instance is indeed up, it responds to the ping with a :pong. Short names like these only work between nodes on the same machine or the same local network, but what if I need to connect to an instance on a remote machine?
iex(akita@Hal9000u)3> Node.ping(:"fabio@192.168.1.13")
11:02:46.152 [error] ** System NOT running to use fully qualified hostnames **
** Hostname 192.168.1.13 is illegal **
The --sname option sets a short name that is only reachable within the same subnet; to address a node by IP address or fully qualified domain name you need to use --name, for example like this:
iex --name fabio@192.168.1.13 --cookie chat
And for the other node:
iex --name akita@192.168.1.13 --cookie chat
And from this second terminal you can ping the other node the same way as before:
iex(akita@192.168.1.13)1> Node.ping(:"fabio@192.168.1.13")
:pong
You might be wondering, what is this “--cookie” thing? Just spin up a third terminal with another client name, but without the cookie, like this:
iex --name john@192.168.1.13
And if you try to ping one of the first two nodes you won’t get a :pong back:
iex(john@192.168.1.13)1> Node.ping(:"fabio@192.168.1.13")
:pang
The cookie is just an atom that identifies which nodes belong together: nodes only accept connections from nodes presenting the same cookie. In a pool of several servers, this makes sure you’re not connecting different applications to each other by accident, and a node with the wrong cookie gets a :pang. Instead of an IP address you can also use a fully qualified domain name.
And just by having the node “akita@” pinging “fabio@” we can see that they are aware of each other:
iex(fabio@192.168.1.13)2> Node.list
[:"akita@192.168.1.13"]
And:
iex(akita@192.168.1.13)2> Node.list
[:"fabio@192.168.1.13"]
If one of the nodes crashes or quits, the Node list is automatically refreshed to reflect only nodes that are actually alive and responding.
You can check the official API Reference for the Node for more information. But this should give you a hint for the next section.
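As a quick recap of the Node functions used so far, here is a minimal sketch that collects what the local VM knows about itself and its peers. `NodeInfo` and `summary/0` are hypothetical names made up for this illustration; only `Node.self/0`, `Node.alive?/0` and `Node.list/0` come from the standard library.

```elixir
defmodule NodeInfo do
  # Hypothetical helper, not part of the ExMessenger code.
  def summary do
    %{
      # :nonode@nohost unless the VM was started with --sname or --name
      name: Node.self(),
      # true only when distribution is enabled
      alive: Node.alive?(),
      # visible nodes we are connected to, [] when there are none
      peers: Node.list()
    }
  end
end

IO.inspect(NodeInfo.summary())
```

Running this in each of the two IEx sessions, after the ping, should show the other node under `peers`.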
Creating a Chat Client
Back to the exercise: the ExMessenger server has “ExMessenger.Server”, which is a GenServer, and the “ExMessenger.Supervisor” that starts and supervises it. ExMessenger.Server is registered under the name :message_server, locally on its own node (note the :local option in the code further down).
The “ExMessengerClient” starts up the unsupervised “ExMessengerClient.MessageHandler”, which is also a GenServer, registered on the client node as :message_handler.
The Tree for both apps looks roughly like this:
ExMessenger
  - ExMessenger.Supervisor
    + ExMessenger.Server
ExMessengerClient
  - ExMessengerClient.MessageHandler
We start them separately, first the message server:
cd apps/ex_messenger
iex --sname server --cookie chocolate-chip -S mix run
Notice that for this example we are starting with a simple name “server”, for the local subnet, and a cookie. It will respond as “server@Hal9000u” (Hal9000u being my local machine’s name).
Then, we can start the client app:
cd apps/ex_messenger_client
server=server@Hal9000u nick=john elixir --sname client -S mix run
Here we are setting 2 environment variables (which we can retrieve inside the app using System.get_env/1) and also setting a local node name “client”. You can spin up as many client nodes as you want from other terminals, each with a different “sname” and a different “nick”, all linking to the same “server@Hal9000u” message server.
I’m starting up like this instead of using a command-line escript like I did in ExMangaDownloadr because I didn’t find any way to set the --sname or --name the same way I can set --cookie using Node.set_cookie. If anyone knows how to set it up differently, let me know in the comments section down below.
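One option that may fit here, offered only as a sketch and not tried against this project: the standard library has `Node.start/2`, which turns an already running, non-distributed VM into a named node. The sketch assumes the `epmd` daemon is already running on the machine (the `--sname` flag starts it for you, `Node.start/2` does not). `ClientNode` and `ensure_named/1` are hypothetical names.

```elixir
defmodule ClientNode do
  # Hypothetical helper: give the current VM a short name at runtime.
  # Assumes epmd is already running; otherwise an error tuple comes back.
  def ensure_named(name) when is_atom(name) do
    if Node.alive?() do
      # Already distributed (for example started with --sname), nothing to do
      :ok
    else
      case Node.start(name, :shortnames) do
        {:ok, _pid} -> :ok
        {:error, reason} -> {:error, reason}
      end
    end
  end
end

case ClientNode.ensure_named(:client) do
  :ok -> IO.puts("Running as #{Node.self()}")
  {:error, _reason} -> IO.puts("Could not start distribution (is epmd running?)")
end
```

If this works in an escript, the cookie could then be set with `Node.set_cookie/2` right after, as in the client code.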
Notice that I said “linking” and not “connecting”. From the “ExMessengerClient” we start like this:
defmodule ExMessengerClient do
  use Application
  alias ExMessengerClient.CLI
  alias ExMessengerClient.ServerProcotol
  def start(_type, _args) do
    get_env
    |> connect
    |> start_message_handler
    |> join_chatroom
    |> CLI.input_loop
  end
  ...
end
The get_env private function is just a wrapper to handle the “server” and “nick” environment variables that we passed:
defp get_env do
  server = System.get_env("server")
    |> String.rstrip
    |> String.to_atom
  nick = System.get_env("nick")
    |> String.rstrip
  {server, nick}
end
Now, we try to connect to the remote server:
defp connect({server, nick}) do
  IO.puts "Connecting to #{server} from #{Node.self} ..."
  Node.set_cookie(Node.self, :"chocolate-chip")
  case Node.connect(server) do
    true -> :ok
    reason ->
      IO.puts "Could not connect to server, reason: #{reason}"
      System.halt(1) # non-zero exit status, since this is a failure
  end
  {server, nick}
end
The important piece here is that we are setting the client instance’s cookie with Node.set_cookie/2 (notice that we didn’t pass it in the command-line options like we did with the server instance). Without setting the cookie, the next line with Node.connect(server) would fail to connect, as I explained in the previous section.
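The catch-all `reason` clause above covers two different outcomes, because `Node.connect/1` returns `true`, `false` or `:ignored`. A minimal sketch of telling them apart could look like this; `ConnectResult` and `describe/1` are hypothetical names, and the wording of the messages is only a guess at the likely causes.

```elixir
defmodule ConnectResult do
  # Hypothetical helper: make the return value of Node.connect/1 readable.
  def describe(true), do: {:ok, "connected"}

  def describe(false),
    do: {:error, "connection failed (server down or cookie mismatch?)"}

  def describe(:ignored),
    do: {:error, "local node is not alive (started without --sname/--name?)"}
end

# On a VM started without --sname/--name, Node.connect/1 returns :ignored.
IO.inspect(ConnectResult.describe(Node.connect(:"server@Hal9000u")))
```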
Then, we start the “ExMessengerClient.MessageHandler” GenServer, linking with the Message Server instance:
defp start_message_handler({server, nick}) do
  ExMessengerClient.MessageHandler.start_link(server)
  IO.puts "Connected"
  {server, nick}
end
The Message Handler GenServer itself is very simple. It just sets the server as the state, handles incoming messages from the server and prints them out in the client’s terminal:
defmodule ExMessengerClient.MessageHandler do
  use GenServer
  def start_link(server) do
    :gen_server.start_link({ :local, :message_handler }, __MODULE__, server, [])
  end
  def init(server) do
    { :ok, server }
  end
  def handle_cast({ :message, nick, message }, server) do
    message = message |> String.rstrip
    IO.puts "\n#{server}> #{nick}: #{message}"
    IO.write "#{Node.self}> "
    {:noreply, server}
  end
end
Going back to the main “ExMessengerClient” module, after starting the (unsupervised) GenServer that receives incoming messages, we proceed to join the pseudo-chatroom in the server:
defp join_chatroom({server, nick}) do
  case ServerProcotol.connect({server, nick}) do
    {:ok, users} ->
      IO.puts "* Joined the chatroom *"
      IO.puts "* Users in the room: #{users} *"
      IO.puts "* Type /help for options *"
    reason ->
      IO.puts "Could not join chatroom, reason: #{reason}"
      System.halt(1) # non-zero exit status, since this is a failure
  end
  {server, nick}
end
I defined this “ServerProcotol” module, which is just a convenience wrapper for GenServer.call/3 and GenServer.cast/2 calls, to send messages to the remote GenServer called :message_server:
defmodule ExMessengerClient.ServerProcotol do
  def connect({server, nick}) do
    server |> call({:connect, nick})
  end
  def disconnect({server, nick}) do
    server |> call({:disconnect, nick})
  end
  def list_users({server, nick}) do
    server |> cast({:list_users, nick})
  end
  def private_message({server, nick}, to, message) do
    server |> cast({:private_message, nick, to, message})
  end
  def say({server, nick}, message) do
    server |> cast({:say, nick, message})
  end
  defp call(server, args) do
    GenServer.call({:message_server, server}, args)
  end
  defp cast(server, args) do
    GenServer.cast({:message_server, server}, args)
  end
end
Pretty straightforward. Then, the main ExMessengerClient calls the recursive input_loop/1 function from the CLI module, which just receives user input and handles the proper commands using pattern matching, like this:
defmodule ExMessengerClient.CLI do
  alias ExMessengerClient.ServerProcotol
  def input_loop({server, nick}) do
    IO.write "#{Node.self}> "
    line = IO.read(:line)
      |> String.rstrip
    handle_command line, {server, nick}
    input_loop {server, nick}
  end
  def handle_command("/help", _args) do
    IO.puts """
    Available commands:
    /leave
    /join
    /users
    /pm <to nick> <message>
    or just type a message to send
    """
  end
  def handle_command("/leave", args) do
    ServerProcotol.disconnect(args)
    IO.puts "You have exited the chatroom, you can rejoin with /join or quit with /quit"
  end
  def handle_command("/quit", args) do
    ServerProcotol.disconnect(args)
    System.halt(0)
  end
  def handle_command("/join", args) do
    ServerProcotol.connect(args)
    IO.puts "Joined the chatroom"
  end
  def handle_command("/users", args) do
    ServerProcotol.list_users(args)
  end
  def handle_command("", _args), do: :ok
  def handle_command(nil, _args), do: :ok
  def handle_command(message, args) do
    # Only treat the line as a private message when it starts with "/pm ",
    # so a public message that merely mentions "/pm" is still broadcast
    if String.starts_with?(message, "/pm ") do
      {to, message} = parse_private_recipient(message)
      ServerProcotol.private_message(args, to, message)
    else
      ServerProcotol.say(args, message)
    end
  end
  defp parse_private_recipient(message) do
    # Drop the leading "/pm " and split the rest into words
    [to|message] = message
      |> String.slice(4..-1)
      |> String.split
    # Glue the remaining words back together into the message text
    message = message
      |> List.foldl("", fn(x, acc) -> "#{acc} #{x}" end)
      |> String.lstrip
    {to, message}
  end
end
And this wraps up the Client.
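The `/pm` parsing could also be sketched with `String.split/3` and its `parts:` option, which avoids the fold. This is an alternative written for a current Elixir release, not the code from the repository; `PrivateMessage` and `parse/1` are hypothetical names.

```elixir
defmodule PrivateMessage do
  # Hypothetical alternative to parse_private_recipient/1.
  # "/pm <nick> <message>" becomes {:ok, nick, message}, anything else :error.
  def parse(line) do
    case String.split(String.trim(line), " ", parts: 3) do
      ["/pm", to, message] when to != "" -> {:ok, to, String.trim(message)}
      _ -> :error
    end
  end
end

IO.inspect(PrivateMessage.parse("/pm akita hello from the client"))
IO.inspect(PrivateMessage.parse("just a public message"))
```

Returning `:error` for anything that is not a well-formed `/pm` line lets the caller fall back to a normal broadcast.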
Creating a Chat Server
The Chat Client sends GenServer messages to a remote {:message_server, server}, and in the example, server is just the sname “server@Hal9000u” atom.
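To see the `{name, node}` addressing in isolation, here is a minimal sketch that runs in a single VM: the “remote” node is simply the local one. `EchoServer` is a hypothetical stand-in for the real server, written with the `GenServer` API of a current Elixir release.

```elixir
defmodule EchoServer do
  use GenServer

  # Hypothetical stand-in for ExMessenger.Server, registered locally by name.
  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  @impl true
  def init(:ok), do: {:ok, []}

  @impl true
  def handle_call({:connect, nick}, _from, nicks) do
    nicks = [nick | nicks]
    {:reply, {:ok, nicks}, nicks}
  end
end

{:ok, _pid} = EchoServer.start_link(:echo_server)

# Same addressing scheme as {:message_server, server}: registered name plus
# node name. Using Node.self() means no second VM is needed for the demo.
IO.inspect(GenServer.call({:echo_server, Node.self()}, {:connect, "john"}))
```

Swapping `Node.self()` for the atom of a connected node is all it takes to reach a process registered on another machine.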
Now, we need this :message_server and this is the “ExMessenger.Server” GenServer:
defmodule ExMessenger.Server do
  use GenServer
  require Logger
  def start_link([]) do
    :gen_server.start_link({ :local, :message_server }, __MODULE__, [], [])
  end
  def init([]) do
    { :ok, HashDict.new }
  end
  ...
end
And that’s it. When the “ExMessenger.Supervisor” starts this GenServer, it registers it locally on the server node under the name :message_server. This name, together with the node name, is how what we called “clients” (the ExMessengerClient application) address their messages.
When the ExMessengerClient calls ServerProcotol.connect/1 (the module name is spelled this way in the code), it sends the {:connect, nick} message to the server. In the Server we handle it like this:
def handle_call({ :connect, nick }, {from, _} , users) do
  cond do
    nick == :server or nick == "server" ->
      {:reply, :nick_not_allowed, users}
    HashDict.has_key?(users, nick) ->
      {:reply, :nick_in_use, users}
    true ->
      new_users = users |> HashDict.put(nick, node(from))
      user_list = log(new_users, nick, "has joined")
      {:reply, { :ok, user_list }, new_users}
  end
end
First, it checks if the nick is “server” and disallows it. Second, it checks if the nickname already exists in the internal HashDict (a key/value dictionary) and refuses it if so. Third, it stores the pair of nickname and node name (like “client@Hal9000u”) in the HashDict and, through the log/3 private function, announces the new user to all the other nodes in the HashDict.
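The three checks can be sketched as a side-effect-free function, which makes them easy to try out without a running server. This version uses a plain `Map`, since `HashDict` is deprecated in later Elixir releases; `Roster` and `join/3` are hypothetical names, and the return values are simplified compared to the GenServer replies above.

```elixir
defmodule Roster do
  # Hypothetical, pure version of the :connect checks.
  def join(users, nick, node_name) do
    cond do
      nick in [:server, "server"] -> {:error, :nick_not_allowed}
      Map.has_key?(users, nick) -> {:error, :nick_in_use}
      true -> {:ok, Map.put(users, nick, node_name)}
    end
  end
end

{:ok, users} = Roster.join(%{}, "john", :"client@Hal9000u")
IO.inspect(users)
IO.inspect(Roster.join(users, "john", :"other@Hal9000u"))
IO.inspect(Roster.join(users, "server", :"other@Hal9000u"))
```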
The log/3 just creates a log message concatenating the nicknames of all clients and prints it out, then broadcasts it to the Message Handler of all the clients listed in the HashDict:
defp log(users, nick, message) do
  user_list = users |> HashDict.keys |> Enum.join(":")
  Logger.debug("#{nick} #{message}, user_list: #{user_list}")
  say(nick, message)
  user_list
end
def say(nick, message) do
  GenServer.cast(:message_server, { :say, nick, "* #{nick} #{message} *" })
end
def handle_cast({ :say, nick, message }, users) do
  ears = HashDict.delete(users, nick)
  Logger.debug("#{nick} said #{message}")
  broadcast(ears, nick, message)
  {:noreply, users}
end
Up to this point it just casts a message to itself, the {:say, nick, message} tuple, which is handled by the GenServer and calls the broadcast/3 function defined like this:
defp broadcast(users, nick, message) do
  Enum.map(users, fn {_, node} ->
    Task.async(fn ->
      send_message_to_client(node, nick, message)
    end)
  end)
  |> Enum.map(&Task.await/1)
end
defp send_message_to_client(client_node, nick, message) do
  GenServer.cast({ :message_handler, client_node }, { :message, nick, message })
end
It maps over the users and fires up an asynchronous Elixir Task for each one (a Task is a lightweight process in its own right, not a GenServer; I covered Tasks in the Ex Manga Downloader series). Because it’s a broadcast, it makes sense to run all of them in parallel.
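The fan-out pattern itself (one Task per recipient, then wait for all of them) can be tried in a single VM. In this sketch the recipients are local pids instead of `{:message_handler, node}` tuples, and plain `send/2` replaces `GenServer.cast/2`; `Fanout` is a hypothetical name.

```elixir
defmodule Fanout do
  # Hypothetical sketch of the broadcast pattern, runnable without a cluster.
  def broadcast(recipients, nick, message) do
    recipients
    |> Enum.map(fn pid ->
      # One short-lived process per recipient
      Task.async(fn -> send(pid, {:message, nick, message}) end)
    end)
    # Block until every task has finished
    |> Enum.map(&Task.await/1)

    :ok
  end
end

Fanout.broadcast([self()], "john", "hello")

receive do
  {:message, nick, text} -> IO.puts("#{nick}: #{text}")
after
  1_000 -> IO.puts("nothing received")
end
```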
The important bit is the send_message_to_client/3 which casts a message to the tuple { :message_handler, client_node } where “client_node” is just “client@Hal9000u” or whatever “--sname” you used to start up each client node.
So, this is how the clients send GenServer calls and casts to {:message_server, server}, and how the server sends messages back to {:message_handler, client}.
This is not your traditional TCP Client/Server example!
We have been calling “ExMessenger.Server” a Chat “Server” and “ExMessengerClient” a Chat “Client”, but these names don’t refer to the usual “TCP Server” and “TCP Client” examples you may be familiar with!
The ExMessenger.Server is indeed a Server (an OTP GenServer) but the ExMessengerClient.MessageHandler is also a Server (another OTP GenServer)! Because they both have Node behavior, it’s more like they are 2 peer-to-peer nodes instead of your old-school, simple client-server relationship. They can have client behavior (the Server sends messages to the MessageHandler) and server behavior (the Server receives messages from ExMessengerClient).
Let this concept sink in for a moment: built into the language you get a full-blown, easy-to-use, peer-to-peer network distribution model. You don’t need to have one single node elected as the sole “master”; you could have all nodes in a ring coordinating between them, avoiding single points of failure.
I believe this is roughly how Erlang-based services such as ejabberd and RabbitMQ work.
In the case of ejabberd, I can see that it keeps the state of the cluster in Mnesia tables (Mnesia being another component of OTP, a built-in distributed NoSQL database!) and it indeed uses the Node facilities to coordinate distributed nodes:
...
join(Node) ->
    case {node(), net_adm:ping(Node)} of
        {Node, _} ->
            {error, {not_master, Node}};
        {_, pong} ->
            application:stop(ejabberd),
            application:stop(mnesia),
            mnesia:delete_schema([node()]),
            application:start(mnesia),
            mnesia:change_config(extra_db_nodes, [Node]),
            mnesia:change_table_copy_type(schema, node(), disc_copies),
            spawn(fun() ->
                lists:foreach(fun(Table) ->
                    Type = call(Node, mnesia, table_info, [Table, storage_type]),
                    mnesia:add_table_copy(Table, node(), Type)
                end, mnesia:system_info(tables)--[schema])
            end),
            application:start(ejabberd);
        _ ->
            {error, {no_ping, Node}}
    end.
This is what a snippet of pure Erlang source code looks like, by the way. You should have enough Elixir in your head right now to be able to abstract away the ugly Erlang syntax and see that it’s a case pattern matching on the {_, :pong} tuple, using Node’s ping facilities to assert the connectivity of the node and then updating the Mnesia tables and other setup.
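For comparison, the control flow of that `case` could be rendered in Elixir roughly as follows. This is a simplified sketch, not a translation of ejabberd: the Mnesia steps are left out on purpose, and the ping function is injected as an argument so the example runs without a second node. `ClusterJoin` is a hypothetical name.

```elixir
defmodule ClusterJoin do
  # Hypothetical rendering of the three branches of the Erlang join/1.
  def join(target, ping \\ &Node.ping/1) do
    case {Node.self(), ping.(target)} do
      # Erlang's already-bound Node in the pattern becomes a pinned ^target
      {^target, _} -> {:error, {:not_master, target}}
      # The real code reconfigures Mnesia at this point
      {_, :pong} -> :ok
      _ -> {:error, {:no_ping, target}}
    end
  end
end

IO.inspect(ClusterJoin.join(:"other@host", fn _ -> :pong end))
IO.inspect(ClusterJoin.join(:"other@host", fn _ -> :pang end))
IO.inspect(ClusterJoin.join(Node.self(), fn _ -> :pong end))
```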
Also in the source code of the RabbitMQ-Server you will find a similar thing:
become(BecomeNode) ->
    error_logger:tty(false),
    ok = net_kernel:stop(),
    case net_adm:ping(BecomeNode) of
        pong -> exit({node_running, BecomeNode});
        pang -> io:format(" * Impersonating node: ~s...", [BecomeNode]),
                {ok, _} = rabbit_cli:start_distribution(BecomeNode),
                io:format(" done~n", []),
                Dir = mnesia:system_info(directory),
                io:format(" * Mnesia directory : ~s~n", [Dir])
    end.
Again, pinging nodes, using Mnesia for the server state. Erlang’s syntax is uncommon for most of us: variables start with a capital letter (we intuitively think they’re constants instead), statements end with a dot, it uses a colon “:” instead of dot-notation to call a function from a module, unlike Elixir the parentheses are not optional, and so on. Trying to read code like this shows the value of having Elixir to unleash Erlang’s hidden powers.
So, up to this point, you know how processes are spawned internally, how they are orchestrated within the OTP framework, and now how they can interact remotely through the Node peer-to-peer abstraction. And again, this is all built into the language. No other language comes even close.