ExMessenger Exercise: Understanding Nodes in Elixir

November 25, 2015 · 💬 Join the Discussion

If you're lazy, click here for the TL;DR

I was working through this old 2014 blog post by Drew Kerrigan where he builds a bare-bones, command-line-based chat application, with a client that sends messages and commands to a server.

This is Elixir pre-1.0, and because it’s an exercise I refactored the original code and merged the server (ex_messenger) and client (ex_messenger_client) projects into an Elixir Umbrella project and you can find my code on Github here.

If you have multiple applications that work together and share the same dependencies you can use the Umbrella convention to have them all in the same codebase. If you mix compile from the umbrella root, it compiles all the apps (which are independent Elixir mix projects as well). It’s just a way to have related apps in the same place instead of in multiple different repositories.

The code shown here is in my personal Github repository if you want to clone it.

Nodes 101

Before we check out the exercise, there is one more concept I need to clear up. In the previous article I explained how you can start processes and exchange messages, and how you can use the OTP GenServer and Supervisor to create more robust and fault-tolerant processes.

But this is just the beginning of the story. You’ve probably heard how Erlang is great for distributed computing as well. Each Erlang VM (or BEAM) is network-enabled.

Again, this is one more concept I am still just beginning to properly learn, and you will want to read Elixir’s website documentation on Distributed tasks and configuration, that does an excellent job explaining how all this works.

But just to get started you can simply start 2 IEx sessions. From one terminal you can do:

iex --sname fabio --cookie chat

Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Interactive Elixir (1.1.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(fabio@Hal9000u)1>

And from a different terminal you can do:

iex --sname akita --cookie chat

Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:4:4] [async-threads:10] [kernel-poll:false]

Interactive Elixir (1.1.1) - press Ctrl+C to exit (type h() ENTER for help)
iex(akita@Hal9000u)1>

Notice how the IEx shell shows different Node names for each instance: “fabio@Hal9000u” and “akita@Hal9000u”. It’s the sname concatenated with your machine name. From one instance you can ping the other, for example:

iex(akita@Hal9000u)2> Node.ping(:"fabio@Hal9000u")
:pong

If the name is correct and the other instance is indeed up, it responds to the ping with a :pong. This works only for nodes on the same machine, but what if I need to connect to an instance on a remote machine?

iex(akita@Hal9000u)3> Node.ping(:"fabio@192.168.1.13")

11:02:46.152 [error] ** System NOT running to use fully qualified hostnames **
** Hostname 192.168.1.13 is illegal **

The –sname option sets a name only reachable within the same subnet; for a fully qualified domain name you need to use –name, for example like this:

iex  --name fabio@192.168.1.13 --cookie chat

And for the other node:

iex --name akita@192.168.1.13 --cookie chat

And from this second terminal you can ping the other node the same way as before:

iex(akita@192.168.1.13)1> Node.ping(:"fabio@192.168.1.13")
:pong

And you might be wondering, what is this “–cookie” thing? Just spin up a third terminal with another client name, but without the cookie, like this:

iex --name john@192.168.1.13

And if you try to ping one of the first two nodes you won’t get a :pong back:

iex(john@192.168.1.13)1> Node.ping(:"fabio@192.168.1.13")
:pang

The cookie is just an atom to identify relationships between nodes. In a pool of several servers you can make sure you’re not trying to connect different applications to each other. And as a result you get a :pang. Instead of an IP address you can use a fully qualified domain name.

And just by having the node “akita@” pinging “fabio@” we can see that they are aware of each other:

iex(fabio@192.168.1.13)2> Node.list
[:"akita@192.168.1.13"]

And:

iex(akita@192.168.1.13)2> Node.list
[:"fabio@192.168.1.13"]

If one of the nodes crashes or quits, the Node list is automatically refreshed to reflect only nodes that are actually alive and responding.

You can check the official API Reference for the Node for more information. But this should give you a hint for the next section.

Creating a Chat Client

Back to the exercise: the ExMessenger server has “ExMessenger.Server”, which is a GenServer, and the “ExMessenger.Supervisor” that starts it up. The ExMessenger.Server is globally registered as :message_server, started and supervised by the “ExMessenger.Supervisor”.

The “ExMessengerClient” starts up the unsupervised “ExMessengerClient.MessageHandler”, which is also a GenServer, and globally registered as :message_handler.

The Tree for both apps looks roughly like this:

ExMessenger
- ExMessenger.Supervisor
    + ExMessenger.Server
ExMessengerClient
- ExMessengerClient.MessageHandler

We start them separately, first the message server:

cd apps/ex_messenger
iex --sname server --cookie chocolate-chip -S mix run

Notice that for this example we are starting with a simple name “server”, for the local subnet, and a cookie. It will respond as “server@Hal9000u” (Hal9000u being my local machine’s name).

Then, we can start the client app:

cd apps/ex_messenger_client
server=server@Hal9000u nick=john elixir --sname client -S mix run

Here we are setting 2 environment variables (that we can retrieve inside the app using System.get_env/1) and also setting a local node name “client”. You can spin up more client nodes using a different “sname” and a different “nick” from another terminal, as many as you want, all linking to the same “server@Hal9000u” message server.

I’m starting up like this instead of using a command-line escript like I did in ExMangaDownloadr because I didn’t find any way to set the –sname or –name the same way I can set –cookie using Node.set_cookie. If anyone knows how to set it up differently, let me know in the comments section down below.

Notice that I said “linking” and not “connecting”. From the “ExMessengerClient” we start like this:

defmodule ExMessengerClient do
  use Application
  alias ExMessengerClient.CLI
  alias ExMessengerClient.ServerProcotol

  def start(_type, _args) do
    get_env
      |> connect
      |> start_message_handler
      |> join_chatroom
      |> CLI.input_loop
  end
  ...
end

The get_env private function is just a wrapper to handle the “server” and “nick” environment variables that we passed:

defp get_env do
  server = System.get_env("server")
    |> String.rstrip
    |> String.to_atom
  nick = System.get_env("nick")
    |> String.rstrip
  {server, nick}
end

Now, we try to connect to the remote server:

defp connect({server, nick}) do
  IO.puts "Connecting to #{server} from #{Node.self} ..."
  Node.set_cookie(Node.self, :"chocolate-chip")
  case Node.connect(server) do
    true -> :ok
    reason ->
      IO.puts "Could not connect to server, reason: #{reason}"
      System.halt(0)
  end
  {server, nick}
end

The important piece here is that we are setting the client instance’s cookie with Node.set_cookie/1 (notice that we didn’t pass it in the command-line options like we did with the server instance). Without setting the cookie, the next line with Node.connect(server) would fail to connect, as I explained in the previous section.

Then, we start the “ExMessengerClient.MessageHandler” GenServer, linking with the Message Server instance:

defp start_message_handler({server, nick}) do
  ExMessengerClient.MessageHandler.start_link(server)
  IO.puts "Connected"
  {server, nick}
end

The Message Handler GenServer itself is very simple. It just sets the server as the state, handles incoming messages from the server and prints them out in the client’s terminal:

defmodule ExMessengerClient.MessageHandler do
  use GenServer

  def start_link(server) do
    :gen_server.start_link({ :local, :message_handler }, __MODULE__, server, [])
  end

  def init(server) do
    { :ok, server }
  end

  def handle_cast({ :message, nick, message }, server) do
    message = message |> String.rstrip
    IO.puts "\n#{server}> #{nick}: #{message}"
    IO.write "#{Node.self}> "
    {:noreply, server}
  end
end

Going back to the main “ExMessengerClient” module, after starting the (unsupervised) GenServer that receives incoming messages, we proceed to join the pseudo-chatroom in the server:

defp join_chatroom({server, nick}) do
  case ServerProcotol.connect({server, nick}) do
    {:ok, users} ->
      IO.puts "* Joined the chatroom *"
      IO.puts "* Users in the room: #{users} *"
      IO.puts "* Type /help for options *"
    reason ->
      IO.puts "Could not join chatroom, reason: #{reason}"
      System.halt(0)
  end
  {server, nick}
end

I defined this “ServerProcotol” module, which is just a convenience wrapper for GenServer.call/3 and GenServer.cast/2 calls, to send messages to the remote GenServer called :message_server:

defmodule ExMessengerClient.ServerProcotol do
  def connect({server, nick}) do
    server |> call({:connect, nick})
  end

  def disconnect({server, nick}) do
    server |> call({:disconnect, nick})
  end

  def list_users({server, nick}) do
    server |> cast({:list_users, nick})
  end

  def private_message({server, nick}, to, message) do
    server |> cast({:private_message, nick, to, message})
  end

  def say({server, nick}, message) do
    server |> cast({:say, nick, message})
  end

  defp call(server, args) do
    GenServer.call({:message_server, server}, args)
  end

  defp cast(server, args) do
    GenServer.cast({:message_server, server}, args)
  end
end

Pretty straightforward. Then, the main ExMessengerClient calls the recursive input_loop/1 function from the CLI module, which just receives user input and handles the proper commands using pattern matching, like this:

defmodule ExMessengerClient.CLI do
  alias ExMessengerClient.ServerProcotol

  def input_loop({server, nick}) do
    IO.write "#{Node.self}> "
    line = IO.read(:line)
      |> String.rstrip
    handle_command line, {server, nick}
    input_loop {server, nick}
  end

  def handle_command("/help", _args) do
    IO.puts """
    Available commands:
      /leave
      /join
      /users
      /pm <to nick> <message>
      or just type a message to send
    """
  end

  def handle_command("/leave", args) do
    ServerProcotol.disconnect(args)
    IO.puts "You have exited the chatroom, you can rejoin with /join or quit with /quit"
  end

  def handle_command("/quit", args) do
    ServerProcotol.disconnect(args)
    System.halt(0)
  end

  def handle_command("/join", args) do
    ServerProcotol.connect(args)
    IO.puts "Joined the chatroom"
  end

  def handle_command("/users", args) do
    ServerProcotol.list_users(args)
  end

  def handle_command("", _args), do: :ok

  def handle_command(nil, _args), do: :ok

  def handle_command(message, args) do
    if String.contains?(message, "/pm") do
      {to, message} = parse_private_recipient(message)
      ServerProcotol.private_message(args, to, message)
    else
      ServerProcotol.say(args, message)
    end
  end

  defp parse_private_recipient(message) do
    [to|message] = message
      |> String.slice(4..-1)
      |> String.split
    message = message
      |> List.foldl("", fn(x, acc) -> "#{acc} #{x}" end)
      |> String.lstrip
    {to, message}
  end
end

And this wraps up the Client.

Creating a Chat Server

The Chat Client sends GenServer messages to a remote {:message_server, server}, and in the example, server is just the sname “server@Hal9000u” atom.

Now, we need this :message_server and this is the “ExMessenger.Server” GenServer:

defmodule ExMessenger.Server do
  use GenServer
  require Logger

  def start_link([]) do
    :gen_server.start_link({ :local, :message_server }, __MODULE__, [], [])
  end

  def init([]) do
    { :ok, HashDict.new }
  end
  ...
end

And that’s it. When the “ExMessenger.Supervisor” starts this GenServer it registers it globally in this instance as :message_server. And this is how we address messages from what we called “clients” (the ExMessengerClient application).

When the ExMessengerClient calls the ServerProtocol.connect/1, it sends the {:connect, nick} message to the server. In the Server we handle it like this:

def handle_call({ :connect, nick }, {from, _} , users) do
  cond do
    nick == :server or nick == "server" ->
      {:reply, :nick_not_allowed, users}
    HashDict.has_key?(users, nick) ->
      {:reply, :nick_in_use, users}
    true ->
      new_users = users |> HashDict.put(nick, node(from))
      user_list = log(new_users, nick, "has joined")
      {:reply, { :ok, user_list }, new_users}
  end
end

First, it checks if the nick is “server” and disallows it. Second, it checks if the nickname already exists in the internal HashDict (a key/value dictionary) and refuses if it already exists. Finally, third, it puts the pair of nickname and node name (like “client@Hal9000u”) in the HashDict and broadcasts through the log/3 private function to all other nodes in the HashDict dictionary.

The log/3 just creates a log message concatenating the nicknames of all clients and prints it out, then broadcasts it to the Message Handler of all the clients listed in the HashDict:

defp log(users, nick, message) do
  user_list = users |> HashDict.keys |> Enum.join(":")
  Logger.debug("#{nick} #{message}, user_list: #{user_list}")
  say(nick, message)
  user_list
end

def say(nick, message) do
  GenServer.cast(:message_server, { :say, nick, "* #{nick} #{message} *" })
end

def handle_cast({ :say, nick, message }, users) do
  ears = HashDict.delete(users, nick)
  Logger.debug("#{nick} said #{message}")
  broadcast(ears, nick, message)
  {:noreply, users}
end

Up to this point it just casts a message to itself, the {:say, nick, message} tuple, which is handled by the GenServer and calls the broadcast/3 function defined like this:

defp broadcast(users, nick, message) do
  Enum.map(users, fn {_, node} ->
    Task.async(fn ->
      send_message_to_client(node, nick, message)
    end)
  end)
  |> Enum.map(&Task.await/1)
end

defp send_message_to_client(client_node, nick, message) do
  GenServer.cast({ :message_handler, client_node }, { :message, nick, message })
end

It maps the list of users and fires up an asynchronous Elixir Task (which is itself just a GenServer, as I explained before in the Ex Manga Downloader series). Because it’s a broadcast, it makes sense to make all of them parallel.

The important bit is the send_message_to_client/3 which casts a message to the tuple { :message_handler, client_node } where “client_node” is just “client@Hal9000u” or whatever “–sname” you used to start up each client node.

Now, this is how the clients send GenServer message calls/casts to {:message_server, server} and it sends messages back to {:message_handler, client}.

This is not your traditional TCP Client/Server example!

Now, we are calling the “ExMessenger.Server” a Chat “Server” and the “ExMessengerClient” a Chat “Client”. Although we have been calling them “Server” and “Client”, they don’t refer to the usual “TCP Server” and “TCP Client” examples you may be familiar with!

The ExMessenger.Server is indeed a Server (an OTP GenServer) but the ExMessengerClient.MessageHandler is also a Server (another OTP GenServer)! Because they both have Node behavior, it’s more like they are 2 peer-to-peer nodes instead of your old-school, simple client-server relationship. They can have client behavior (the Server sends messages to the MessageHandler) and server behavior (the Server receives messages from ExMessengerClient).

Let this concept sink in for a moment: built into the language you get a full-blown, easy-to-use, peer-to-peer network distribution model. You don’t need to have one single node elected as the sole “master”; you could have all nodes in a ring coordinating between them, avoiding single points of failure.

I believe this is maybe how Erlang-based services such as ejabberd and RabbitMQ work.

In the case of ejabberd, I can see that it keeps the state of the cluster in Mnesia tables (Mnesia being another component of OTP, a built-in distributed NoSQL database!) and it indeed uses the Node facilities to coordinate distributed nodes:

...
join(Node) ->
    case {node(), net_adm:ping(Node)} of
        {Node, _} ->
            {error, {not_master, Node}};
        {_, pong} ->
            application:stop(ejabberd),
            application:stop(mnesia),
            mnesia:delete_schema([node()]),
            application:start(mnesia),
            mnesia:change_config(extra_db_nodes, [Node]),
            mnesia:change_table_copy_type(schema, node(), disc_copies),
            spawn(fun()  ->
                lists:foreach(fun(Table) ->
                            Type = call(Node, mnesia, table_info, [Table, storage_type]),
                            mnesia:add_table_copy(Table, node(), Type)
                    end, mnesia:system_info(tables)--[schema])
                end),
            application:start(ejabberd);
        _ ->
            {error, {no_ping, Node}}
    end.

This is what a snippet of pure Erlang source code looks like, by the way. You should have enough Elixir in your head right now to be able to abstract away the ugly Erlang syntax and see that it’s a case pattern matching on the {_, :pong} tuple, using Node’s ping facilities to assert the connectivity of the node and updating the Mnesia table and other setups.

Also in the source code of the RabbitMQ-Server you will find a similar thing:

become(BecomeNode) ->
    error_logger:tty(false),
    ok = net_kernel:stop(),
    case net_adm:ping(BecomeNode) of
        pong -> exit({node_running, BecomeNode});
        pang -> io:format("  * Impersonating node: ~s...", [BecomeNode]),
                {ok, _} = rabbit_cli:start_distribution(BecomeNode),
                io:format(" done~n", []),
                Dir = mnesia:system_info(directory),
                io:format("  * Mnesia directory  : ~s~n", [Dir])
    end.

Again, pinging nodes, using Mnesia for the server state. Erlang’s syntax is uncommon for most of us: variables start with a capitalized letter (we intuitively think they’re constants instead), statements end with a dot, instead of dot-notation to call a function from a module it uses a colon “:”, unlike Elixir the parentheses are not optional, and so on. Trying to read code like this shows the value of having Elixir to unleash Erlang’s hidden powers.

So, up to this point, you know how processes are spawned internally, how they are orchestrated within the OTP framework, and now how they can interact remotely through the Node peer-to-peer abstraction. And again, this is all built into the language. No other language comes even close.