Setting up high-availability failover mode

How the built-in high availability mode works

Access Server comes with a built-in failover mode that you can optionally configure. The idea is to run two OpenVPN Access Server installations on the same local private network. Each has its own unique IP address, and both can also use a virtual IP address that is shared between the two nodes. The primary node is online and handles all connections and traffic arriving on the shared virtual IP. If it fails, the secondary node notices, takes over the virtual IP and the tasks of the failed node, and becomes the new primary node. UCARP/VRRP-type broadcast traffic carries a heartbeat signal that indicates everything is running fine. When that heartbeat stops, for example because the primary node goes offline, the failover standby node takes over. You can use port forwarding, or open ports in a firewall or Internet gateway device, to direct traffic coming in from the Internet to the shared virtual IP address. Whichever of the two nodes is active at any given time is reachable on that shared virtual IP.

The currently active node is the primary node and handles all traffic and connections coming in on the shared virtual IP. It sends a heartbeat signal onto the network to indicate that everything is okay, and the failover node listens for it. The failover node stands by, ready to take over the shared virtual IP as soon as this signal is disrupted. Every few minutes or so, the primary node contacts the failover node and sends it a copy of the latest user properties database, certificates database, and other configuration settings. The failover node stores this copy in a temporary folder until it is needed. When the primary node fails, the heartbeat signal stops. After about half a minute, the failover node loads the latest copy of the configuration data, restarts the Access Server service, and takes over the shared virtual IP. The failover node has now fully taken over the role of primary node.

If the original primary node comes back (after a reboot, perhaps), it will notice that there is already a primary node on the network and that this node has taken over the shared virtual IP. It will then take on the role of the failover node instead. To force the nodes back into their original roles, take the node that is now functioning as the primary node offline for a minute. The failover node will then become the primary node again by itself.
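The role changes described above can be sketched as a small simulation. This is only an illustration of the described behavior, not Access Server's actual implementation: the Node class, its method names, and the 30-second timeout are assumptions (the text only says "about half a minute").

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "about half a minute" is modeled as exactly 30 seconds.
TAKEOVER_TIMEOUT_S = 30.0


@dataclass
class Node:
    """Hypothetical model of one node in a failover pair."""
    name: str
    role: str  # "primary" or "failover"
    last_heartbeat_seen: float = 0.0     # time the last heartbeat was heard
    synced_config: Optional[str] = None  # latest copy sent by the primary

    def receive_heartbeat(self, now: float) -> None:
        # The standby node records every heartbeat it hears.
        self.last_heartbeat_seen = now

    def receive_config_copy(self, config: str) -> None:
        # The primary pushes a configuration copy every few minutes.
        self.synced_config = config

    def tick(self, now: float) -> None:
        # Promote the standby once the heartbeat has been silent too long.
        silent_for = now - self.last_heartbeat_seen
        if self.role == "failover" and silent_for >= TAKEOVER_TIMEOUT_S:
            # A real node would now load the synced configuration, restart
            # the service, and claim the shared virtual IP.
            self.role = "primary"

    def boot(self, primary_already_present: bool) -> None:
        # A returning node that sees an active primary joins as failover.
        self.role = "failover" if primary_already_present else "primary"
```

In this model, a standby that last heard a heartbeat at time 0 stays in the failover role at 10 seconds and is promoted at 30 seconds, while a rebooted former primary that finds a primary already on the network comes up as the failover node.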

Platform compatibility

This method unfortunately does not work on all platforms. For example, Amazon AWS filters out broadcast UCARP/VRRP traffic, so this model cannot be used on Amazon AWS.

On Hyper-V and on ESXi you may need to dig into the settings to enable MAC address spoofing and/or promiscuous mode. The automatic failover system relies on these to assume the shared virtual IP address on the network, and some systems block this type of traffic by default. It needs to be allowed for this function to work.

Also, as mentioned, UCARP/VRRP traffic is used, and not all networks allow this by default. If your network does not, you cannot use this failover method. If it does, but other UCARP/VRRP systems are also in use, or multiple OpenVPN Access Server pairs are set up, you must ensure that each pair has its own UCARP/VRRP VHID, a unique ID number used in the heartbeat signals, so that they don't interfere with each other.
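When several pairs share a network, it can help to keep an inventory of the VHIDs in use and check it for duplicates. The sketch below assumes such an inventory exists as a simple mapping; the function name and the pair names are hypothetical, and nothing here queries Access Server itself.

```python
from collections import defaultdict


def find_vhid_conflicts(vhid_by_pair: dict[str, int]) -> dict[int, list[str]]:
    """Return each VHID claimed by more than one pair, with the pairs using it."""
    pairs_by_vhid = defaultdict(list)
    for pair_name, vhid in vhid_by_pair.items():
        pairs_by_vhid[vhid].append(pair_name)
    # Keep only VHIDs that appear more than once.
    return {
        vhid: sorted(names)
        for vhid, names in pairs_by_vhid.items()
        if len(names) > 1
    }


# Hypothetical inventory: two pairs accidentally share VHID 7.
inventory = {"vpn-pair-a": 7, "vpn-pair-b": 12, "router-pair": 7}
conflicts = find_vhid_conflicts(inventory)
```

An empty result means every pair in the inventory has its own VHID.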

Setting up the primary node

This part is the same as setting up a normal OpenVPN Access Server installation on a private network. Once a standard OpenVPN Access Server installation has been done, you are ready to add a second node to it.

Setting up the failover node

The installation wizard that comes up on our appliances and virtual images may ask you if the installation you are adding is meant to be a failover node or a primary node. If you do not see the wizard because you installed the package manually or for some reason it has already been completed or skipped, you can reconfigure the node for failover operation manually using ovpn-init. Please note that this wipes the current configuration of the OpenVPN Access Server installation you intend to use as a secondary node.



How to: set up basic server load-balancing/redundancy

OpenVPN Access Server has a built-in UCARP function for redundancy. Please note that this requires VRRP/UCARP broadcast traffic, and that this type of traffic is blocked entirely on Amazon AWS. Therefore this type of failover setup cannot be used on Amazon AWS. On a hypervisor like ESXi or Hyper-V you will likely need to allow MAC address spoofing or promiscuous mode for this to work.

A failover setup requires two Access Servers, a primary and a secondary node (we will issue a secondary license key for the failover node at no charge upon request). This failover setup is similar to VRRP/HSRP, so both servers need to be in the same private subnet. Each has its own private IP address, and on top of that they 'share' a virtual IP, which is the address you should use in production. Once both servers are up and have Access Server installed, SSH into the secondary node, obtain root access, and enter the command ovpn-init to start reconfiguring it as a failover node:

root@openvpn:~# ovpn-init
Detected an existing OpenVPN-AS configuration.
 Continuing will delete this configuration and restart from scratch.

Please enter 'DELETE' to delete existing configuration: DELETE

Agree to the terms of usage by entering yes. You will then be asked whether this will be the primary node; enter no:

Will this be the primary Access Server node?
 (enter 'no' to configure as a backup or standby node)
 Press ENTER for default [yes]: no

After this, the setup wizard for a secondary node will begin. Follow the prompts. Then open a web browser and go to the Admin UI of the primary node. Use its private IP address to contact it directly ( for example).

In the Admin UI, click Failover in the menu on the left.

Select the radio button for LAN model (UCARP). Enter the shared virtual IP address that you will be using, then enter the primary node information, including username and SSH password, and repeat for the secondary node. Remember, all of these IP addresses have to be in the same subnet.
Click Validate at the bottom of the screen to test communication of the nodes.
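
The same-subnet requirement can be checked up front with Python's standard ipaddress module. This is a minimal sketch; the function name and all addresses below are made-up examples, so substitute your own subnet, node addresses, and virtual IP.

```python
import ipaddress


def all_in_subnet(subnet: str, addresses: list[str]) -> bool:
    """Return True if every address belongs to the given subnet."""
    network = ipaddress.ip_network(subnet)
    return all(ipaddress.ip_address(addr) in network for addr in addresses)


# Hypothetical values: primary node, secondary node, shared virtual IP.
ok = all_in_subnet(
    "192.168.70.0/24",
    ["192.168.70.10", "192.168.70.11", "192.168.70.12"],
)
# Here the virtual IP was mistakenly chosen from a neighboring subnet.
bad = all_in_subnet(
    "192.168.70.0/24",
    ["192.168.70.10", "192.168.70.11", "192.168.71.12"],
)
```

If the check fails for your addresses, correct them before relying on the Validate button in the Admin UI.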

Messages will appear at the top of the screen. If all are green and say GOOD, you are ready to move on.

Click Commit and Restart at the bottom of the screen to commit the changes. The server will restart and connectivity will be lost for a few seconds. If you are using an IP address and not a DNS-registered domain name, you will have to enter the VIP in order to access the Admin UI, since this will be the new server address. NOTE: in this scenario the VIP will also need to be added to Server Network Settings. If you are using a DNS name that points to the new VIP, this is not necessary.

And that is it!  Your failover is set up.

More Options

Load-balancing is a different subject and a difficult one.  Running multiple servers in different locations can be challenging. There are three problems to tackle in this case, two of which are easy to resolve.

Configure VPN clients to try a list of servers

By default, OpenVPN Access Server generates the client config files automatically and pushes them to the clients. To make VPN clients connect to multiple different servers, we need to do some tweaking on the OpenVPN Access Server itself. The server uses the value entered in the 'hostname or IP address' field in the web-based Admin UI to generate client config files, but this limits us to just one server address. To list more, we use the 'Client Config Directives' override field in the web-based Admin UI to override it with a list of servers we provide ourselves. With the directive 'remote-random' we can force the VPN client to randomize the list, which creates a crude form of load-balancing. Otherwise the client tries the servers sequentially from top to bottom.
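
As a rough illustration, the text for the override field can be generated from a list of servers. The sketch below uses the standard OpenVPN 'remote' and 'remote-random' directives; the helper name, host names, ports, and protocols are made-up examples, and how your Access Server version handles the override field should be verified on a test client.

```python
def build_client_directives(servers, randomize=True):
    """Build 'remote' lines for a list of (host, port, proto) tuples."""
    lines = [f"remote {host} {port} {proto}" for host, port, proto in servers]
    if randomize:
        # Ask the client to shuffle the list instead of going top to bottom.
        lines.append("remote-random")
    return "\n".join(lines)


# Hypothetical server list.
directives = build_client_directives([
    ("vpn1.example.com", 1194, "udp"),
    ("vpn2.example.com", 1194, "udp"),
])
print(directives)
```

Passing randomize=False leaves out 'remote-random', so clients try the servers in the order listed.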

Use RSYNC to copy user database and certificates

Let's say we have 2 servers. One of them is the 'master' server, where we add users and let OpenVPN Access Server generate certificates, and the other is there for load-balancing/redundancy. Using RSYNC you can periodically, easily, and securely copy the two important files that contain the list of users and the list of certificates from the master server onto the slave server. This assumes that you are using authentication mode 'LOCAL', where the username/password is stored in a local SQLite database file. The two files you'll want to copy then are: