Linux 4G LTE Failover

In my last post I went over how to set up a WiFi/Ethernet bridge on a Raspberry Pi for use when your main ISP goes down. In this post I’ll be going over 4G failover with a USB dongle on a Linux server. I won’t be using a Raspberry Pi for this since I want the failover to be 100% automatic and limited to a single server. You could do this on a Raspberry Pi, but I don’t want the entire house on a 4G connection eating up all the data.

Requirements

  1. ZTE MF833V or equivalent
  2. Debian 10

My Setup

I’m using the USB device listed above along with a Supermicro server that has two bonded/bridged Ethernet ports. If I were to use a Raspberry Pi, I would connect one of the Ethernet ports to the Pi and the other to my main network, and let the Pi handle most of the failover (xfinitywifi failover / iPhone tethering).

Setup

  1. The ZTE device initially appears as a CD-ROM drive containing the Windows drivers, so it needs to be switched into USB modem mode (or rather, prevented from switching into CD-ROM mode).
  2. Plug in the device and find the interface name with ifconfig -a (or ip link if net-tools is not installed).
  3. Configure the interfaces file (/etc/network/interfaces):
      allow-hotplug enp0s20u8
      iface enp0s20u8 inet static
         address 192.168.0.100
         netmask 255.255.255.0
         dns-nameservers 1.1.1.1 1.0.0.1 8.8.8.8
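
For step 2, the modem's interface name can also be picked out programmatically instead of reading through the full interface list. The following is a minimal sketch, not part of the original setup: it assumes the usual Linux sysfs layout, where each physical interface under /sys/class/net has a device/subsystem link naming its bus. The function name usb_interfaces is made up for illustration.

```python
#!/usr/bin/env python3
"""List network interfaces that sit on the USB bus (illustrative sketch)."""
import os


def usb_interfaces(sysfs_root='/sys/class/net'):
    """Return the names of interfaces whose backing device is on the USB bus."""
    if not os.path.isdir(sysfs_root):
        return []

    found = []
    for name in sorted(os.listdir(sysfs_root)):
        # Physical interfaces expose a "device" link whose "subsystem" link
        # names the bus (usb, pci, ...). Virtual interfaces such as lo or a
        # bridge have no "device" entry and are skipped.
        subsystem = os.path.join(sysfs_root, name, 'device', 'subsystem')
        if os.path.islink(subsystem):
            bus = os.path.basename(os.path.realpath(subsystem))
            if bus == 'usb':
                found.append(name)
    return found


if __name__ == '__main__':
    for name in usb_interfaces():
        print(name)
```

Running it before and after plugging in the dongle should show which name was added; that name is what goes into the interfaces file.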
      

Failover

You’ll notice that the USB device was not given a default gateway. This is to prevent issues with the interface not starting up. You can use post-up commands to add the gateway with a different metric, but that will not fail over automatically when the main network is still up but cannot reach the internet. A script that tests the connection and switches the gateway is required. There might be a better way to do this, but it works fine for my purposes.

This script was taken from here and modified to better suit my needs. It runs two ping tests through the default gateway. If both fail, it switches to the backup gateway unless that is already active. If at least one ping succeeds, it switches back to the default gateway unless it is already set. I tweaked it a little and added a Pushover notification so I’ll know when something fails.
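
The decision described above boils down to a small truth table, which can be sketched separately from the routing commands. This is an illustrative sketch only: the function name decide_gateway and its parameters are assumptions for this example and do not appear in the script itself.

```python
def decide_gateway(current_gw, default_gw, backup_gw, ping_ok):
    """Return the gateway to switch to, or None to leave routing alone.

    ping_ok holds one boolean per ping test sent through the default gateway.
    """
    if not any(ping_ok):
        # Every test failed: fall back, unless we are not on the default anymore.
        return backup_gw if current_gw == default_gw else None
    # At least one test succeeded: return to the default if we left it.
    return default_gw if current_gw != default_gw else None


if __name__ == '__main__':
    DEF, BCK = '10.13.37.1', '192.168.0.1'
    for current in (DEF, BCK):
        for results in ([False, False], [True, False], [True, True]):
            print(current, results, '->', decide_gateway(current, DEF, BCK, results))
```

Keeping the decision in a pure function like this makes it possible to check every combination without touching the routing table.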


/opt/failover.sh

#!/bin/bash
#*********************************************************************
#       Configuration
#*********************************************************************
DEF_GATEWAY="10.13.37.1"        # Default Gateway
BCK_GATEWAY="192.168.0.1"       # Backup Gateway
RMT_IP_1="1.1.1.1"              # First remote ip
RMT_IP_2="8.8.8.8"              # Second remote ip
PING_TIMEOUT="3"                # Ping timeout in seconds
CURL_TIMEOUT="5"                # Pushover timeout
#*********************************************************************

if [ `whoami` != "root" ]
then
        echo "Failover script must be run as root!"
        exit 1
fi

# Take only the first default route in case more than one is present
CURRENT_GW=$(ip route show | awk '/^default/ { print $3; exit }')

if [ "$CURRENT_GW" == "$DEF_GATEWAY" ]
then
        ping -c 2 -W $PING_TIMEOUT $RMT_IP_1 > /dev/null
        PING_1=$?
        ping -c 2 -W $PING_TIMEOUT $RMT_IP_2 > /dev/null
        PING_2=$?
else
        ip route add $RMT_IP_1 via $DEF_GATEWAY
        ip route add $RMT_IP_2 via $DEF_GATEWAY

        ping -c 2 -W $PING_TIMEOUT $RMT_IP_1 > /dev/null
        PING_1=$?
        ping -c 2 -W $PING_TIMEOUT $RMT_IP_2 > /dev/null
        PING_2=$?

        ip route del $RMT_IP_1
        ip route del $RMT_IP_2
fi

LOG_TIME=`date +%b' '%d' '%T`

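# ping exits with 2 (not 1) on errors such as an unreachable network,
# so treat any non-zero status as a failure
if [ "$PING_1" != "0" ] && [ "$PING_2" != "0" ]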
then
        if [ "$CURRENT_GW" == "$DEF_GATEWAY" ]
        then
                ip route del default
                ip route add default via $BCK_GATEWAY
                ip route flush cache

                echo "$LOG_TIME: $0 - switched Gateway to Backup with IP $BCK_GATEWAY"

                curl -m $CURL_TIMEOUT -s \
                --form-string "token=" \
                --form-string "user=" \
                --form-string "message=Failing to 4G LTE" \
                https://api.pushover.net/1/messages.json > /dev/null
        fi
elif [ "$CURRENT_GW" != "$DEF_GATEWAY" ]
then
        ip route del default
        ip route add default via $DEF_GATEWAY
        ip route flush cache

        echo "$LOG_TIME: $0 - Gateway switched to Default with IP $DEF_GATEWAY"

        curl -m $CURL_TIMEOUT -s \
        --form-string "token=" \
        --form-string "user=" \
        --form-string "message=Network online" \
        https://api.pushover.net/1/messages.json > /dev/null
fi

I decided to rewrite it in Python and support any number of gateways. The gateways are tried in the order they are listed, and the first one that works becomes the default.

/opt/failover.py

#!/usr/bin/env python3
import os
import re
import requests
import subprocess

PUSHOVER_USER = ''
PUSHOVER_APP = ''

PING = ['1.1.1.1', '8.8.8.8']
GATEWAY = ['10.13.37.1', '192.168.0.1', '172.22.0.1']

PING_COUNT = 2
PING_TIMEOUT = 2
PUSHOVER_TIMEOUT = 10.0

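def pushover(message):
    # Skip the notification when no credentials are configured.
    if not (PUSHOVER_USER and PUSHOVER_APP):
        return False

    params = {
        'token': PUSHOVER_APP,
        'user': PUSHOVER_USER,
        'message': message
    }

    try:
        # Send the fields as form data in the POST body.
        requests.post('https://api.pushover.net/1/messages.json', data=params, timeout=PUSHOVER_TIMEOUT)
        return True
    except requests.RequestException:
        return False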

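def get_default_gateway():
    try:
        data = subprocess.check_output(['ip', 'route', 'show']).decode('utf-8')
    except (OSError, subprocess.CalledProcessError):
        return None
    # The default route is not guaranteed to be the first line, so search every line.
    m = re.search(r'^default via (\d+\.\d+\.\d+\.\d+) dev', data, re.MULTILINE)
    return m.group(1) if m else None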

def set_default_gateway(ip):
    # Replace the default route; subprocess.DEVNULL avoids leaving file handles open.
    subprocess.call(['ip', 'route', 'del', 'default'], stdout=subprocess.DEVNULL)
    subprocess.call(['ip', 'route', 'add', 'default', 'via', ip], stdout=subprocess.DEVNULL)
    subprocess.call(['ip', 'route', 'flush', 'cache'], stdout=subprocess.DEVNULL)

def add_route(ip, gateway):
    # Temporary host route so the ping test goes through the gateway being tested.
    subprocess.call(['ip', 'route', 'add', ip, 'via', gateway], stdout=subprocess.DEVNULL)

def remove_route(ip):
    subprocess.call(['ip', 'route', 'del', ip], stdout=subprocess.DEVNULL)

def ping(ip):
    # True when ping exits with status 0, i.e. at least one reply was received.
    return (subprocess.call(['ping', '-c%d' % (PING_COUNT), '-W%d' % (PING_TIMEOUT), ip], stdout=subprocess.DEVNULL) == 0)

def test_gateway(gateway):
    ok = False

    for ip in PING:
        add_route(ip, gateway)
        ok = ping(ip)
        remove_route(ip)
        if ok: break

    return ok

def main():
    current = get_default_gateway()
    for g in GATEWAY:
        if test_gateway(g):
            if g != current:
                set_default_gateway(g)
                print('Changing gateway to %s' % (g))
                pushover('Changing gateway to %s' % (g))
            break
    return 0

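if __name__ == "__main__":
    # os._exit() skips flushing stdout, which can drop the print() output
    # when the script runs under systemd; SystemExit exits cleanly.
    raise SystemExit(main())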
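
The two pieces of logic in the script that do not need root, parsing the default route and walking the gateway list in order, can be tried offline. The sketch below is an illustration under assumptions: SAMPLE_ROUTES is made-up output in the general shape of ip route show (the br0 interface name and addresses are hypothetical), and parse_default_gateway and first_working are names invented for this example.

```python
import re

# Hypothetical `ip route show` output, used only to exercise the parser.
SAMPLE_ROUTES = """\
10.13.37.0/24 dev br0 proto kernel scope link src 10.13.37.10
default via 10.13.37.1 dev br0 onlink
192.168.0.0/24 dev enp0s20u8 proto kernel scope link src 192.168.0.100
"""


def parse_default_gateway(route_output):
    """Return the IPv4 address of the first default route, or None."""
    m = re.search(r'^default via (\d+\.\d+\.\d+\.\d+) dev', route_output, re.MULTILINE)
    return m.group(1) if m else None


def first_working(gateways, probe):
    """Return the first gateway for which probe(gateway) is true, or None."""
    for gateway in gateways:
        if probe(gateway):
            return gateway
    return None


if __name__ == '__main__':
    print(parse_default_gateway(SAMPLE_ROUTES))
    # Pretend only the 4G gateway answers.
    print(first_working(['10.13.37.1', '192.168.0.1', '172.22.0.1'],
                        lambda gw: gw == '192.168.0.1'))
```

Passing the probe in as a function means the ordering can be checked with a fake probe, while the real script would pass something like its test_gateway function.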

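If you plan to use the Python script above, change ExecStart in this service so that it runs /opt/failover.py with python3 instead (for example, ExecStart=/usr/bin/python3 /opt/failover.py).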

/etc/systemd/system/failover.service

[Unit]
Description=failover

[Service]
Type=oneshot
ExecStart=/bin/bash /opt/failover.sh

/etc/systemd/system/failover.timer

[Unit]
Description=failover timer

[Timer]
OnUnitActiveSec=15s
OnBootSec=15s

[Install]
WantedBy=timers.target

Finally, enable the timer.

systemctl daemon-reload
systemctl start failover.timer && systemctl enable failover.timer
systemctl list-timers --all

Virtual Machines

I’m running a Windows virtual machine on this server for TradeStation algos. It must be configured to use NAT networking; otherwise you will need to attach two interfaces to the VM and let Windows figure out which one to use. A NAT configuration is much easier.

If you also have a Windows VM, be sure to set its network as a “metered connection”; otherwise it could download several gigabytes of updates while connected to 4G.

Testing

There are several ways I want to test this to make sure it fails over correctly.

  1. Unplug both Ethernet cables from the server.
  2. Unplug the internet connection from the router.
  3. Unplug the router itself.
  4. Unplug various switches the server is connected to.
  5. Use ifdown to disable the bond / bridge interfaces.

If all of these pass, it should work when the network really goes out.
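
While running through the tests above, it helps to see exactly when the default route changes. The following is a rough sketch of a watcher, assuming iproute2's ip command is available and Python 3.7 or newer; the function names and the 5 second polling interval are arbitrary choices for this example.

```python
import subprocess
import time


def default_route():
    """Return the first line of `ip route show default`, or '' if there is none."""
    out = subprocess.run(['ip', 'route', 'show', 'default'],
                         capture_output=True, text=True).stdout
    lines = out.splitlines()
    return lines[0].strip() if lines else ''


def transitions(samples):
    """Yield (old, new) each time two consecutive samples differ."""
    it = iter(samples)
    try:
        previous = next(it)
    except StopIteration:
        return
    for current in it:
        if current != previous:
            yield previous, current
            previous = current


def watch(interval=5):
    """Print a timestamped line whenever the default route changes (runs forever)."""
    def poll():
        while True:
            yield default_route()
            time.sleep(interval)

    for old, new in transitions(poll()):
        print('%s: %r -> %r' % (time.strftime('%b %d %T'), old, new), flush=True)
```

Calling watch() from a python3 session on the server before unplugging anything should log one line for the switch to 4G and another when the main connection comes back.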
