Deploying OPNSense to no-name WAN router hardware

Jan 10, 2020 · 1353 words · 7 minute read

In this article, I will discuss my recent experience replacing an older SOHO WAN router running OPNSense with a newer one.

Problem summary. I live and work in a remote area, which does not have the most reliable power and internet connections. I’ve addressed the intermittent power outages by installing a battery back-up system which covers the server cabinet and various other circuits in the house. I’ve addressed the intermittent network issues by deploying a multi-WAN router, which is what this article is about.

The WAN router also provides me with an extra NAT layer between my home/work network and the Internet, as you never know if you can trust ISP-provided routers. Additionally, it provides me and members of my team with OpenVPN access to resources on the network when we’re out and about, and much more besides.

The first router I bought was an Alix 2D3 (pictured below), which is based on an AMD Geode LX800 with 256MB of RAM. It ran pfSense happily for the first few years, until one day earlier in 2019 when I went to upgrade it and found that pfSense had stopped providing and supporting 32-bit builds. At this point, I switched to OPNSense, and found that it did just as good a job as pfSense at all the things I needed it for, and allowed me to get some extra usage out of the old Alix router.

Towards the end of 2019, the Alix-based unit started crashing overnight, and I had to reboot it almost every morning before starting work. I decided to look for a slightly more powerful and reliable unit, more appropriate to my networking requirements. For various reasons, I ended up choosing a ‘no name’ Chinese router, based on an Intel Celeron J3000 with 8GB of RAM, advertised as being compatible with pfSense/OPNSense, and rack-mountable, so it would fit better into my server rack next to my switch. I received it last week, and this article covers how I went about replacing the Alix unit with it.

The first thing I did was to prepare a USB stick containing the latest release of OPNSense, and use it to install OPNSense on the unit’s 32GB SSD. This was fairly simple and painless, and after rebooting, the unit was listening on IP address 192.168.1.1.

So, I went back to my laptop and added a virtual IP address in the 192.168.1.0/24 range to its Ethernet interface, so that I could continue configuring the router from there.

sudo ifconfig enp0s31f6:0 192.168.1.99/24
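On Linux distributions that no longer ship `ifconfig`, the same temporary address can be added with `ip` from iproute2. The following is a minimal sketch, assuming the same interface name and address as above; the `add_temp_addr` function and the `DRY_RUN` switch are my own illustrative names, and by default the helper only prints the command it would run.

```shell
#!/bin/sh
# Add a temporary secondary address so the laptop can reach 192.168.1.1.
# add_temp_addr and DRY_RUN are illustrative names, not part of any tool.
add_temp_addr() {
    iface="$1"   # e.g. enp0s31f6
    addr="$2"    # e.g. 192.168.1.99/24
    cmd="ip addr add $addr dev $iface"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"      # dry run: only print the command
    else
        sudo $cmd        # apply it for real (needs root)
    fi
}

add_temp_addr enp0s31f6 192.168.1.99/24
```

Running with `DRY_RUN=0` applies the change. Once the router has its final LAN address, `ip addr del 192.168.1.99/24 dev enp0s31f6` removes the temporary one again.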

On the Alix, the NICs were labelled vr0, vr1, etc., but on the new unit they are labelled igb0, igb1, etc., so simply restoring the configuration in its entirety from the Alix unit wasn’t going to be trivial. Plus, the newer hardware had more ‘features’ I could make use of, such as hardware-level encryption for the OpenVPN traffic, so I wanted to take this opportunity to thoroughly review all the settings. In the end, I configured the main settings by hand, and used the ‘backup and restore’ facility of OPNSense to restore specific chunks of the saved Alix configuration, before checking and updating them accordingly.
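For anyone who would rather adjust the exported configuration than re-enter settings by hand, the NIC renaming can be sketched with `sed`. This is not what I did, only an illustration. It assumes that interface assignments in the exported XML appear inside `<if>` and `<vlanif>` elements, and that vr0 maps to igb0, vr1 to igb1, and so on; both are assumptions to check against your own backup file. The `remap_nics` function name is hypothetical.

```shell
#!/bin/sh
# Rewrite NIC names in a COPY of an exported configuration file.
# Assumption: NIC names live in <if>...</if> and <vlanif>...</vlanif>
# elements, and vrN on the old unit corresponds to igbN on the new one.
remap_nics() {
    src="$1"   # exported configuration from the old unit
    dst="$2"   # rewritten copy; the original is left untouched
    sed -E 's#<(if|vlanif)>vr([0-9]+)#<\1>igb\2#g' "$src" > "$dst"
}

# Usage (file names are hypothetical):
#   remap_nics config-alix.xml config-new.xml
#   diff config-alix.xml config-new.xml   # review every change before restoring
```

Only text inside those two elements is touched, so a description field that happens to mention vr0 is left alone. Reviewing the diff before restoring anything is still essential.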

Once I was happy that the settings on the new unit closely matched the settings on the old unit, I took the new unit from the workshop, and installed it into the server cabinet. In doing so, I removed a redundant second Zyxel GS2210-24 managed switch I no longer use and reconnected all the devices to the first switch. This made room for the new router directly below the first switch.

At this point, to avoid conflicts and confusion, I disconnected the Alix router, connected up the new router and booted it up. The good news was that all the main (wired) devices in the office and around the house were working as normal. The bad news was that the wifi devices were not.

On my home network, I have various Ubiquiti wifi devices. I have three NanoStations for ‘back haul’ links running in ‘Bridge’ mode, with one mounted externally on the main house, and the other two on nearby houses, connecting those buildings back to the main network. Then, at each site I have one or more Unifi AC access points, which provide wireless access to clients at those sites - typically mobile phones. As a side note, I also have a couple of Unifi G3 web cameras connected. All the Ubiquiti devices are powered via PoE, and to avoid running lots of ‘bricks’, I have them connected to a 5-port Ubiquiti ToughSwitch. For completeness, I should add that the Unifi wifi access points are managed by the excellent Unifi management software, which I run in Container Station on the QNAP NAS in the server cabinet. The key thing to note here is that the access points are configured so that client traffic is carried on VLAN 999.

Why? Because my kids and their friends like to install all sorts of stuff on their phones, and because I don’t have much faith in security with mobile phones, so I like to keep that traffic separate from my work-related traffic for some peace of mind.

So, the symptom of the problem was that wifi devices were associating fine with the access points, but failing to retrieve an IP address from the DHCP server on the OPNSense. So the first thing to check is that the VLAN configuration and DHCP server settings are as expected.

The VLAN was configured correctly.

And, the DHCP server was configured to run on the interface.

According to the DHCP logs and the leases table, the DHCP requests were being received and DHCP responses with IP addresses were being issued to connecting clients. I ran a packet dump on the WIFI interface on the router, and checked it with Wireshark, which confirmed this.

An equivalent packet dump running on my laptop suggested that it wasn’t seeing the DHCP offers.
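The comparison between the two captures can also be done from the command line rather than in Wireshark. The sketch below counts DHCP message types in the verbose text output of `tcpdump`. It assumes each DHCP packet is printed with a line containing `DHCP-Message` and ending in the message type, which is what the tcpdump versions I have seen produce; the capture file names and the `dhcp_summary` function name are hypothetical.

```shell
#!/bin/sh
# Count DHCP message types in verbose tcpdump text read from stdin.
# Assumption: each packet yields a line such as
#   "DHCP-Message Option 53, length 1: Discover"
# where the last word is the message type.
dhcp_summary() {
    awk '
        /DHCP-Message/ { count[$NF]++ }
        END { for (t in count) printf "%s %d\n", t, count[t] }
    ' | sort
}

# Usage (capture file names are hypothetical):
#   tcpdump -nn -v -r router-wifi.pcap 'udp port 67 or udp port 68' | dhcp_summary
#   tcpdump -nn -v -r laptop.pcap      'udp port 67 or udp port 68' | dhcp_summary
```

If the router-side summary lists Offer messages and the client-side summary lists only Discover messages, the replies are being lost somewhere between the two capture points, which is the pattern I was seeing here.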

In the end, I discovered that it wasn’t related to the OPNSense configuration at all. It was related to the port on the Zyxel managed switch that the ToughSwitch was plugged into, since the ToughSwitch is what provides PoE power to the Unifi access points and back-haul links. The ToughSwitch had been connected to a port on the second switch before I removed it, and I reconnected it to what I thought was an equivalent port on the first switch. On closer inspection, the new port had previously been configured with static VLAN settings that prevented it from transmitting traffic for VLAN 999, so the DHCP responses were effectively being black-holed.

The following shows the static port settings for VLAN 999 on the Zyxel switch.

I had the ToughSwitch connected to port 27. After moving it to port 25, wifi DHCP leases started being received by clients as expected.

After this, the next thing I needed to check was that remote users could connect back via OpenVPN. The ISP-provided WAN router is a ZTE unit, which I have configured with a LAN IP address of 172.16.1.1/24. It connects directly to the OPNSense router, which is configured with a WAN IP address of 172.16.1.2/24. The ZTE router has the OPNSense WAN IP listed as a DMZ, so inbound UDP traffic for port 1194 is sent directly to the OPNSense, and thus the OpenVPN service. All that’s needed on the OPNSense, other than the OpenVPN server configuration itself, is a firewall rule to allow the UDP packets in. Nothing too complicated there.

The OpenVPN service is configured to use ‘SSL/TLS and User Auth’ for authentication, so I ensured my user password was set as expected, and issued myself a new TLS client certificate, as my previous one had expired. I tried a few test connections, and it worked fine.
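An expired client certificate is easy to miss until a connection fails, so a small check can help. The following is a minimal sketch using the `-checkend` option of `openssl x509`; the `cert_expires_within` function name and the `client.crt` file name are hypothetical.

```shell
#!/bin/sh
# Return 0 (true) if the certificate expires within the given number of days.
# cert_expires_within is an illustrative helper, not an OpenVPN or OPNSense tool.
cert_expires_within() {
    cert="$1"
    days="$2"
    [ -r "$cert" ] || { echo "cannot read $cert" >&2; return 2; }
    # "openssl x509 -checkend N" exits 0 if the certificate is still valid
    # N seconds from now, and non-zero otherwise.
    if openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null; then
        return 1   # still valid beyond the window
    fi
    return 0       # expires (or has expired) within the window
}

# Usage (client.crt is a hypothetical file name):
#   cert_expires_within client.crt 30 && echo "renew soon"
```

Run periodically against each exported client certificate, this gives some warning before a VPN login starts failing.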

As a side note, I recently started using HashiCorp Vault as a CA for issuing internal TLS certificates, and it’s good. Combined with ‘cert-manager’, it helps me simplify and automate a few local PKI tasks. It also has Terraform integration, which helps with general infrastructure secret management. Definitely worth looking into.