K3s — nerdctl [Field Test 1]
Intro
Those who have been following my last two posts, where I cover my journey using K3s on a Raspberry Pi 4, may have noticed that the timing of the posts is somewhat ad hoc. The backstory to this project was figuring out how to make an IoT edge node that can use K3s and a distributed network for managing software delivery. Last month, I travelled (yes, I know COVID is a thing, but work needs to carry on) to test what issues I would encounter when deploying this setup remotely.
While there are studies and other posts that describe remote setups, in my time working in academia I came across studies that didn't fully document the situations they encountered, which led to surprises. By going through the process myself, I can identify those surprises and fill in that knowledge gap. Since the application domain happens to be edge IoT, fixing such surprises becomes increasingly tricky once the solution has been deployed into the wild.
The portable IoT edge node used in this test is shown below. The node is built around a Raspberry Pi 4. The initial design used a simple 5-port TP-Link network switch so that more nodes or other wired network devices could be added later. For this test, one of the switch ports was connected directly to the LTE/4G modem provided at the site. Power for the Raspberry Pi and the TP-Link switch is supplied by an Anker PowerPort 6.
Some assumptions were made before the trip. They are listed below.
- Internet connectivity at the controlled remote site would be guaranteed
- The newly purchased portable desktop PC used to help develop the K3s solution would contain resilient hardware… you know that if this is on the list, something happened
- All required hardware would be packed for the test
Internet Connectivity
The controlled site was equipped with a 4G/LTE modem as the primary source of internet access for the Raspberry Pi 4 test. Other people at the site shared the same 4G/LTE modem for their work. As testing went on, the bandwidth became saturated, which degraded access for the others trying to do their work. It became clear that the premises' 4G/LTE modem was not suitable for the test.
An alternative approach was developed using a Google Pixel 5 with a USB-C Ethernet adapter. The Pixel 5 replaced the 4G/LTE modem as the primary source of internet access. The Pixel 5 was connected to the 5-port switch, and the portable desktop PC and Raspberry Pi were connected to the same switch.
Unfortunately, the network speed advertised by the vendor and the actual speed differed hugely. The actual speed was far lower, which increased the time needed to download packages for clean build test cycles. The slow downloads also increased the time needed to rebuild systems after a crash or to create a new clean setup for testing.
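To give a feel for how much a slow link stretches a clean build cycle, here is a back-of-the-envelope sketch in Python. The payload size and both link speeds are hypothetical numbers picked for illustration; they are not measurements from this trip, and the helper name download_seconds is my own.

```python
def download_seconds(size_mb: float, link_mbps: float) -> float:
    """Estimate transfer time in seconds.

    size_mb   -- payload size in megabytes (MB)
    link_mbps -- link speed in megabits per second (Mbit/s)
    """
    if link_mbps <= 0:
        raise ValueError("link speed must be positive")
    # 1 byte = 8 bits, so convert megabytes to megabits before dividing.
    return size_mb * 8 / link_mbps


# Hypothetical example: 500 MB of packages and images per clean build.
advertised = download_seconds(500, 50)  # assumed advertised speed: 50 Mbit/s
actual = download_seconds(500, 2)       # assumed real-world speed: 2 Mbit/s

print(f"advertised link: {advertised:.0f} s")
print(f"actual link:     {actual:.0f} s")
```

With these assumed numbers, the same clean build goes from about a minute and a half of downloading to more than half an hour, and that cost is paid again on every rebuild.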
Hardware Failure
A portable desktop PC was brought along on the trip to help with development efforts, since I was working on other projects that required an x86_64 platform. The site supplied HDMI monitors to use as a display for the portable desktop. The portable desktop was plugged into the network switch, and SSH was used to perform tasks on the Raspberry Pi.
The specifications of the portable desktop PC are as follows:
- GIGABYTE BRIX GB-BRR7H-4800-BWUS Ultra Compact PC
- G.SKILL Ripjaws Series 32GB (2 x 16GB) 260-Pin DDR4 SO-DIMM DDR4 3000 (PC4 24000) Laptop Memory Model F4-3000C16D-32GRS
- ADATA XPG SX6000LNP
Despite efforts to mitigate hardware failures, they will still happen. The portable desktop PC had Windows 10 and Debian Buster installed in a dual-boot configuration. During development, the portable desktop PC was running Windows 10. While I was doing some development work, the PC crashed and blue-screened. Upon restarting, the system took an exceptionally long time to boot and power cycled numerous times. When the PC eventually recovered, the GRUB menu selector was gone and the PC would only boot into Windows 10, so access to Debian was lost. A portion of the work had been developed on the Debian side. Rebuilding that work under Windows took a considerable amount of time because of the internet connectivity. I had also forgotten to pack a USB key, which might have helped with booting into Linux to fix the GRUB boot menu selector.
Inventory Issues
One constraint with this test was the number of items that could be carried to the remote location. Since the location was in another country, I travelled by air, and the luggage limits played a role in what could be brought to the testing site. A minimalist packing strategy was used. Compounding the issue, I wanted to avoid depending on stores at the remote location because of COVID and supply chain issues.
One setback to the testing was reconfiguring the network once the primary internet source became an issue. The USB-C Ethernet adapter had to be purchased, and getting to a store to buy it took time. This delay meant the testing had to be halted. An additional cable was also needed to connect the portable desktop PC to the monitor provided at the remote location. Together, these minor things took time away from the testing.
Outro
Despite some setbacks with the initial testing, carrying out remote tests offered many learning opportunities. Even though the selected remote location has better than average internet coverage in global rankings, internet connectivity issues can still exist. Four immediate measures will be taken in the next round of remote testing. First, the network gateway dependency will be moved away from the provided infrastructure. Second, a caching system will be set up on-prem to hold images for local builds and testing. Third, inventory will be managed better before travelling. Finally, a working Wi-Fi adapter will be installed on all devices. However, as this solution scales, it would be wise to have a wired network connection between each K3s node.
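As a rough idea of what the on-prem image cache could look like from the K3s side, K3s reads registry mirror settings from /etc/rancher/k3s/registries.yaml. The Python sketch below only renders the text of such a file. It assumes a pull-through registry cache is already running on the local network; the mirror address http://192.168.1.10:5000 and the helper name render_registries_yaml are hypothetical and not something I have deployed yet.

```python
def render_registries_yaml(upstream: str, mirror_url: str) -> str:
    """Render a minimal K3s registries.yaml that sends pulls for
    `upstream` to a local mirror endpoint first."""
    return (
        "mirrors:\n"
        f"  {upstream}:\n"
        "    endpoint:\n"
        f'      - "{mirror_url}"\n'
    )


# Hypothetical mirror address on the local switch; adjust to the real cache.
config = render_registries_yaml("docker.io", "http://192.168.1.10:5000")
print(config)
```

The rendered text would be written to /etc/rancher/k3s/registries.yaml on each node, followed by a restart of the K3s service so the change is picked up. The aim is that an image only crosses the slow uplink once and later pulls stay on the local network.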