Node-Red Wireless Gateway Sometimes goes 'Failed to Connect'

Hello,

We have several gateways running Node-Red with an NCD Xbee. Most of our gateways have been running fine for weeks or months without issue. Some of them, however, lose communication with their Xbee every few days.

The loss of communication with the Xbee shows up as sensor transmissions no longer coming in.
When I then remote into the gateway, the ‘Wireless Gateway’ node shows ‘Ready’ until I move a node and deploy, whereupon it comes back as ‘Failed to Connect’.

Rebooting the gateway resolves the issue for a seemingly random length of time. I’ve started checking them every 12 hours or so just to keep them online while I try to figure out the root cause.

So that brings me to my question: does anyone have an idea what the root cause might be?
My first thought was that the Xbee might be misconfigured, specifically the Sleep setting in XCTU. We shipped a new gateway with that setting confirmed to match a repeater's, but that gateway lasted only one day before it hit the same issue.

Once the Xbee goes ‘Failed to Connect’, is there any way to restore it other than rebooting?
Is there any additional logging I can look at to try to alert/detect when the communication loss happens?
Seems like I can only detect it by the lack of sensor transmissions (interval 30m), but by then I am roughly 45-75 minutes late to the party.
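
For anyone wanting to automate that check, here is a minimal sketch of a staleness test in Python. It is only an illustration: the 30-minute interval comes from the post above, while the 5-minute grace period, the function name, and the idea of keeping a dict of last-seen timestamps per sensor are all assumptions, not something the gateway software provides.

```python
from datetime import datetime, timedelta

# The 30-minute interval is from the post; the grace period is an
# arbitrary value chosen for illustration.
REPORT_INTERVAL = timedelta(minutes=30)
GRACE = timedelta(minutes=5)


def gateway_is_silent(last_seen, now, interval=REPORT_INTERVAL, grace=GRACE):
    """Return True when no sensor has reported within interval + grace.

    last_seen maps a sensor id to the datetime of its latest transmission.
    An empty mapping counts as silent.
    """
    if not last_seen:
        return True
    # Only the most recent transmission from any sensor matters here.
    newest = max(last_seen.values())
    return now - newest > interval + grace
```

Because the check looks at the newest transmission from any sensor, a gateway that hears several sensors reporting at staggered times could probably use a shorter window than interval plus grace, which would shorten the detection delay.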

One additional piece of information: some of these gateways use Xbees that were formerly in sensors. We reflashed them in XCTU with a profile that matched a repeater. Could that have caused a problem?

An Xbee can be swapped between a sensor and a gateway as long as it is configured correctly.
@jacob can you look into this?

@kstokes try loading this configuration into your module
900mhz_modem_module_recover.xpro (66.0 KB)

Based on the failure to connect, it’s most likely the sleep functionality causing the issue.

If it were due to sleep, it would still not work after the gateway reboot.

@kstokes

I have heard of this a couple of times in the past, but it was always when communicating through an Ethernet XBee modem. I believe the fix for that was to assign the Ethernet modem a static IP outside the DHCP range. The issue was that IP address lease renewal broke the connection without closing it cleanly, and the TCP socket never recovered because the library did not know it had closed.

So just to confirm: is this a Serial/USB modem, one of our IoT Edge Computers, or an Ethernet modem?
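
For readers who are on the Ethernet variant, the "socket did not know it closed" failure mode described above is the kind of thing TCP keepalive is meant to surface. The following is a generic Python sketch of enabling keepalive on a socket, not the NCD library's code (which runs inside Node-Red); the function name and the timing values are assumptions for illustration.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalive so a dead peer is eventually reported as an error."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific (present on Linux),
    # so each one is applied only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

With keepalive on, a connection whose other end has vanished should eventually raise an error on the next read or write instead of hanging indefinitely, which gives the application a chance to reconnect.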

Sorry guys, I lost track of this with other stuff going on.

It is the Serial/USB modem.
It is either a disassembled repeater:
https://store.ncd.io/product/900hp-s3b-long-range-wireless-mesh-modem-with-usb-interface/

or a Waveshare Xbee adapter.

I would not expect you guys to support it if it’s the Waveshare board, and honestly we have not tracked which gateways have which USB adapter for the Xbee. It is possible that all the ones having issues have Waveshares, and all the ones using NCD boards are good.

With it being just Serial/USB, is there any additional logging we can turn up?
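
In the meantime, one thing that can be logged from outside Node-Red is whether the serial device node is still present. The sketch below is a rough Python illustration under assumptions: the logger name and function are made up for this example, and the device path (something like /dev/ttyUSB0 on a Linux gateway) would need to match the actual adapter.

```python
import logging
import os

log = logging.getLogger("xbee-port-watch")  # hypothetical logger name


def check_port(path, was_present):
    """Log when the serial device node appears or disappears.

    Returns the current state so the caller can pass it back in
    on the next poll.
    """
    present = os.path.exists(path)
    if present != was_present:
        if present:
            log.info("serial device %s is back", path)
        else:
            log.warning("serial device %s disappeared", path)
    return present
```

Called every few seconds from a small loop or a timer, this would only catch the USB adapter dropping off the bus. It would not notice a case where the port stays present but the Xbee stops answering, so it complements the missing-transmission check rather than replacing it.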

To any others that may come across this…

It seems as though our particular issue may be related to a 0.96" OLED mounted near the Xbee chip.
The OLED has a charge pump device (a capacitor perhaps? I am not an EE) that apparently can cause a fair amount of interference when activated.

After a lot of dead ends in troubleshooting, we finally remotely disabled the OLED on the worst offending units in our fleet, and lo and behold, not a single new error on those units since. These were units that had been erroring (losing serial comms to the Xbee) several times a day, and now we are 3 to 5 weeks past the change with no errors.

Disabling the OLEDs leaves us without their functionality of course, but it was a good troubleshooting method. There may be other options here: reducing the refresh rate of the OLED, a better quality OLED, different physical mounting configurations, or shielding internal to our devices.