We have purchased a few gateways and a lot of current sensors from you, and we have sent everything to a mining client. We have noticed that the gateway keeps disconnecting after 1-2 days; it has happened 4 times in the last week or so. When we notice the problem, we simply unplug the gateway and plug it back in, and it starts working again. The client has reported three different statuses for the light on the gateway:
The client noticed the light on the gateway was green, but nothing was getting into AWS. Resetting the gateway fixed the issue.
Another time, the client noticed the gateway light was off even though it was plugged in. Again, resetting the gateway fixed the issue. I looked at all the light colors and statuses for the gateway and could not find one that corresponds to the light being off while the gateway is plugged in.
Finally, today the client reported that the light was red, which means it cannot connect to WiFi. Once again, resetting the gateway fixed the issue.
We cannot figure out what is causing this issue. I was also hoping the gateway would send an update to AWS when it disconnects from or reconnects to the internet, but there seems to be no device shadow update sent to AWS on connecting. The only time I see an update is when I plug the gateway in or press the reset button; then it sends a device update to AWS.
I saw another thread on this forum about an MQTT gateway having connection issues after one day.
The solution was to update the firmware on the device. But that discussion was from 2019, so I assume the gateways we received already have the new firmware.
Please let me know what I can do to investigate more or to fix this.
I don't think the device shadow updates when the gateway disconnects, only when you reset it. I will look into it, though, because I need a way to monitor the gateway when it goes offline; it has been happening a lot. This morning it happened again: the light was off while the gateway was plugged in. The client pulled the plug, plugged it back in, and it worked.
It is also not just one gateway. Two gateways are running at the same time, and both have the issue.
Yes, there are occasional connectivity blackouts around the mining equipment. Once in a while the client reports losing connection on his phone for a few minutes or longer, but then it comes back. I have a feeling that once the connection drops, the gateway has trouble reconnecting to WiFi and just hangs. I tested this scenario in our office in Vancouver: when the connection is cut off, the light flashes white, and as soon as the connection comes back, the light goes off for about 4-8 seconds before it turns green. It seems like those two gateways get stuck in the state where they are trying to reconnect: the light goes off and stays off.
I'm not sure what else to do other than create a monitoring application that watches either the logs or the device shadow to see when a gateway stops sending data to AWS. This does not fix the issue; all it does is allow me to notify the client that the gateway needs to be reset. Another short-term fix could be a WiFi booster right next to the gateway, but again, this is a quick fix, not a permanent one.
This should not be happening. If the gateway is not connected to the internet, it should at least be flashing white or single red, not green without sending data, or off while it is plugged in.
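One way to build the monitoring application mentioned above is a staleness check: record when each gateway last delivered data to AWS and flag any gateway that has been silent longer than a threshold. This is a minimal Python sketch under stated assumptions: the `last_seen` mapping, the gateway IDs, and the 15-minute threshold are hypothetical, and how the timestamps are collected on the AWS side (for example, from the messages the sensors publish through each gateway) is left open. It only detects the symptom; it does not fix the gateway.

```python
import time
from typing import Dict, List, Optional

# Hypothetical threshold: how long a gateway may stay silent before we alert.
# It should be comfortably longer than the normal sensor reporting interval.
STALE_AFTER_SECONDS = 15 * 60


def find_stale_gateways(last_seen: Dict[str, float],
                        now: Optional[float] = None,
                        stale_after: float = STALE_AFTER_SECONDS) -> List[str]:
    """Return the IDs of gateways whose last message is older than stale_after.

    last_seen maps a (hypothetical) gateway ID to the Unix timestamp of the
    most recent message received from that gateway in AWS.
    """
    if now is None:
        now = time.time()
    return sorted(gw for gw, ts in last_seen.items() if now - ts > stale_after)


if __name__ == "__main__":
    now = time.time()
    last_seen = {
        "gateway-1": now - 60,        # reported a minute ago: healthy
        "gateway-2": now - 2 * 3600,  # silent for two hours: needs attention
    }
    for gw in find_stale_gateways(last_seen, now=now):
        print(f"{gw} has not reported in over {STALE_AFTER_SECONDS // 60} minutes; "
              "ask the client to power-cycle it")
```

Run periodically (for example, every few minutes), this turns "the gateway stopped sending" into an alert instead of something the client has to notice.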
The gateway's main loop works roughly like this (pseudocode):
1. Check WiFi connectivity; if WiFi is not connected, attempt to reconnect.
2. Check MQTT broker connectivity; if not connected, attempt to reconnect.
3. Check for data from the sensors; if any is received, publish it to the MQTT broker.
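The three-step loop above can be sketched in Python with stand-in connection objects so its behavior can be exercised off-device. `FakeLink`, `FakeMqtt`, and `loop_once` are hypothetical names, not the gateway firmware's actual code, and skipping the rest of the pass when a reconnect fails is an assumption the pseudocode does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeLink:
    """Hypothetical stand-in for a WiFi or MQTT connection."""
    connected: bool = False
    can_connect: bool = True  # whether a reconnect attempt would succeed
    reconnect_attempts: int = 0

    def is_connected(self) -> bool:
        return self.connected

    def reconnect(self) -> bool:
        self.reconnect_attempts += 1
        self.connected = self.can_connect
        return self.connected


@dataclass
class FakeMqtt(FakeLink):
    """Hypothetical MQTT client that records what it publishes."""
    published: List[str] = field(default_factory=list)

    def publish(self, payload: str) -> None:
        self.published.append(payload)


def loop_once(wifi: FakeLink, mqtt: FakeMqtt, reading: Optional[str]) -> None:
    """One pass of the gateway main loop as described in the pseudocode."""
    # 1. Check WiFi; if it is down, try to reconnect (assumed: give up this pass on failure).
    if not wifi.is_connected() and not wifi.reconnect():
        return
    # 2. Check the MQTT broker connection; reconnect if needed.
    if not mqtt.is_connected() and not mqtt.reconnect():
        return
    # 3. Forward any sensor data that arrived to the broker.
    if reading is not None:
        mqtt.publish(reading)


if __name__ == "__main__":
    wifi = FakeLink(connected=False)  # WiFi dropped, but the network is back
    mqtt = FakeMqtt(connected=False)
    loop_once(wifi, mqtt, "reading-1")
    print(wifi.reconnect_attempts, mqtt.reconnect_attempts, mqtt.published)
    # prints: 1 1 ['reading-1']
```

Because every pass re-checks both links, a network that comes back is picked up on the next pass, which is consistent with the graceful recovery seen when the WiFi network is shut down in testing.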
As you said, in your testing the device recovers gracefully if you shut the WiFi network down. I'm not sure what is going on at the customer site, but something is preventing the device from reconnecting to the network or to the MQTT broker.
As I said previously, I have no reports of this type of behavior from any other customers using the AWS Micro Gateways. We have sold hundreds of them, and many are in 24/7 operation around the world without issue.
If there is a network admin for the WiFi network at the customer site I would recommend opening a dialog with them to determine what, if anything, could be preventing the device from recovering connectivity.
I just reviewed the initialization code that runs only on boot against the loop code that runs continuously, and I do not see anything there that would let a reboot restore connectivity in a case where the loop's graceful recovery fails.
The only thing I can think to add is a connectivity timeout in the firmware, so the device automatically reboots after it has been unable to reconnect for some period. That said, as you said, this is also a short-term fix and does not reveal the true underlying cause of the failure.
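A connectivity timeout like the one proposed above amounts to a watchdog on the connection state: start a timer when connectivity is first lost, clear it whenever connectivity returns, and request a reboot once the outage exceeds a limit. Below is a minimal Python sketch of that logic, assuming a hypothetical `ReconnectWatchdog` class that is updated once per loop pass; the 300-second timeout in the usage example is an arbitrary placeholder, and the device's actual restart call is not shown. The clock is injectable so the timing can be tested without waiting.

```python
import time
from typing import Callable, Optional


class ReconnectWatchdog:
    """Hypothetical helper: signal a reboot after `timeout` seconds without connectivity.

    Call update(connected) once per main-loop pass, where `connected` means
    both WiFi and the MQTT broker are up. Performing the reboot is left to
    the caller.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.down_since: Optional[float] = None

    def update(self, connected: bool) -> bool:
        """Return True when the outage has lasted at least `timeout` seconds."""
        now = self.clock()
        if connected:
            self.down_since = None  # healthy again: reset the timer
            return False
        if self.down_since is None:
            self.down_since = now  # first pass without connectivity
        return now - self.down_since >= self.timeout


if __name__ == "__main__":
    fake_time = [0.0]
    dog = ReconnectWatchdog(timeout=300, clock=lambda: fake_time[0])
    # (elapsed seconds, connected?) samples from a simulated outage
    for elapsed, ok in [(0, True), (10, False), (200, False), (320, False)]:
        fake_time[0] = elapsed
        print(elapsed, "reboot" if dog.update(ok) else "keep running")
```

The watchdog is only as good as the `connected` signal it is fed; feeding it something that reflects whether data is actually reaching the broker, rather than a local flag alone, would matter for the green-but-silent case.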
Thanks for your detailed message. The client is still seeing this connection issue once every week or two. I wanted to check a few things:
What would the process be for you to add the disconnect-restart flow? Would you generate a new firmware image that I would then need to flash to the gateway?
Since the client has a hardline connection near the gateway, would it be a big job for me to add an Ethernet connection to the gateway with some hardware mods? The sensors would talk to the gateway wirelessly, and the gateway would push to AWS over the hardwired connection. What language are you using for the firmware?
My last option is putting a WiFi booster near the gateway and seeing if that helps. But in my experience with mining equipment and infrastructure, WiFi sometimes becomes an issue near the machine rooms, etc.
I just need to implement something for now until we can explore other options. At some point the gateway disconnects and never reconnects, even though the light is green. A simple power cycle always fixes the issue.
I do not believe anything “meaningful” could be done to the AWS Gateway firmware to improve its reliability, and I have yet to hear about a failure like this from any other customer.
I need more information, if any can be provided, in order to troubleshoot possible failures. Reading the firmware logic and testing here shows that the LED can stay green only if the gateway has a valid WiFi connection and a valid connection to the AWS MQTT broker. I have found no scenario in which the connection to AWS breaks while the status LED stays green; it does not appear to be possible.
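The LED invariant described above (green only with both a valid WiFi connection and a valid AWS MQTT connection) can be written as a pure function of two connection flags and checked exhaustively. This is a hypothetical model, not the firmware's code: `led_color` and the `"not-green"` placeholder are illustrative, and the firmware's other colors are not modeled.

```python
from itertools import product


def led_color(wifi_connected: bool, mqtt_connected: bool) -> str:
    """Hypothetical LED rule: green only when WiFi AND the AWS MQTT session are up.

    "not-green" stands in for the firmware's other states (e.g. red or
    flashing white); their exact mapping is not modeled here.
    """
    return "green" if wifi_connected and mqtt_connected else "not-green"


if __name__ == "__main__":
    # Enumerate every combination of the two flags: exactly one yields green.
    for wifi, mqtt in product([False, True], repeat=2):
        print(f"wifi={wifi!s:5} mqtt={mqtt!s:5} -> {led_color(wifi, mqtt)}")
```

Under this model, a gateway that shows green while nothing reaches AWS would mean the firmware's own view of the two flags disagrees with what AWS is actually receiving, which is where additional field information would help narrow things down.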
An alternative would be to swap out the AWS Gateway you have for an IoT Edge Computer which has more processing power as well as an Ethernet connection so you can eliminate WiFi if you feel that may be the issue. The IoT Edge Computer is available here: