Copyright © https://mongoose-os.com

Mongoose OS Forum

frame

Thing Shadow Delta Event Stops Working

I am running into an issue where AWS Delta events randomly stop coming through and when this happens I have to wait until MQTT times out (10 min) and reconnects to AWS in order to receive Delta events. While in this state I am able to publish to the shadow, and I receive Accepted events, however I never see delta events.

Comments

  • SergeySergey Dublin, Ireland

    Can you publish a minimal reproducible code for that please?

  • jonwjonw USA
    edited November 28

    At this link is a zip of the project folder including the built firmware, the source, and the fs. This is the minimal reproducible code.

    https://gridconnect.box.com/shared/static/a6w8czyr8sohw3ntkst7pb7lnf2fhsyi.zip

    To recreate the problem I am coming across, update the desired state with 30 new KVP, 3 or 4 times and you will see that the device will stop receiving delta events, and there will be an unresolved delta field in the shadow. After waiting 10 min in this state it will time out and re connect to AWS. Then it will receive the unresolved delta state.

  • I am aware that 30 new KVP is a lot, however it is the quickest way to reproduce the problem and it is under the payload limit that an AWS shadow can support.

  • SergeySergey Dublin, Ireland

    That code is way above the "quick look" threshold, more like a code review :)
    Make the logic to only do a minimum - set reported = desired, and see if that still leaves a delta.

  • Revised code is at the following link. All i do on delta events is set reported = desired. I removed all other code functionality. It does not get simpler than this. Still have the same issue.

    https://gridconnect.box.com/shared/static/44u5v81natahl3fi69aw9zl9rqz1pojx.zip

  • SergeySergey Dublin, Ireland

    Reproduced that, looking what's wrong.

  • SergeySergey Dublin, Ireland
    edited December 1

    Weird thing, but it looks like when the shadow size is close to the limit,
    and some shadow changes are performed, the aws stops sending data. No logs from mqtt or even SSL layers are seen. Reconnect fixes that.
    I could not believe that's an aws issue, so looking further.

  • SergeySergey Dublin, Ireland

    Could you throw this in your mos.yml and rebuild please

    cdefs:
      MG_LWIP_SSL_RECV_MBUF_LIMIT: 8192
    
  • Sergey,

    What is the AWS shadow size limit? Would that be a limit of AWS or mongoose?

    I am trying your fix now

  • jonwjonw USA
    edited December 1

    The cdef seems to do the same thing after 3 or 4 times of adding 30 new KVP to the desired field. We plan on having a large shadow for our application. Can you provide max limits if they exist for shadow implementation?

    Best Regards
    Jonathan Witthoeft

  • SergeySergey Dublin, Ireland
    edited December 2

    AWS shadow size limit is 8k, so you can't have a large shadow. You can have multiple shadows though, but that severely complicates the design. Mongoose OS does not provide any helper API for the multiple shadows per thing, you'd need to do everything manually.

    The limit of the IO buffer I suggested is equal to the AWS shadow maximum.

    Did you do a clean rebuild (--clean option to mos build command) ?

  • SergeySergey Dublin, Ireland

    Update to the latest, rebuild. It should be fixed now.

  • Updated to the latest libs and it won't build:

    /commercial-libs/http-server/src/mgos_http_server.c:280:36: error: 'MGOS_RECV_MBUF_LIMIT' undeclared (first use in this function)
    s_listen_conn->recv_mbuf_limit = MGOS_RECV_MBUF_LIMIT;
    ^

  • SergeySergey Dublin, Ireland

    Did you update mongoose-os sources too ?

  • jonwjonw USA
    edited December 4

    Yes I did.

    The most current release of the libraries builds, but the recent commits do not. If I were to cdef the MGOS_RECV_MBUF_LIMIT, it will build, but i don't think that is what I am supposed to do.

  • Something else worth mentioning is that after it breaks, times out, and reconnects, I can shrink the shadow size to around 2000 bytes and still reproduce the error by writing about 10 values. It doesn't seem to coordinate with AWS shadow limits at that point. I have changed the value of only a few KVP and gotten no response.

  • SergeySergey Dublin, Ireland

    That's weird. I was able to repro the issue, then after the fix, it works for me.
    The issue was that Mongoose was capping the input buffer size to 3k - therefore the big shadow was buffering the first 3k, then was infinetely waiting for the rest of the shadow which was never received cause the buffer was full and silently did not accept any data.

    Now,
    1. the buffer size is configurable via the MQTT setting
    2. Set to 8k on esp32 - which should buffer the entire shadow of maximum size
    3. if buffer is maxed and not drained, however, Mongoose logs an error and drops a connection - so the log must show something.

    That said,
    1. could you show your mos config-get mqtt
    2. could you show your level 3 log when the issue happens

    Thanks!

  • jonwjonw USA
    edited December 6

    Ok so The following log shows my last successful update of 30 kvp, it reported fine, but immediately sent ping requests, timed out and reconnected. During that time I updated the desired state again and never received the delta state because for some reason the shadow was unreachable

    [Dec 6 15:24:13.630] mgos_mqtt_ev MQTT event: 203
    [Dec 6 15:24:13.631] mgos_aws_shadow_ev Topic $aws/things/Rollease/shadow/update/delta (7), id 0x0001, token , payload:
    [Dec 6 15:24:13.631] {"version":39196,"timestamp":1512599053,"state":{"test41":4312,"test42":4312,"test43":4312,"test44":4312,"test45":4312,"test46":43122,"test47":4312,"test48":4312,"test49":4312,"test410":4321,"test411":4321,"test412":4321,"test413":4321,"test414":4321,"test415":4321,"test416":4321,"test417":4321,"test418":4321,"test419":4321,"test420":4321,"test421":4312,"test422":4312,"test423":4312,"test424":4312,"test425":4312,"test426":43122,"test427":4312,"test428":4312,"test429":4312,"test430":1},"metadata":{"test41":{"timestamp":1512599053},"test42":{"timestamp":1512599053},"test43":{"timestamp":1512599053},"test44":{"timestamp":1512599053},"test45":{"timestamp":1512599053},"test46":{"timestamp":1512599053},"test47":{"timestamp":1512599053},"test48":{"timestamp":1512599053},"test49":{"timestamp":1512599053},"test410":{"timestamp":1512599053},"test411":{"timestamp":1512599053},"test412":{"timestamp":1512599053},"test413":{"timestamp":1512599053},"test414":{"timestamp":1512599053},"test415":{"timestamp":1512599053},"test416":{"timestamp":1512599053},"test417":{"timestamp":1512599053},"test418":{"timestamp":1512599053},"test419":{"timestamp":1512599053},"test420":{"timestamp":1512599053},"test421":{"timestamp":1512599053},"test422":{"timestamp":1512599053},"test423":{"timestamp":1512599053},"test424":{"timestamp":1512599053},"test425":{"timestamp":1512599053},"test426":{"timestamp":1512599053},"test427":{"timestamp":1512599053},"test428":{"timestamp":1512599053},"test429":{"timestamp":1512599053},"test430":{"timestamp":1512599053}}}
    [Dec 6 15:24:13.769] mgos_aws_shadow_ev Version: 0 -> 39196 (7)
    [Dec 6 15:24:13.769] mgos_aws_shadow_ev No state handler, message ignored.
    [Dec 6 15:24:13.769] mgos_aws_shadow_ev Topic $aws/things/Rollease/shadow/update/delta (7), id 0x0001, token , payload:
    [Dec 6 15:24:13.770] {"version":39196,"timestamp":1512599053,"state":{"test41":4312,"test42":4312,"test43":4312,"test44":4312,"test45":4312,"test46":43122,"test47":4312,"test48":4312,"test49":4312,"test410":4321,"test411":4321,"test412":4321,"test413":4321,"test414":4321,"test415":4321,"test416":4321,"test417":4321,"test418":4321,"test419":4321,"test420":4321,"test421":4312,"test422":4312,"test423":4312,"test424":4312,"test425":4312,"test426":43122,"test427":4312,"test428":4312,"test429":4312,"test430":1},"metadata":{"test41":{"timestamp":1512599053},"test42":{"timestamp":1512599053},"test43":{"timestamp":1512599053},"test44":{"timestamp":1512599053},"test45":{"timestamp":1512599053},"test46":{"timestamp":1512599053},"test47":{"timestamp":1512599053},"test48":{"timestamp":1512599053},"test49":{"timestamp":1512599053},"test410":{"timestamp":1512599053},"test411":{"timestamp":1512599053},"test412":{"timestamp":1512599053},"test413":{"timestamp":1512599053},"test414":{"timestamp":1512599053},"test415":{"timestamp":1512599053},"test416":{"timestamp":1512599053},"test417":{"timestamp":1512599053},"test418":{"timestamp":1512599053},"test419":{"timestamp":1512599053},"test420":{"timestamp":1512599053},"test421":{"timestamp":1512599053},"test422":{"timestamp":1512599053},"test423":{"timestamp":1512599053},"test424":{"timestamp":1512599053},"test425":{"timestamp":1512599053},"test426":{"timestamp":1512599053},"test427":{"timestamp":1512599053},"test428":{"timestamp":1512599053},"test429":{"timestamp":1512599053},"test430":{"timestamp":1512599053}}}
    [Dec 6 15:24:13.953] mgos_aws_shadow_ev Version: 0 -> 39196 (7)
    [Dec 6 15:24:13.954] aws_shadow_state_han == Event: 5 (UPDATE_DELTA), version: 39196
    [Dec 6 15:24:13.954] aws_shadow_state_han {reported:{"test41":4312,"test42":4312,"test43":4312,"test44":4312,"test45":4312,"test46":43122,"test47":4312,"test48":4312,"test49":4312,"test410":4321,"test411":4321,"test412":4321,"test413":4321,"test414":4321,"test415":4321,"test416":4321,"test417":4321,"test418":4321,"test419":4321,"test420":4321,"test421":4312,"test422":4312,"test423":4312,"test424":4312,"test425":4312,"test426":43122,"test427":4312,"test428":4312,"test429":4312,"test430":1}}
    [Dec 6 15:24:14.008] ssl_socket_recv 0x3ffc0d7c <- 5
    [Dec 6 15:24:14.008] ssl_socket_recv 0x3ffc0d7c <- 1431
    [Dec 6 15:24:14.009] ssl_socket_recv 0x3ffc0d7c <- 1436
    [Dec 6 15:24:14.009] ssl_socket_recv 0x3ffc0d7c <- 1436
    [Dec 6 15:24:14.009] mgos_aws_shadow_ev Update: {"state": {"reported":{"test41":4312,"test42":4312,"test43":4312,"test44":4312,"test45":4312,"test46":43122,"test47":4312,"test48":4312,"test49":4312,"test410":4321,"test411":4321,"test412":4321,"test
    [Dec 6 15:24:14.053] mgos_mqtt_pub Publishing to $aws/things/Rollease/shadow/update @ 1 (492): [{"state": {"reported":{"test41":4312,"test42":4312,"test43":4312,"test44":4312,"test45":4312,"test46":43122,"test47":4312,"test48":4312,"test49":4312,"test410":4321,"test411":4321,"test412":4321,"test413":4321,"test414":4321,"test415":4321,"test416":4321,"test417":4321,"test418":4321,"test419":4321,"test420":4321,"test421":4312,"test422":4312,"test423":4312,"test424":4312,"test425":4312,"test426":43122,"test427":4312,"test428":4312,"test429":4312,"test430":1}}, "clientToken": "500c5b77"}]
    [Dec 6 15:24:14.105] ssl_socket_send 0x3ffc0d7c 570 -> 570
    [Dec 6 15:24:14.105] ssl_socket_recv 0x3ffc0d7c <- 1436 [Dec 6 15:24:14.105] ssl_socket_recv 0x3ffc0d7c <- 1436 [Dec 6 15:24:14.105] ssl_socket_recv 0x3ffc0d7c <- 971 [Dec 6 15:25:14.113] mqtt_handler Send PINGREQ [Dec 6 15:25:14.114] ssl_socket_send 0x3ffc0d7c 31 -> 31
    [Dec 6 15:26:14.123] mqtt_handler Send PINGREQ
    [Dec 6 15:26:14.123] ssl_socket_send 0x3ffc0d7c 31 -> 31
    [Dec 6 15:27:14.132] mqtt_handler Send PINGREQ
    [Dec 6 15:27:14.132] ssl_socket_send 0x3ffc0d7c 31 -> 31
    [Dec 6 15:27:14.211] ssl_socket_send 0x3ffc0d7c 31 -> -1
    [Dec 6 15:27:14.236] mgos_mqtt_ev MQTT Disconnect
    [Dec 6 15:27:14.237] mqtt_global_reconnec MQTT connecting after 1800 ms
    [Dec 6 15:27:16.026] mgos_mqtt_global_con MQTT connecting to adefh4aq0py0s.iot.us-east-1.amazonaws.com:8883
    [Dec 6 15:27:16.083] find_mount_by_path 71a8c1922a-certificate.pem.crt -> /71a8c1922a-certificate.pem.crt pl 1 -> 1 0x3ffc0eac
    [Dec 6 15:27:16.084] mgos_vfs_open 71a8c1922a-certificate.pem.crt 0x0 0x1b6 => 0x3ffc0eac 71a8c1922a-certificate.pem.crt 1 => 257 (refs 1)

  • Sergey,

    I ran the firmware over night while logging and this is my results:

    [Dec 6 15:40:39.681] mqtt_handler Send PINGREQ
    [Dec 6 15:40:39.682] ssl_socket_send 0x3ffca324 31 -> 31
    [Dec 6 15:40:39.745] ssl_socket_recv 0x3ffca324 <- 5
    [Dec 6 15:40:39.746] ssl_socket_recv 0x3ffca324 <- 26
    [Dec 6 15:40:39.768] mgos_mqtt_ev MQTT event: 213
    It does this every minute for the next 5 hours and 16 minutes

    [Dec 6 20:56:36.155] mgos_mqtt_ev MQTT Disconnect
    [Dec 6 20:58:13.490] mgos_mqtt_ev MQTT Disconnect
    It disconnects once every minute and a half for 49 minutes

    [Dec 6 21:45:15.806] mqtt_global_reconnec MQTT connecting after 1837 ms
    This was the last debug message. The firmware is completely unresponsive and there was no crash report

    Before this log, firmware was running for about 25 minutes, which means at about 6 hours and 30 minutes this firmware crashes without reason.

  • Here is my mos config-get mqtt
    Any update on this issue?

    {
    "clean_session": true,
    "client_id": "",
    "enable": true,
    "keep_alive": 60,
    "max_qos": 2,
    "pass": "",
    "reconnect_timeout_max": 60.000000,
    "reconnect_timeout_min": 2.000000,
    "recv_mbuf_limit": 8192,
    "server": "adefh4aq0py0s.iot.us-east-1.amazonaws.com:8883",
    "ssl_ca_cert": "ca-verisign-g5.crt.pem",
    "ssl_cert": "71a8c1922a-certificate.pem.crt",
    "ssl_cipher_suites": "",
    "ssl_key": "71a8c1922a-private.pem.key",
    "ssl_psk_identity": "",
    "ssl_psk_key": "",
    "user": "",
    "will_message": "",
    "will_topic": ""
    }

  • SergeySergey Dublin, Ireland

    Please set the debug level to 3 and show the logs

  • My first message on December 6 was the log at a debug level of 3

Sign In or Register to comment.