Copyright © https://mongoose-os.com

Mongoose OS Forum


ESP32 OTA fails (core dump)

MrZANE Sweden
edited August 13 in Mongoose OS

Hi.
I'm trying to get OTA working, initiated from AWS with the firmware hosted on a local server (HTTP for now), but the update fails with a core dump.
The only difference from the code already running is the period at which the LED blinks.
I've attached fw.zip, and part of the core dump is below.
Thanks in advance
Jimmy

[Aug 13 10:17:03.805] updater_context_crea Starting update (timeout 300)
[Aug 13 10:17:03.815] mgos_ota_http_start Update URL: http://192.168.0.217:8080/fw.zip, ct: 60, isv? 0
[Aug 13 10:17:03.861] parse_manifest FW: AWS-test-1.13 esp32 1.0 20170813-080940/??? -> 1.0 20170813-080940/???
[Aug 13 10:17:03.872] mgos_upd_begin App: AWS-test-1.13.bin -> app_1, FS fs.img -> fs_1
[Aug 13 10:17:04.180] mgos_upd_file_begin Writing app image @ 0x1d0000
[Aug 13 10:17:06.558] assertion "p->tot_len == p->len + q->tot_len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 864, function: pbuf_dechain
[Aug 13 10:17:06.604] abort() was called at PC 0x4010b327 on core 0
[Aug 13 10:17:06.604]
[Aug 13 10:17:06.604] Backtrace: 0x400875b3:0x3ffd20f0 0x400872c3:0x3ffd2110 0x4010b327:0x3ffd2130 0x4010544f:0x3ffd2160 0x40137792:0x3ffd2180 0x40139771:0x3ffd21b0 0x400830e3:0x3ffd21e0 0x4015f635:0x3ffd2200
[Aug 13 10:17:06.605]
[Aug 13 10:17:06.605] --- BEGIN CORE DUMP ---
[Aug 13 10:17:06.605] {"arch": "ESP32", "cause":29,
[Aug 13 10:17:06.605] "REGS": {"addr": 1073539172, "data": "
[Aug 13 10:17:06.605] s3UIQMZyCIDwIP0/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAANAg/T8AAAAAgSD9Py8g/T8wAAAAAAAAADkg/T/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd5AThZAX04WQAAAAAAFAAAAAAAAAAEAAADvvq3e776t3iAFBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"},

Comments

  • MrZANE Sweden

    I currently don't have a Linux/Mac system, so I don't think I can run make without Cygwin or similar, but I've attached the relevant files in the hope that someone at Cesanta could have a look at them.
    (I had to rename fw.zip to fw-new.zip, since the forum wouldn't let me upload it otherwise.)
    Let me know if there is anything else you need.

  • sei Japan

    Hi,
    I hit the same error when repeatedly POSTing large (1 kB) payloads to my ESP32 server.


    mgos_http_ev 0x3ffb475c HTTP connection from 192.168.1.77:58569
    assertion "p->tot_len == p->len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 881, function: pbuf_dechain
    abort() was called at PC 0x40109e5f on core 0

    Backtrace: 0x4008747f:0x3ffcab70 0x4008718f:0x3ffcab90 0x40109e5f:0x3ffcabb0 0x40103f6f:0x3ffcabe0 0x40135f6e:0x3ffcac00 0x401384f5:0x3ffcac30 0x4013907a:0x3ffcaca0 0x4014261c:0x3ffcacc0 0x400830da:0x3ffcace0 0x4015bbb1:0x3ffcad00

    --- BEGIN CORE DUMP ---
    {"arch": "ESP32", "cause":29,
    "REGS": {"addr": 1073506656, "data": "
    f3QIQJJxCIBwq/w/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAAFCr/D8AAAAAAav8P6+q/D8wAAAAAAAAALmq/D/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e
    776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e


    I captured the core dump and got this backtrace:

    (gdb) bt
    #0 0x4008747f in invoke_abort ()
        at /opt/Espressif/esp-idf/components/esp32/./panic.c:139
    #1 0x40087192 in abort ()
        at /opt/Espressif/esp-idf/components/esp32/./panic.c:148
    #2 0x40109e62 in __assert_func (
        file=0x3f40a1aa "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c",
        line=881, func=<optimized out>,
        failedexpr=0x3f40a38f "p->tot_len == p->len")
        at ../../../.././newlib/libc/stdlib/assert.c:63
    #3 0x40103f72 in pbuf_dechain (p=<optimized out>)
        at /opt/Espressif/esp-idf/components/lwip/core/pbuf.c:881
    #4 0x40135f71 in mg_lwip_handle_recv_tcp (nc=0x3ffb475c)
        at common/platforms/lwip/mg_lwip_net_if.c:201
    #5 mg_ev_mgr_lwip_process_signals (mgr=<optimized out>)
        at common/platforms/lwip/mg_lwip_ev_mgr.c:75
    #6 0x401384f8 in mg_lwip_if_poll (iface=<optimized out>, timeout_ms=0)
        at common/platforms/lwip/mg_lwip_ev_mgr.c:125
    #7 0x4013907d in mg_mgr_poll (m=0x3ffc7d28 <s_mgr>, timeout_ms=0)
        at mongoose/src/net.c:259
    #8 0x4014261f in mongoose_poll (ms=0)
        at /mongoose-os/fw/src/mgos_mongoose.c:59
    #9 0x400830dd in mgos_mg_poll_cb (arg=<optimized out>)
        at /mongoose-os/fw/platforms/esp32/src/esp32_main.c:183
    #10 0x4015bbb4 in mgos_task (arg=<optimized out>)
        at /mongoose-os/fw/platforms/esp32/src/esp32_main.c:254

    Could you tell me how to solve this problem, please?
    Thank you.

  • MrZANE Sweden

    Hi again.
    Were you able to use my files to get a backtrace?
    Did it reveal anything?
    Is there anything I can do to fix or work around the issue, or any idea if/when it will be fixed?
    TIA
    Jimmy

  • sei Japan

    I tried mongoose-os/fw/examples/c_http on ESP32 and got the same error when posting large payloads to rpc/OTA.Update.

    (1) Build the c_http example and flash it.
    (2) Run the Python 3 script below to POST requests repeatedly.
    (3) Wait about 30 seconds.
    (4) The same error appears, and the backtrace is the same.

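    import json
    import time

    import requests

    URL = "http://192.168.1.80/rpc/OTA.Update"
    # ~3 kB JSON body; spans several TCP segments at a typical ~1460-byte MSS
    payload = json.dumps({"cmds": "A" * 3000})

    for rep in range(10000):
        start = time.time()
        try:
            r = requests.post(URL, data=payload, timeout=10)
        except requests.exceptions.RequestException as e:
            # A crashed or rebooting device shows up as a timeout or connection error
            print("Request failed:", e)
            break
        print(r.status_code, r.text)
        print(r.headers)
        print("%d : %.3fsec" % (rep, time.time() - start))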
    

    Jimmy,
    I tried to get a backtrace from your core dump, but couldn't because some files were missing.
    All the .bin and .elf files in build/objs/ are required to get a backtrace.
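When only the one-line "Backtrace:" output is available (as in the logs in this thread), the program counters can also be resolved against the matching ELF with the toolchain's addr2line. This is a minimal sketch, not the Mongoose OS core dump tooling: it assumes the ESP32 toolchain binary is named xtensa-esp32-elf-addr2line and is on PATH, and the ELF path passed in is a placeholder that must be the exact image that was flashed. It only decodes the PCs; it does not use the register dump.

```python
import re
import subprocess

BACKTRACE_RE = re.compile(r"Backtrace:\s*(.*)")
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def backtrace_pcs(line):
    """Extract PC addresses from an ESP32 panic 'Backtrace:' line.

    Accepts both 'PC:SP' pairs and bare PCs, the two forms seen in the logs.
    """
    m = BACKTRACE_RE.search(line)
    if not m:
        return []
    pcs = []
    for token in m.group(1).split():
        pc = token.split(":")[0]  # drop the ':SP' half if present
        if HEX_RE.fullmatch(pc):
            pcs.append(pc)
    return pcs


def addr2line_cmd(elf_path, pcs, tool="xtensa-esp32-elf-addr2line"):
    """Build the addr2line command (-p pretty, -f functions, -i inlines, -a addresses, -C demangle)."""
    return [tool, "-pfiaC", "-e", elf_path, *pcs]


def resolve(elf_path, line):
    """Run addr2line; requires the toolchain to be installed."""
    cmd = addr2line_cmd(elf_path, backtrace_pcs(line))
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


line = "[Aug 26 16:13:34.692] Backtrace: 0x40087407 0x40087123 0x400d28f4 0x400817d1"
print(addr2line_cmd("build/objs/fw.elf", backtrace_pcs(line)))  # hypothetical ELF path
```

Building the command separately from running it keeps the parsing testable on a machine without the Xtensa toolchain.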

  • sei Japan

    I found the reason for the crash.

    For ESP32, RTOS_SDK is not defined by the Makefile.
    Therefore, in common/platforms/lwip/mg_lwip_net_if.c line 70,
    the mgos_lock()/mgos_unlock() semaphore calls are compiled out and never take effect.

    With the "#ifdef RTOS_SDK" removed, the problem does not appear.

    I think the ESP32 Makefile should define RTOS_SDK, and I hope this will be fixed in the next release.
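A rough Python analogy of why the missing lock matters. It assumes that the lwIP receive callback appends incoming buffers to a per-connection chain while the Mongoose task dechains from the head; Buf, RxChain and run are hypothetical names, and the real code is C in mg_lwip_net_if.c. The use_lock flag plays the role of "#ifdef RTOS_SDK": without it, the consumer can observe a chain whose tot_len has been bumped before the new buffer is linked, the kind of state the pbuf_dechain assertion rejects.

```python
import threading
import time
from contextlib import nullcontext


class Buf:
    """Hypothetical stand-in for an lwIP pbuf."""

    def __init__(self, length):
        self.len = length      # bytes in this buffer
        self.tot_len = length  # bytes in this buffer plus all following ones
        self.next = None


class RxChain:
    """Hypothetical per-connection receive chain shared by two threads."""

    def __init__(self, use_lock):
        # Analogue of mgos_lock()/mgos_unlock() being compiled in or out
        self._lock = threading.Lock() if use_lock else None
        self.head = None

    def _guard(self):
        return self._lock if self._lock is not None else nullcontext()

    def append(self, length):
        """Producer side (think: lwIP receive callback). Multi-step update."""
        new = Buf(length)
        with self._guard():
            if self.head is None:
                self.head = new
                return
            p = self.head
            while True:
                p.tot_len += length  # tot_len grows before the new buffer is linked
                if p.next is None:
                    p.next = new
                    return
                p = p.next

    def take_head(self):
        """Consumer side (think: pbuf_dechain in the Mongoose task)."""
        with self._guard():
            p = self.head
            if p is None:
                return 0
            q = p.next
            expected = p.len + (q.tot_len if q is not None else 0)
            if p.tot_len != expected:
                raise RuntimeError("chain invariant broken: tot_len != len + next.tot_len")
            self.head = q
            p.next = None
            p.tot_len = p.len
            return p.len


def run(use_lock, n=2000, size=100, timeout_s=10.0):
    """Append n buffers from a second thread while draining them here."""
    chain = RxChain(use_lock)

    def producer():
        for _ in range(n):
            chain.append(size)

    worker = threading.Thread(target=producer)
    worker.start()
    consumed = 0
    deadline = time.monotonic() + timeout_s
    # Drain until every byte arrived, or give up (an unlocked race can lose a buffer).
    while consumed < n * size and time.monotonic() < deadline:
        consumed += chain.take_head()
    worker.join()
    return consumed


print(run(use_lock=True))  # prints 200000
```

With use_lock=False the same loop may raise the invariant error, lose a buffer (so run returns less than n * size), or pass by luck. Under CPython's GIL the window is narrow, which loosely mirrors how the ESP32 crash only showed up intermittently.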

  • husky Salamanca

    Hi, I have the same problem when OTA starts. Is there anything I can do to fix it?

    [Aug 20 12:24:00.221] mgos_upd_file_begin Writing app image @ 0x10000
    [Aug 20 12:24:02.313] mongoose_poll New heap free LWM: 99712
    [Aug 20 12:24:07.673] assertion "p->tot_len == p->len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 881, function: pbuf_dechain
    [Aug 20 12:24:07.687] abort() was called at PC 0x40148847 on core 0
    [Aug 20 12:24:07.692]
    [Aug 20 12:24:07.692] Backtrace: 0x40088783:0x3ffd1370 0x40088493:0x3ffd1390 0x40148847:0x3ffd13b0 0x40142c03:0x3ffd13e0 0x4016b216:0x3ffd1400 0x4016d799:0x3ffd1430 0x4016e31e:0x3ffd14a0 0x40179ea4:0x3ffd14c0 0x400830da:0x3ffd14e0 0x401948a5:0x3ffd1500

  • rojer Dublin, Ireland

    @sei a very good point! ESP32 has preemptive RTOS and should definitely use locking in the LWIP adapter.

  • MrZANE Sweden

    That's great news, I'm looking forward to testing it.
    A big thanks to all involved.
    Any idea when this will be in effect on the online build server?

  • rojer Dublin, Ireland

    it will first be in the next -latest build. there is no schedule for them, but we usually push one every few days. e.g. we did a push today (but this change didn't make it into the build).

  • husky Salamanca

    I have tried it and it works perfectly, thank you very much.

  • MrZANE Sweden

    Is it possible to choose which branch to build against with the mos tool?
    For example, when you say to use the -latest build, or are you supposed to get the latest branch from the Git repo and build locally?

  • rojer Dublin, Ireland

    if you're using mos-latest, you will actually pull newest code from repos (master branch) all the time (unless you changed your local copy, in which case it's not touched).

  • husky Salamanca

    Hello,
    Sometimes the OTA fails with a core dump.

    Thanks in advance

    Carlos

    [Aug 24 12:49:06.883] updater_context_crea Starting update (timeout 300)
    [Aug 24 12:49:06.891] mgos_ota_http_start Update URL: https://s3-eu-west-1.amazonaws.com/*****/fw-1.4.zip, ct: 300, isv? 0
    [Aug 24 12:49:07.366] SW ECDH
    [Aug 24 12:49:08.541] mongoose_poll New heap free LWM: 96624
    [Aug 24 12:49:08.551] mongoose_poll New heap free LWM: 96512
    [Aug 24 12:49:08.558] mongoose_poll New heap free LWM: 96392
    [Aug 24 12:49:08.663] mongoose_poll New heap free LWM: 95008
    [Aug 24 12:49:08.751] parse_manifest FW: Wip esp32 1.4 20170823-155921/develop@4d723b0d+ -> 1.4 20170823-155921/develop@4d723b0d+
    [Aug 24 12:49:08.768] mgos_upd_begin App: Wip.bin -> app_1, FS fs.img -> fs_1
    [Aug 24 12:49:09.191] mgos_upd_file_begin Writing app image @ 0x1d0000
    [Aug 24 12:49:10.289] mongoose_poll New heap free LWM: 85360
    [Aug 24 12:49:16.821] I (574746) wifi: active cnt: 320
    [Aug 24 12:49:20.096] assertion "p->tot_len == p->len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 881, function: pbuf_dechain
    [Aug 24 12:49:20.110] abort() was called at PC 0x4014920b on core 0
    [Aug 24 12:49:20.113]
    [Aug 24 12:49:20.113] Backtrace: 0x4008859b 0x400882b7 0x4014920b 0x401435c3 0x40171cee 0x4018c8ff 0x40191068 0x4019077b 0x4019065e 0x401711a3 0x4016e237 0x4016b925 0x4016dda5 0x4016e942 0x4017a4b8 0x40082efe 0x40195da9
    [Aug 24 12:49:20.129] --- BEGIN CORE DUMP ---
    [Aug 24 12:49:20.132] {"arch": "ESP32", "cause":29,
    [Aug 24 12:49:20.135] "REGS": {"addr": 1073511048, "data": "
    [Aug 24 12:49:20.137] m4UIQLqCCIDgEv0/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAAMAS/T8AAAAAcRL9Px8S/T8wAAAAAAAAACkS/T/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd74whlAF8MZQAAAAAAFAAAAAAAAAAEAAADvvq3e776t3iAIBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"},

  • husky Salamanca

    The same file works when named fw.zip but causes a core dump when named fw-1.4.zip. Does that make any sense?

  • husky Salamanca

    No, it also fails with the name fw.zip. :(
    Sometimes it works and sometimes it crashes, with the same file and the same version.

  • edited August 26

    I am getting something similar when building with mJS. After flashing the new (latest) build, I get the following error:

    [Aug 26 16:13:34.684] Tasks currently running:
    [Aug 26 16:13:34.686] CPU 0: mgos
    [Aug 26 16:13:34.687] Aborting.
    [Aug 26 16:13:34.688] abort() was called at PC 0x400d28f4 on core 0
    [Aug 26 16:13:34.692]
    [Aug 26 16:13:34.692] Backtrace: 0x40087407 0x40087123 0x400d28f4 0x400817d1
    [Aug 26 16:13:34.697] --- BEGIN CORE DUMP ---
    [Aug 26 16:13:34.699] {"arch": "ESP32", "cause":29,
    [Aug 26 16:13:34.702] "REGS": {"addr": 1073506464, "data": "
    [Aug 26 16:13:34.705] B3QIQCZxCICwBfw/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAAJAF/D8AAAAAQQX8P+8E/D8wAAAAAAAAAPkE/D/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e

  • Sergey Dublin, Ireland
    edited September 4
  • frsc Germany
    edited September 7

    @Sergey:
    I seem to have the same problem. During OTA, the system crashes and my backtrace looks similar to the one of @sei above.
    This happens with an empty mjs_base application copied from the examples directory.
    When I use mos flash mos-esp32 to flash a basic application, the OTA works as expected afterwards.

    Is the fix from @rojer not applied yet, or is there another problem remaining?
    Thanks for looking into this!

    (gdb) backtrace 
    #0  0x40086ee3 in invoke_abort () at /opt/Espressif/esp-idf/components/esp32/./panic.c:139
    #1  0x40086be6 in abort () at /opt/Espressif/esp-idf/components/esp32/./panic.c:148
    #2  0x4010b062 in __assert_func (file=0x3f40935d "ressif/esp-idf/components/lwip/core/pbuf.c", line=864, func=<optimized out>, failedexpr=0x3f409520 "en == p->len + q->tot_len")
        at ../../../.././newlib/libc/stdlib/assert.c:63
    #3  0x4010544a in pbuf_dechain (p=<optimized out>) at /opt/Espressif/esp-idf/components/lwip/core/pbuf.c:881
    #4  0x4012d76d in mg_lwip_handle_recv_tcp (nc=0x3ffb50fc) at common/platforms/lwip/mg_lwip_net_if.c:201
    #5  mg_ev_mgr_lwip_process_signals (mgr=<optimized out>) at common/platforms/lwip/mg_lwip_ev_mgr.c:75
    #6  0x4012fb9c in mg_lwip_if_poll (iface=<optimized out>, timeout_ms=0) at common/platforms/lwip/mg_lwip_ev_mgr.c:125
    #7  0x4013072d in mg_mgr_poll (m=0x3ffc7ed4 <s_mgr>, timeout_ms=0) at mongoose/src/net.c:259
    #8  0x4013cd2b in mongoose_poll (ms=0) at /mongoose-os/fw/src/mgos_mongoose.c:59
    #9  0x40083101 in mgos_mg_poll_cb (arg=<optimized out>) at /mongoose-os/fw/platforms/esp32/src/esp32_main.c:183
    #10 0x40158c3c in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    
    
  • rojer Dublin, Ireland
    edited September 14

    gentlemen, thank you for your patience. i believe this and this should take care of the problem.
