Hi.
I'm trying to get OTA to work, initiated from AWS with data on a local server (HTTP for now), but the update fails with a core dump.
The only difference from the code already running is the period at which the LED blinks.
I've attached the fw.zip and part of the core dump below.
Thanks in advance
Jimmy
[Aug 13 10:17:03.805] updater_context_crea Starting update (timeout 300)
[Aug 13 10:17:03.815] mgos_ota_http_start Update URL: http://192.168.0.217:8080/fw.zip, ct: 60, isv? 0
[Aug 13 10:17:03.861] parse_manifest FW: AWS-test-1.13 esp32 1.0 20170813-080940/??? -> 1.0 20170813-080940/???
[Aug 13 10:17:03.872] mgos_upd_begin App: AWS-test-1.13.bin -> app_1, FS fs.img -> fs_1
[Aug 13 10:17:04.180] mgos_upd_file_begin Writing app image @ 0x1d0000
[Aug 13 10:17:06.558] assertion "p->tot_len == p->len + q->tot_len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 864, function: pbuf_dechain
[Aug 13 10:17:06.604] abort() was called at PC 0x4010b327 on core 0
[Aug 13 10:17:06.604]
[Aug 13 10:17:06.604] Backtrace: 0x400875b3:0x3ffd20f0 0x400872c3:0x3ffd2110 0x4010b327:0x3ffd2130 0x4010544f:0x3ffd2160 0x40137792:0x3ffd2180 0x40139771:0x3ffd21b0 0x400830e3:0x3ffd21e0 0x4015f635:0x3ffd2200
[Aug 13 10:17:06.605]
[Aug 13 10:17:06.605] --- BEGIN CORE DUMP ---
[Aug 13 10:17:06.605] {"arch": "ESP32", "cause":29,
[Aug 13 10:17:06.605] "REGS": {"addr": 1073539172, "data": "
[Aug 13 10:17:06.605] s3UIQMZyCIDwIP0/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAANAg/T8AAAAAgSD9Py8g/T8wAAAAAAAAADkg/T/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd5AThZAX04WQAAAAAAFAAAAAAAAAAEAAADvvq3e776t3iAFBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"},
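The "data" field of the core dump appears to be base64-encoded memory. As a rough sketch (the function name `decode_words` is made up for illustration, and reading the blob as little-endian 32-bit words is an assumption, not a documented format), it can be inspected like this:

```python
import base64
import struct


def decode_words(b64_data, count=None):
    """Decode a (possibly truncated) base64 blob into little-endian 32-bit words."""
    # Remove whitespace and line breaks that the log output may have introduced.
    clean = "".join(b64_data.split())
    # A truncated dump may end mid-group; keep only whole 4-character groups.
    clean = clean[: len(clean) - len(clean) % 4]
    raw = base64.b64decode(clean)
    # Keep only whole 4-byte words.
    usable = len(raw) - len(raw) % 4
    words = list(struct.unpack("<%dI" % (usable // 4), raw[:usable]))
    return words if count is None else words[:count]


if __name__ == "__main__":
    # First 16 characters of the "data" field from the dump above.
    for word in decode_words("s3UIQMZyCIDwIP0/"):
        print(hex(word))
```

For the first 16 characters of the dump above this yields 0x400875b3, 0x800872c6 and 0x3ffd20f0. The first and third values match the first PC:SP pair of the Backtrace line, which suggests the blob starts with the saved registers. The long run of "776t3u++rd7v..." decodes to repeated 0xdeadbeef words, which looks like a fill pattern rather than real register contents.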
Comments
Could you provide a backtrace please - https://mongoose-os.com/docs/overview/debug.html#analysing-core-dumps ?
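If the documented core dump procedure is not available, the raw "Backtrace:" line can still be turned into symbol names by hand. The sketch below only extracts the PC values and builds a command line for `xtensa-esp32-elf-addr2line`; it is not the procedure from the linked documentation, the helper names are made up for illustration, and the ELF path in the example is hypothetical (it must be the ELF of the exact build that crashed).

```python
import re
import shlex

# Matches "0xPC" optionally followed by ":0xSP"; only the PC is captured.
_FRAME_RE = re.compile(r"(0x[0-9a-fA-F]+)(?::0x[0-9a-fA-F]+)?")


def backtrace_pcs(log_line):
    """Return the PC addresses from a 'Backtrace: ...' log line, in order."""
    _, sep, frames = log_line.partition("Backtrace:")
    if not sep:
        return []
    return _FRAME_RE.findall(frames)


def addr2line_command(log_line, elf_path, tool="xtensa-esp32-elf-addr2line"):
    """Build a shell command that resolves the backtrace PCs against elf_path."""
    args = [tool, "-pfiaC", "-e", elf_path] + backtrace_pcs(log_line)
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    line = ("[Aug 13 10:17:06.604] Backtrace: 0x400875b3:0x3ffd20f0 "
            "0x400872c3:0x3ffd2110 0x4010b327:0x3ffd2130")
    # "build/objs/fw.elf" is a placeholder path, not taken from the thread.
    print(addr2line_command(line, "build/objs/fw.elf"))
```

The same helper accepts the later log format in which the stack pointers are omitted and only the PCs are listed.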
I currently don't have a Linux/Mac system, so I don't think I can run make without Cygwin or similar, but I've attached the relevant files in the hope that someone at Cesanta could have a look at it.
(I had to name the fw.zip fw-new.zip since the forum system didn't allow me to upload it otherwise.)
Let me know if there is anything else you need.
Hi,
I hit the same error when repeatedly posting large (1 kB) data to my ESP32 server.
mgos_http_ev 0x3ffb475c HTTP connection from 192.168.1.77:58569
assertion "p->tot_len == p->len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 881, function: pbuf_dechain
abort() was called at PC 0x40109e5f on core 0
Backtrace: 0x4008747f:0x3ffcab70 0x4008718f:0x3ffcab90 0x40109e5f:0x3ffcabb0 0x40103f6f:0x3ffcabe0 0x40135f6e:0x3ffcac00 0x401384f5:0x3ffcac30 0x4013907a:0x3ffcaca0 0x4014261c:0x3ffcacc0 0x400830da:0x3ffcace0 0x4015bbb1:0x3ffcad00
--- BEGIN CORE DUMP ---
{"arch": "ESP32", "cause":29,
"REGS": {"addr": 1073506656, "data": "
f3QIQJJxCIBwq/w/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAAFCr/D8AAAAAAav8P6+q/D8wAAAAAAAAALmq/D/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e
776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e
I captured the core dump and produced a backtrace.
(gdb) bt
#0 0x4008747f in invoke_abort ()
at /opt/Espressif/esp-idf/components/esp32/./panic.c:139
#1 0x40087192 in abort ()
at /opt/Espressif/esp-idf/components/esp32/./panic.c:148
#2 0x40109e62 in __assert_func (
file=0x3f40a1aa "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c",
line=881, func=,
failedexpr=0x3f40a38f "p->tot_len == p->len")
at ../../../.././newlib/libc/stdlib/assert.c:63
#3 0x40103f72 in pbuf_dechain (p=)
at /opt/Espressif/esp-idf/components/lwip/core/pbuf.c:881
#4 0x40135f71 in mg_lwip_handle_recv_tcp (nc=0x3ffb475c)
at common/platforms/lwip/mg_lwip_net_if.c:201
#5 mg_ev_mgr_lwip_process_signals (mgr=)
at common/platforms/lwip/mg_lwip_ev_mgr.c:75
#6 0x401384f8 in mg_lwip_if_poll (iface=, timeout_ms=0)
at common/platforms/lwip/mg_lwip_ev_mgr.c:125
#7 0x4013907d in mg_mgr_poll (m=0x3ffc7d28 , timeout_ms=0)
at mongoose/src/net.c:259
#8 0x4014261f in mongoose_poll (ms=0)
at /mongoose-os/fw/src/mgos_mongoose.c:59
#9 0x400830dd in mgos_mg_poll_cb (arg=)
at /mongoose-os/fw/platforms/esp32/src/esp32_main.c:183
#10 0x4015bbb4 in mgos_task (arg=)
at /mongoose-os/fw/platforms/esp32/src/esp32_main.c:254
Could you please tell me how to solve this problem?
Thank you.
Hi again.
Were you able to use my files to create a backtrace?
Did it reveal anything?
Is there anything I can do to fix or work around the issue? If not, any idea if or when this can be fixed?
TIA
Jimmy
I tried mongoose-os/fw/examples/c_http on ESP32 and got the same error when posting large data to rpc/OTA.Update.
(1) Build the c_http example and flash it.
(2) Run the python3 code below to post requests repeatedly.
(3) Wait about 30 sec.
(4) The same error occurs. The backtrace gives the same result.
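The original script is not shown in the thread. A minimal sketch of a comparable reproducer follows, assuming the device accepts plain HTTP POSTs on rpc/OTA.Update as described above; the function name `post_repeatedly`, the device address in the comment, and the delay and timeout values are all hypothetical.

```python
import time
import urllib.error
import urllib.request


def post_repeatedly(url, payload, count, delay_s=0.1, timeout_s=5.0):
    """POST payload to url count times; return the HTTP status of each attempt.

    A status of None means the request failed without an HTTP response
    (connection refused, timeout, device rebooting, ...).
    """
    statuses = []
    for _ in range(count):
        req = urllib.request.Request(url, data=payload, method="POST")
        req.add_header("Content-Type", "application/octet-stream")
        try:
            with urllib.request.urlopen(req, timeout=timeout_s) as resp:
                statuses.append(resp.status)
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status (4xx/5xx).
            statuses.append(err.code)
        except OSError:
            # Covers URLError, socket timeouts and connection resets.
            statuses.append(None)
        time.sleep(delay_s)
    return statuses


# Example call (the address is a placeholder for your device):
#   post_repeatedly("http://192.168.1.50/rpc/OTA.Update", b"x" * 1024, count=500)
```

A run of None entries in the returned list after a stretch of successful requests would be consistent with the device having crashed and rebooted.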
Jimmy,
I tried to backtrace your core dump, but failed because some files are missing.
All of the files in build/objs/.bin and /build/objs/.elf are required to produce a backtrace.
I found the reason for the crash.
For ESP32, "RTOS_SDK" is not defined by the Makefile.
As a result, the mgos_lock() and mgos_unlock() semaphore calls at line 70 of common/platforms/lwip/mg_lwip_net_if.c are compiled out.
With the "#ifdef RTOS_SDK" guard removed, the problem no longer appears.
I think the Makefile for ESP32 should define RTOS_SDK, and I hope this is fixed in the next release.
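To illustrate why the missing lock matters, here is a simplified Python model of a pbuf chain. It is not the lwIP or Mongoose code: the `Pbuf` and `RxChain` classes are made up for illustration, and the assumption that one thread appends received buffers while another detaches them is inferred from the backtrace (mg_lwip_handle_recv_tcp calling pbuf_dechain). The model shows the invariant from the failed assertion and a lock keeping it intact while two threads touch the same chain.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pbuf:
    len: int                       # bytes held by this buffer
    tot_len: int                   # bytes in this buffer plus all that follow
    next: Optional["Pbuf"] = None


def pbuf_cat(head: Pbuf, tail: Pbuf) -> None:
    """Append tail to head, growing tot_len on every buffer of head."""
    p = head
    while p.next is not None:
        p.tot_len += tail.tot_len
        p = p.next
    p.tot_len += tail.tot_len
    p.next = tail


def pbuf_dechain(p: Pbuf) -> Optional[Pbuf]:
    """Detach the first buffer and return the remainder of the chain."""
    q = p.next
    if q is not None:
        # The invariant from the log: p->tot_len == p->len + q->tot_len
        assert p.tot_len == p.len + q.tot_len
        p.next = None
        p.tot_len = p.len
    assert p.tot_len == p.len
    return q


class RxChain:
    """Receive chain shared by a network thread and an application thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._head: Optional[Pbuf] = None

    def append(self, p: Pbuf) -> None:      # called from the network thread
        with self._lock:
            if self._head is None:
                self._head = p
            else:
                pbuf_cat(self._head, p)

    def pop(self) -> Optional[Pbuf]:        # called from the application thread
        with self._lock:
            head = self._head
            if head is not None:
                self._head = pbuf_dechain(head)
            return head


def run_demo(n_buffers: int = 1000) -> int:
    """Produce n_buffers on one thread, consume on another; return bytes consumed."""
    chain = RxChain()

    def producer() -> None:
        for i in range(1, n_buffers + 1):
            chain.append(Pbuf(len=i, tot_len=i))

    t = threading.Thread(target=producer)
    t.start()
    expected = n_buffers * (n_buffers + 1) // 2
    consumed = 0
    while consumed < expected:
        p = chain.pop()
        if p is not None:
            consumed += p.len
    t.join()
    return consumed
```

Without the lock, an append could land between the moment the consumer reads the head and the moment it detaches it, leaving tot_len out of step with the actual chain. That is the kind of inconsistency the pbuf_dechain assertions detect.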
Hi, I have the same problem when OTA starts. Is there anything I can do to fix it?
[Aug 20 12:24:00.221] mgos_upd_file_begin Writing app image @ 0x10000
[Aug 20 12:24:02.313] mongoose_poll New heap free LWM: 99712
[Aug 20 12:24:07.673] assertion "p->tot_len == p->len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 881, function: pbuf_dechain
[Aug 20 12:24:07.687] abort() was called at PC 0x40148847 on core 0
[Aug 20 12:24:07.692]
[Aug 20 12:24:07.692] Backtrace: 0x40088783:0x3ffd1370 0x40088493:0x3ffd1390 0x40148847:0x3ffd13b0 0x40142c03:0x3ffd13e0 0x4016b216:0x3ffd1400 0x4016d799:0x3ffd1430 0x4016e31e:0x3ffd14a0 0x40179ea4:0x3ffd14c0 0x400830da:0x3ffd14e0 0x401948a5:0x3ffd1500
@sei a very good point! ESP32 has a preemptive RTOS and should definitely use locking in the LWIP adapter.
fixed.
That's great news, I'm looking forward to testing it.
A big thanks to all involved.
Any idea when this will be in effect on the online build server?
it will first be in the next -latest build. there is no schedule for them, but we usually push one every few days. e.g. we did a push today (but this change didn't make it into the build).
I have tried it and it works perfectly, thank you very much.
Is it possible to choose which branch to build against with the mos tool?
For example, when you say to use the -latest build, or are you supposed to get the latest branch from the Git repo and build locally?
if you're using mos-latest, you will actually pull newest code from repos (master branch) all the time (unless you changed your local copy, in which case it's not touched).
Hello,
Sometimes the OTA fails with a core dump.
Thanks in advance
Carlos
[Aug 24 12:49:06.883] updater_context_crea Starting update (timeout 300)
[Aug 24 12:49:06.891] mgos_ota_http_start Update URL: https://s3-eu-west-1.amazonaws.com/*****/fw-1.4.zip, ct: 300, isv? 0
[Aug 24 12:49:07.366] SW ECDH
[Aug 24 12:49:08.541] mongoose_poll New heap free LWM: 96624
[Aug 24 12:49:08.551] mongoose_poll New heap free LWM: 96512
[Aug 24 12:49:08.558] mongoose_poll New heap free LWM: 96392
[Aug 24 12:49:08.663] mongoose_poll New heap free LWM: 95008
[Aug 24 12:49:08.751] parse_manifest FW: Wip esp32 1.4 20170823-155921/develop@4d723b0d+ -> 1.4 20170823-155921/develop@4d723b0d+
[Aug 24 12:49:08.768] mgos_upd_begin App: Wip.bin -> app_1, FS fs.img -> fs_1
[Aug 24 12:49:09.191] mgos_upd_file_begin Writing app image @ 0x1d0000
[Aug 24 12:49:10.289] mongoose_poll New heap free LWM: 85360
[Aug 24 12:49:16.821] I (574746) wifi: active cnt: 320
[Aug 24 12:49:20.096] assertion "p->tot_len == p->len" failed: file "/opt/Espressif/esp-idf/components/lwip/core/pbuf.c", line 881, function: pbuf_dechain
[Aug 24 12:49:20.110] abort() was called at PC 0x4014920b on core 0
[Aug 24 12:49:20.113]
[Aug 24 12:49:20.113] Backtrace: 0x4008859b 0x400882b7 0x4014920b 0x401435c3 0x40171cee 0x4018c8ff 0x40191068 0x4019077b 0x4019065e 0x401711a3 0x4016e237 0x4016b925 0x4016dda5 0x4016e942 0x4017a4b8 0x40082efe 0x40195da9
[Aug 24 12:49:20.129] --- BEGIN CORE DUMP ---
[Aug 24 12:49:20.132] {"arch": "ESP32", "cause":29,
[Aug 24 12:49:20.135] "REGS": {"addr": 1073511048, "data": "
[Aug 24 12:49:20.137] m4UIQLqCCIDgEv0/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAAMAS/T8AAAAAcRL9Px8S/T8wAAAAAAAAACkS/T/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e776t3u++rd74whlAF8MZQAAAAAAFAAAAAAAAAAEAAADvvq3e776t3iAIBgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"},
The same file works with the name fw.zip, but causes a core dump with the name fw-1.4.zip. Does that make any sense?
No, it also fails with the name fw.zip.
Sometimes it works and sometimes it causes an error, with the same file and the same version.
I am getting something similar when trying to build with mjs. After I flashed the new build (latest), I get the following error:
[Aug 26 16:13:34.684] Tasks currently running:
[Aug 26 16:13:34.686] CPU 0: mgos
[Aug 26 16:13:34.687] Aborting.
[Aug 26 16:13:34.688] abort() was called at PC 0x400d28f4 on core 0
[Aug 26 16:13:34.692]
[Aug 26 16:13:34.692] Backtrace: 0x40087407 0x40087123 0x400d28f4 0x400817d1
[Aug 26 16:13:34.697] --- BEGIN CORE DUMP ---
[Aug 26 16:13:34.699] {"arch": "ESP32", "cause":29,
[Aug 26 16:13:34.702] "REGS": {"addr": 1073506464, "data": "
[Aug 26 16:13:34.705] B3QIQCZxCICwBfw/LwAAAC8AAAAMAAAA/////wAAAAD+////AAAAAJAF/D8AAAAAQQX8P+8E/D8wAAAAAAAAAPkE/D/vvq3e776t3u++rd7vvq3e776t3u++rd7vvq3e
Get and publish the backtrace please!
Instructions are at https://mongoose-os.com/docs/overview/debug.html#analysing-core-dumps
@Sergey:
I seem to have the same problem. During OTA, the system crashes and my backtrace looks similar to the one of @sei above.
This happens with an empty mjs_base application copied from the examples directory.
When I use
mos flash mos-esp32
to flash a basic application, the OTA works as expected afterwards.
Is the fix from @rojer not yet applied, or is there another problem remaining?
Thanks for looking into this!
gentlemen, thank you for your patience. i believe this and this should take care of the problem.