Copyright © https://mongoose-os.com

Mongoose OS Forum

frame

Exception issues in mbuf_insert

Using Mongoose 6.11
I have a system that is raising an alarm every 30 seconds upto a maximum of 750 alarms, from a UDP connection.
This causes triggers to generate data, which we need to send to websocket clients

We are multicasting this data, every time a new alarm arrives, using this for loop

for (c = mg_next(mg_AllConns.mgr, NULL); c != NULL; c = mg_next(mg_AllConns.mgr, c))
{
mg_send_websocket_frame(c, WEBSOCKET_OP_TEXT, strResponse.c_str(), strResponse.size());
}

the size of the data will build to a maximum of ~750kB

At random points we will either get an exception in the
if ((p = (char *) MBUF_REALLOC(a->buf, new_size)) != NULL)

or in the
memcpy(a->buf + off, buf, len); - changed this to memmove(a->buf + off, buf, len);
in case we had memory overlap, but still had the exception.

One theory we had was that the send doesn't happen instantly, and the REALLOC, or the memcpy/memmove, was corrupting data before it was sent.

we can't wait for the mg_send_websocket_frame to return, as it's a void function.

Is there something I may be missing that is causing my issues?

Thanks

Chris

Comments

  • SergeySergey Dublin, Ireland

    are you sending that data from a thread that is separate from the thread that does mg_mgr_poll() ?

  • SamuraibikerSamuraibiker Basingstoke

    Hi Sergey
    Yes it is, because the data isn't in response to anything from the clients.

  • SamuraibikerSamuraibiker Basingstoke

    Looks like this - Alarm Sequence.docx

  • SamuraibikerSamuraibiker Basingstoke
    edited June 11

    In the following code section from mbuf_insert

    else 
      {
        size_t new_size = (size_t)((a->len + len) * MBUF_SIZE_MULTIPLIER);
        if ((p = (char *) MBUF_REALLOC(a->buf, new_size)) != NULL) 
        {
          a->buf = p;
          memmove(a->buf + off + len, a->buf + off, a->len - off);  
    
          if (buf != NULL) 
          {
              // memcpy(a->buf + off, buf, len);
              memmove(a->buf + off, buf, len);
          }
    
          a->len += len;
          a->size = new_size;
        } 
        else 
        {
          len = 0;
        }
      }
    

    what is the first memmove doing? apart from moving x amount of data from one area of a->buf to another
    At the point we get the exception a->len is 0 and off is 10.

    Exception arises on the memmove in the if (buf != NULL) section.
    code runs for quite a while before the exception occurs
    11hrs
    21hrs
    23hrs
    30hrs

    Any pointers gratefully accepted.

    Thanks

  • nliviunliviu Romania

    The only reason I can see is that strResponse.c_str() becomes invalid
    The first memmove makes room for the new data to be inserted.

  • SamuraibikerSamuraibiker Basingstoke

    Thanks for the quick response.

    I thought the MBUF_REALLOC(a->buf, new_size) made/reallocated the space for the new data.

    Are these different things?
    Am I misunderstanding something somewhere?

  • nliviunliviu Romania

    mbuf_insert(struct mbuf *a, size_t off, const void *buf, size_t len): the new data buf of length len is inserted at offset off in the buffer. The data existing between offset off must be moved to off+len.
    MBUF_REALLOC reallocates the buffer to be able to hold a->len+len bytes.

  • SamuraibikerSamuraibiker Basingstoke

    Thank you for your responses.

    I'll have a look at the possibility of the invalid data causing problems.

  • nliviunliviu Romania

    Make a local copy of the data outside of the for loop.

    Thanked by 1Samuraibiker
  • SamuraibikerSamuraibiker Basingstoke

    Unfortunately this doesn't seem to have helped
    Remote Debugging gives me this

    if ((p = (char *) MBUF_REALLOC(a->buf, new_size)) != NULL) -> p has a value of 0x034c7020
    {

      a->buf = p;  ->  a->buf shows as 0x00000000 <Bad Ptr>
    
      if ((a->len - off) < 0)
        off = 0;
    
      memmove(a->buf + off + len, a->buf + off, a->len - off);  -> No problems shown here from stack trace
    
      if (buf != NULL) 
      {
          // memcpy(a->buf + off, buf, len);
          memmove(a->buf + off, buf, len);   ->  Stack trace from here to memcpy shows the problem, ie where the debugger stops on breakpoint
      }
    
      a->len += len;
      a->size = new_size;
    } 
    

    So, is my problem stemming from the MBUF_REALLOC, or is it just within the second memmove in the if buf != NULL?

    buf and len are consistent throughout the stack trace from where I call mg_send_websocket_frame to here

    buf is 0x02d9b930
    len is 352472

    This was the 28th message where the length was 352472

    the previous crash we had 182 messages sent of length 352472, before everything fell over.

    Am I doing something stupid here, or just being unlucky??

  • SergeySergey Dublin, Ireland

    just stop trying to fix that.

    you're wrong in the very beginning: you MUST NOT call any mongoose functions from the other threads, except one: mg_broadcast.

  • SamuraibikerSamuraibiker Basingstoke
    edited June 18

    Hi Sergey

    Ok, so I've changed my code to use the mg_broadcast

    I have

    void bcast_ev_handler(struct mg_connection *nc, int ev, void *ev_data)
    {
        struct mg_connection *c;
    
        string *pBroadcastData = (string *) ev_data;
    
        for (c = mg_next(nc->mgr, NULL); c != NULL; c = mg_next(nc->mgr, c))
        {
            if (c->flags & MG_F_IS_WEBSOCKET)
            {
                mg_printf(c, "%s", pBroadcastData->c_str());
            }
        }
    }
    
    
    void BroadcastAlarmsData(string strResponse)
    {
        static string strBroadcastData(strResponse);
    
        string strMsg;
    
        try
        {       
            mg_broadcast(mg_AllConns.mgr, 
                     bcast_ev_handler, 
                     (void *) &strBroadcastData, 
                     strBroadcastData.size());
    
    
        }
        catch (...)
        {
            strMsg.clear();
            strMsg = "Failed to Broadcast Alarms Data ";
            strMsg += strResponse;
            strMsg += " to UI. Log Level: LOG_INFO";
            pLogger->info(strMsg);
        }
    }
    

    However, this is having an adverse affect on my socket

    Browser is Chrome, and from the dev tools (F12), when we get to the mg_printf(....)

    I get an Opcode -1, the socket breaks then reconnects.

    I guess my implementation is a little flawed.

    Could you pick it apart, and let me know where I've got it wrong?

    Thank You

  • SamuraibikerSamuraibiker Basingstoke

    I got my inspiration from
    https://github.com/cesanta/mongoose/blob/dev/examples/multithreaded/multithreaded.c

    I haven't used the socketpair, as that seems to get defined in response to a request on the listening event handler (HTTP in the case of the link )

    My data isn't generated from a listener request, should I still be using them?
    So I don't have a client request/server response on the websocket

    Can the socketpair be configured from the Websocket handshake, or does it have to be at the point of the HTTP request?

    Sorry if this is a bit vague. Websockets and client\server interactions are still a bit new to me.

    Thank you

  • SamuraibikerSamuraibiker Basingstoke

    I now have the mg_broadcast running.
    In mg_broadcast, memcpy is giving exceptions.
    Changed to memmove, and still getting exceptions, on 'larger' volumes of data

    10 consecutive runs, the data len is\was 4215, 4215, 3784, 4240, 4240, 4697, 3784, 5153, 3784, 5153.

    All are within your default MG_CTL_MSG_MESSAGE_SIZE of 8192

    Data I am trying to put on the socket is shown in the attached file AlarmData.txt - a JSON string
    there could be a maximum of 750 of the entries in the AlarmsArray, can MG_CTL_MSG_MESSAGE_SIZE be adjusted to handle all this potential data? - needs about 350k of data

    This data isn't generated after a request from the client, ie the ev_handler, but from a different process, which then needs sending to the client.

    Are we trying to use the websockets in a way they are not designed to be used?

    I have declared
    static sock_t sock[2];

    configured
    mg_socketpair(sock, SOCK_STREAM)
    prior to mg_mgr_init and mg_bind.

    However whenever i try to write to the socket with
    write(sock[0], user_data, len)
    I get a fatal error - undefined file handle.
    Am I using the correct write?

    Also in your case MG_EV_HTTP_REQUEST in your ev_handler
    you declare struct work_request req = {(unsigned long)nc->user_data};

    I get a compilation error for this.
    Moving the declaration outside the switch statement obviously solves this problem

  • SergeySergey Dublin, Ireland
            mg_broadcast(mg_AllConns.mgr, 
                     bcast_ev_handler, 
                     (void *) &strBroadcastData, 
                     strBroadcastData.size());
    

    This looks like that you're sending a pointer to a string, not a string itself.
    There is no need to use socket pair.
    Just make sure you copy the data you'd like to send, "by value", so to say. Meaning, if you're sending a string, send a string, not a pointer to it.

    Then, when you're broadcasting, make sure to send only to the relevant connections - i can't remember for example if a listening connection is also marked as websocket - check it out.

  • SamuraibikerSamuraibiker Basingstoke

    Hi Sergey
    Not convinced this is what you meant
    mg_broadcast(mg_AllConns.mgr, bcast_NumAlarms_handler, &strNumAlarmsData, strNumAlarmsData.size());

    If not, how do I just pass a string (pass by value) when mg_broadcast expects a pointer?

    aside from the above.
    mg_broadcast is only 'called' once, yet my event handler is 'triggered' 8 times for the one call.

    In mg_broadcast the MG_SEND_FUNC and MG_RECV_FUNC are both using mgr->ctl[0].

    In my event handler I'm sending out on c->sock, should I be using c->mgr->ctl[0/1]?

    Thanks

    Chris

  • SamuraibikerSamuraibiker Basingstoke

    I think...
    Between us we've just cracked this.

    Thanks for all your help everyone, much appreciated

    Chris

  • SamuraibikerSamuraibiker Basingstoke

    Spoke to soon...

    My data appears to be getting corrupted between the call to mg_broadcast and the event handler firing.

  • SamuraibikerSamuraibiker Basingstoke

    Ignore the post above - user error

Sign In or Register to comment.