Why should I not use dynamic memory allocation in embedded systems?
I keep hearing that dynamic memory allocation on the so-called "heap" segment should not be used in microcontroller-based embedded systems. Pretty much every safety standard, MISRA C for example, bans its use. And I don't only hear this in the context of safety-related applications; it supposedly applies to every single microcontroller application out there.
Why not? I realize that there are non-deterministic run-time aspects of heap allocation, but the same goes for the stack. Why is stack allocation fine but heap allocation isn't? In desktop/system programming it is even recommended practice to allocate large objects on the heap rather than on the stack, since the heap persists throughout execution and cannot cause stack overflows.
What are the specific problems that dynamic memory allocation can cause?
There are numerous problems with dynamic heap allocation, and the mentioned safety standards like MISRA C only scratch the surface when they try to give a rationale against it. These problems exist:
- Non-deterministic allocation at run-time.
- Memory might not be available.
- Waste of space.
- Heap memory fragmentation.
- Steep execution overhead.
- User data memory fragmentation.
- Unspecified amount of memory allocated.
- "Saving memory" and freeing doesn't make sense.
- Memory leaks.
- Dangling pointers.
- Indeterminate values.
- Indeterminate types.
- Bad function interfaces of `malloc`, `realloc` and `free`.
I will address each of these problems below.
Non-deterministic allocation at run-time
Microcontroller embedded systems have limited memory and often hard real-time requirements. They therefore always need to specify a maximum amount of each resource in advance. This fixed amount of maximum memory needs to be exactly as large as required to handle the worst-case scenario. Not more, not less. This amount can be determined at compile-time. In fact, it can/should be determined as early as during project specification and design, before a single line of code is even written.
To have some undefined amount of memory expand and collapse in unknown ways during run-time is of course dangerous - we simply can't have the program doing things beyond the programmer's control or beyond what the programmer predicted.
This is also to some extent true for stack allocation, but maximum stack depth is somewhat easier to predict. Some tools can calculate stack use for the maximum call stack depth (though interrupts mean extra stack use). It is true that one should not allocate large objects on the stack, to avoid risking stack overflow. But in embedded systems, the heap is not the way to go, for all the reasons listed in this post. Instead, large objects such as communication buffers should be allocated in the `.bss`/`.data` segments. Or, in case they are read-only, in the flash `.rodata`/`.text` segments.
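For illustration, here is a minimal sketch (buffer name and size made up) of a worst-case sized buffer allocated statically instead of on the heap:

```c
#include <stdint.h>

/* Worst-case sized communication buffer. The linker places it in .bss
   (or .data if it had a non-zero initializer), so its size is fixed and
   known already at link-time. */
static uint8_t comm_buffer[256];

/* Read-only data can be placed in flash instead: */
static const uint8_t lookup_table[256] = {0}; /* ends up in .rodata/.text */
```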
Memory might not be available
In case dynamic allocation fails, the program is pretty much done for. It is rarely meaningful to continue execution when that happens; most often you just have to close down the program as quickly and gracefully as you can. Except "closing down" a program isn't really an option in embedded systems - forcing a MCU reset is the only sensible thing you can do. And if the heap ran out of memory once, it will do so again, given the same circumstances. If we ever run out of memory, it can only mean that something in the system design is fundamentally wrong.
It's also a commonly occurring problem that the programmer forgets to check if `malloc` returned `NULL`. If it did and we didn't check for it, we get a dangerous null pointer access bug.
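A minimal sketch of the check that is easy to forget (name and size made up):

```c
#include <stdlib.h>
#include <stdint.h>

void example (void)
{
    uint8_t* buf = malloc(128);

    if (buf == NULL)
    {
        /* Allocation failed. On a MCU there is usually nothing sensible
           left to do except log the error and force a reset. */
        return;
    }

    /* ... use buf ... */
    free(buf);
}
```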
Waste of space
One very important thing to realize is that the `.heap` segment in your microcontroller is a fixed-size segment. It's going to sit in your RAM taking up a fixed amount of memory no matter if we use 100% of that memory or 0% of it. If you free memory, it doesn't mean that this segment dynamically shrinks, it remains the same. To be pedantic, we don't actually get dynamic allocation, because the heap space is already allocated and reserved at link-time.

There is also various overhead memory use in the `.heap` segment, which I will address below. For now, realize that 1024 bytes of `.heap` will always store less actual data than 1024 bytes of `.bss`/`.data`, which will store exactly 1024 bytes with zero overhead and 100% memory utilization.
Heap memory fragmentation
If we allocate and deallocate many times, then over time the heap will get fragmented. It can end up with small chunks of memory here and there which are too small to store the (aligned) data requested, so those chunks end up as dead space. This is known as heap fragmentation. Overall we end up using more memory than actually required.
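A contrived sketch (sizes made up) of how such holes appear:

```c
#include <stdlib.h>

void fragmentation_example (void)
{
    void* a = malloc(100);
    void* b = malloc(100);
    void* c = malloc(100);

    free(b);               /* leaves a roughly 100 byte hole between a and c */

    void* d = malloc(200); /* the hole is too small, so the allocator must
                              find 200 contiguous bytes elsewhere in the heap */
    free(a);
    free(c);
    free(d);
}
```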
Steep execution overhead
The dynamic allocation functions have a pretty steep execution overhead during allocation. They need to keep track of where there are free parts of the heap available. A heap (not to be confused with the abstract data type "heap") can be implemented in several different ways. Perhaps through some manner of look-up table which gives faster look-up but consumes additional memory. Or perhaps with some iterative search looking through heap segment headers, which is very slow. Not only does the allocation function need to find available memory, it needs to find enough available memory to allocate the requested chunk.
This allocation overhead time doesn't even exist for static storage duration objects.
User data memory fragmentation
When repeatedly calling `malloc`, the user data will not necessarily end up contiguously - there are absolutely no guarantees for that and in many cases it isn't possible, depending on heap fragmentation and the requested size of the new memory. Additionally, each allocation very likely comes with overhead in the form of a segment header and padding bytes.
The problem with having user data fragmented all over the place is mainly an issue on mid-range MCUs such as Cortex-M4 or PowerPC e200. Such MCUs utilize branch prediction to some extent and may use data and/or instruction caches. The MCU cannot effectively load our data into data cache lines if it is fragmented all over the place.
Unspecified amount of memory allocated
As already mentioned, each call to `malloc` may result in a segment header and an unspecified amount of padding bytes at the end. C requires the pointer returned from `malloc` & friends to be suitably aligned, so padding at the end is very likely. Also, the heap allocation functions might need things like look-up tables internally. In practice, the application programmer has no clue how much memory is actually used during heap allocation.
"Saving memory" and freeing doesn't make sense
Calling `free()` in a single-core microcontroller application never makes any sense - there is nobody to share the memory with; our program has complete control of it all. As established earlier, we need to handle the worst-case scenario, so we need to allocate that exact amount. Freeing up memory when we are not executing the worst-case scenario is senseless, because if there are parts of the code which would actually benefit from that extra available memory, that only means that those very same parts will either perform poorly or fail/crash during the worst-case scenario, which would be a design mistake.
And as noted above, repeatedly allocating/deallocating causes lots of fragmentation issues both with the user data as well as with the heap itself.
Memory leaks
Memory leaks are perhaps the most well-known problem with dynamic allocation. They happen when you somehow lose a reference to the allocated memory, for example by incorrectly overwriting a pointer. This common bug became such a huge problem in desktop programming that modern languages like Java even decided to implement "garbage collection", a dedicated thread tasked with deallocating memory at run-time.
Memory leaks are mostly a problem when you repeatedly call a function which leaks, in which case the heap will soon run out of memory and the whole program will fail.
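A minimal sketch of such a leaking function (name and size made up):

```c
#include <stdlib.h>
#include <stdint.h>

void leaky (void)
{
    uint8_t* data = malloc(64);
    if (data == NULL) { return; }

    /* ... use data ... */

}   /* data goes out of scope without free() - 64 bytes are lost per call,
       so calling this repeatedly eventually exhausts the heap */
```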
Dangling pointers
A dangling pointer is a pointer that used to point at valid dynamic memory. After that memory got freed, the pointer still points at the same address, but the memory is now marked as available to the heap, so something else might get stored there at any time. To avoid such bugs, the recommended practice is to set the pointer to `NULL` after calling `free()`, but that will only help if you actually check whether the pointer is null before accessing the contents. (As a side note, passing a null pointer to `free()` is fine; it's a well-defined no-op.)
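A minimal sketch of the scenario (name and size made up):

```c
#include <stdlib.h>
#include <stdint.h>

void dangling_example (void)
{
    uint8_t* p = malloc(32);
    if (p == NULL) { return; }

    free(p);       /* p now dangles - the memory may be handed out again */
    p = NULL;      /* recommended practice after free() */

    if (p != NULL) /* only helps because we actually check before dereferencing */
    {
        p[0] = 1;
    }
}
```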
Indeterminate values
`malloc` and `realloc` leave freshly allocated memory with indeterminate values - it could contain anything. Failing to initialize the memory before reading it is a bug. `calloc` is different since it guarantees to set all allocated memory to zero (and is therefore even slower).
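A minimal sketch of the difference (names and sizes made up):

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

void init_example (size_t n)
{
    uint8_t* a = malloc(n);    /* contents indeterminate - reading before writing is a bug */
    uint8_t* b = calloc(n, 1); /* guaranteed all-zero, at the cost of extra run-time */

    if (a != NULL)
    {
        memset(a, 0, n);       /* explicit initialization needed when using malloc */
    }

    free(a);
    free(b);
}
```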
Indeterminate types
More of an advanced topic: at compile-time, the C type system requires the compiler to keep track of what type each known object in memory has. Normal variables and memory areas get assigned an effective type, a type which the compiler is allowed to assume is the only type residing in that memory area. Expressions that designate such memory are formally called lvalues, and accessing the memory through them is an lvalue access. The type system regulates the manner in which an lvalue access of an object may be done. If we lvalue access it through a pointer to a different type than the effective type, we invoke undefined behavior, a so-called "strict pointer aliasing violation", meaning a bug that could lead to wrong machine code being generated.
The memory returned by `malloc` & friends is special in that it has no such effective type - it remains a blank sheet to the compiler until the point where we store something in that memory. Should we do this the first time using another type than the one actually intended, we later get undefined behavior if we attempt to access that memory through a different type. Example:
```c
uint8_t* ptr = malloc(n);
// ptr now points at a chunk of memory with no effective type

for(size_t i=0; i<n; i++)
{
    // here we do a write access:
    ptr[i] = 1; // the memory gets effective type uint8_t/unsigned char
}
...
uint32_t* u32 = (uint32_t*)ptr; // this is a valid pointer conversion. However:
*u32 = something; // bug: lvalue access through the wrong effective type
```
Bugs like these are subtle and could cause very strange phenomena, particularly on old gcc versions. We don't really want strict pointer aliasing to be enabled when doing embedded systems programming, but a whole lot of programmers aren't even aware of it. It's easier to slip up when using `malloc` - it isn't unreasonable to call `malloc`, then set up each byte as above to a known value, then access that area through the actually intended type.
Bad function interfaces of `malloc`, `realloc` and `free`
Unfortunately, most of the C standard library was poorly designed way back in the 1970s and the `malloc` & friends functions are no exception. The main issue is the non-existent type safety - I would guess that the most common bug related to these functions is to use the wrong type when doing `malloc(sizeof(type))`; a safer idiom is sketched further down. But there are other interface problems as well:
The pointer passed to `realloc` is not guaranteed to be the same one as returned from `realloc`, even when we just wish to resize something to a larger size. Failing to re-assign the returned pointer to the original variable is a common bug (see the sketch below).
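A sketch of the safer idioms (names and sizes made up):

```c
#include <stdlib.h>
#include <stdint.h>

void interface_example (void)
{
    /* sizeof *p instead of sizeof(type): stays correct if the type of p ever changes */
    uint32_t* p = malloc(10 * sizeof *p);
    if (p == NULL) { return; }

    /* Assign realloc's result to a temporary - assigning straight to p would
       leak the old memory if realloc fails and returns NULL */
    uint32_t* tmp = realloc(p, 20 * sizeof *tmp);
    if (tmp == NULL)
    {
        free(p);
        return;
    }
    p = tmp;   /* forgetting this re-assignment is the common bug mentioned above */

    free(p);
}
```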
`free()` is problematic because it can't, by design, set the pointer passed to it to `NULL`, which would have saved us from a lot of dangling pointer bugs. Also, when `free()` crashes, that's an indication of a pointer bug elsewhere (heap corruption) - usually not easy to track down.
There's also the classic bug in legacy C90 where `malloc` is called, the returned pointer is cast, but we forgot to include `stdlib.h`, ending up with an extremely subtle "implicit int" bug. There are unfortunately still plenty of C90 compilers around in the embedded systems world, so this nasty bug is still a thing.
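A sketch of how that bug slips through (C90 only, the include intentionally missing):

```c
/* C90 translation unit where #include <stdlib.h> was forgotten.
   malloc then gets an implicit declaration as if it returned int. */

void c90_bug (void)
{
    char* p = (char*)malloc(100); /* the cast silences the diagnostic instead of
                                     revealing the missing prototype; on systems
                                     where int is smaller than a pointer, p is garbage */
    free(p);                      /* free is implicitly declared too - same problem */
}
```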
Calling `malloc` with zero size leads to poorly-defined behavior: it may allocate a zero-size chunk and return a pointer that must not be dereferenced, or it may return `NULL` (implementation-defined). The most recent C23 standard went a step further and made zero-size `realloc` undefined behavior.