A Primer on Memory Management


Back to Basics

OK, so what is a variable?

A variable stores some data. More precisely, it is an abstraction used to refer to some data stored in memory. This data can be anything: an integer, a character, a floating-point number, a string, or even a more complex structure. It is composed of 2 attributes:

  1. Data: The actual value or content stored in memory.

  2. Data Type: Information about what kind of data the variable holds and how it should be handled.

Where are the variables stored?

Some variables are stored in CPU registers and some exist only at compile time. But CPU registers are limited in number and small in size, and compile-time variables are constants that the compiler can optimize out. The rest of the variables are stored in the device's primary memory (RAM).

The primary memory can be imagined as a massive array of bytes. On a 32-bit device the array size would be 4 GiB; on a 64-bit system it is a theoretical 16 EiB, far larger than any real machine. Now the type of the variable contributes 2 attributes to start with: size and alignment.

Size

Size is how much space the variable uses in memory. In our array analogy, it is the number of contiguous array elements it will consume.

Alignment

On some ISAs such as x86_64, unaligned access mostly works (at a possible performance cost), but on others such as MIPS it does not. So what is alignment? Alignment is the constraint that the memory address of a variable must be a multiple of some power of two, typically its size (its natural alignment). For primitive types, a variable is placed so that its memory address is divisible by its size. For example, a 16-bit int is aligned to 2 bytes and can therefore only start at an address divisible by 2, such as 0x100 or 0x102, but not 0x101 or 0x103. This has interesting ramifications for composite types such as structs or tuples. Since alignment is defined in terms of primitive types such as int, char, or double, a struct must align each member according to that member's alignment and therefore may need to insert padding between members.

struct data {
    char character;
    uint64_t id;
};

Which creates an object of the following layout in memory:

Offset   Content
0x0000   char character (1 B)
0x0001   padding (7 B)
0x0008   uint64_t id (8 B)

The above example is extreme but shows how padding works. Some compilers offer a packed struct attribute that disables padding, but this is not portable, as not all CPUs can handle such code (most mainstream ones can). Padding also means you may want to be careful about the order of structure fields so you do not waste space on padding.
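To make the field-ordering point concrete, here is a hedged sketch (struct names are illustrative) showing how reordering the same members changes the struct size; the exact numbers assume a typical 64-bit ABI such as x86_64 System V:

```c
#include <stdint.h>

/* Badly ordered: the uint64_t forces 7 bytes of padding after the
 * first char on a typical 64-bit ABI. */
struct padded {
    char a;      /* 1 B + 7 B padding */
    uint64_t x;  /* 8 B */
    char b;      /* 1 B + 7 B trailing padding (so arrays stay aligned) */
};               /* sizeof == 24 on x86_64 */

/* Same members, largest first: padding only at the tail. */
struct packed_by_hand {
    uint64_t x;  /* 8 B */
    char a;      /* 1 B */
    char b;      /* 1 B + 6 B trailing padding */
};               /* sizeof == 16 on x86_64 */
```

Simply sorting members by alignment, largest first, usually minimizes the padding.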

Assignment

Another important operation all variables support (ignoring whether they should) is assignment. It can be thought of as a universal operation. Assigning a value means copying the data. C's memcpy is assignment in its rawest form (for Plain Old Data).

Now there are caveats to this, but we will bring them up later in this post when we talk about lifetimes. The copy is especially obvious with big objects, where some compilers (like gcc or clang) just convert the assignment into a call to memcpy.

// A 64 KiB struct, too big for registers
struct a {
    char a[1<<16];
};

static struct a b;

void a(struct a a) {
// assignment
    b = a; 
}

The assembly output on x86_64:

a:
    push    rbp
    mov     rbp, rsp
    lea     rsi, [rbp + 16]
    lea     rdi, [rip + b]
    mov     edx, 65536 ;; the size argument; a mov is itself a register assignment
    call    memcpy@PLT ;; notice the memcpy
    pop     rbp
    ret

Refer and Point

As mentioned above, data is copied on each assignment. This means that in places like function calls, the function gets a copy of the original, and its modifications are not visible outside. So there is a special type of variable called a pointer. Pointers are simply variables that reference another variable. In most cases the reference is a memory address, which is in a way the index into the array from the analogy earlier. This is not fully accurate, since a modern OS uses virtual memory, but the virtual address space the program sees is also what its assembly code sees, so the abstraction does not affect low-level reasoning.

As mentioned, pointers are simply variables whose data is the memory address of the referenced variable, and memory addresses are unsigned integers. This means you can perform a set of operations on them.

Load and Store operations

Since a pointer's data references another variable, reading or writing the referenced variable is done with a load or a store. A load reads from memory and a store writes to it. To make life easier, both are expressed through a single operator called the dereferencing operator. You can see the usage in C and Zig below.

In C

*x = 10; // Store
y = *x; // Load

In Zig:

x.* = 10; // Store
y = x.*; // Load
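Spelled out as tiny C functions (the names load and store here are illustrative, not standard), the two operations look like this:

```c
// store: write a value through the pointer into the referenced variable
void store(int *x, int v) { *x = v; }

// load: read the referenced variable back through the pointer
int load(const int *x) { return *x; }
```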

Pointer Arithmetic

A pointer is effectively an integer, so, with some caveats, addition and subtraction work on it. When adding an integer, alignment matters: the pointer is treated as pointing into an array of its type, and addition moves to the next index. The same applies when subtracting an integer. When two pointers are subtracted from each other, we get their offset in memory, in elements. Note the caveat mentioned above: this is only well-defined inside a single object, not in general.

This can be seen in C's arrays: indexing is simply syntax sugar for pointer arithmetic followed by a dereference.

a[1] = 10;
// is equivalent to
*(a + 1) = 10;
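Pointer subtraction can be sketched the same way (the function name is illustrative): the result is in elements, not bytes, and is only defined for pointers into the same array.

```c
#include <stddef.h>

// Subtracting two pointers into the same array yields the element
// offset as a ptrdiff_t; the compiler divides by sizeof(int) for us.
ptrdiff_t elems_between(const int *first, const int *last) {
    return last - first;
}
```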

References

A reference is a more abstract concept than a pointer. References refer to other objects like pointers do, but do not necessarily contain a memory address. They support load and store but not pointer arithmetic. Some languages, such as C++, use references as a safer alternative to pointers. I will also mention Reference Counting and Handles, which are common ways several languages implement references.

Why do you need references??

Well to understand, we need to remember how the assignment operator works. It simply copies data. The parameters of a function are assignments. So when one passes a parameter, the original variable is not modified.

#include <assert.h>
#include <stdlib.h>

struct foo {
    char var;
};

void bar(struct foo self) {
    self.var = 'a'; // Will only modify local copy
}

int main(void) {
    auto foo = (struct foo){ .var='z' };
    bar(foo); //send a copy
    assert(foo.var == 'a'); //Will fail
    return EXIT_SUCCESS;
}

Now if you do the same with a reference, you are not copying the variable but just the reference. For a pointer, only the memory address is copied. Since only the reference is copied, this has interesting implications for stores: when you dereference and write to the variable, you update the original, even inside functions.

So let us repeat the above example with references instead.

#include <assert.h>
#include <stdlib.h>

struct foo {
    char var;
};

void bar(struct foo* self){ // passing a pointer
    (*self).var = 'a'; // Will modify the original
}

int main(void){
    auto foo = (struct foo){.var='z'};
    bar(&foo); //send a reference
    assert(foo.var == 'a'); //Will pass
    return EXIT_SUCCESS;
}

Also, if you are working on certain algorithms or data structures, this allows you to store the reference for later processing. One point to add: assigning pointers is often called a shallow copy, as the pointer is copied but not the data.

Another thing to note is the cost of copying. If the variable is no larger than the word size of the CPU (in other words, its biggest register), there is not much of a performance penalty. But if you are transferring a bigger object, such as a large data structure, the copy has a visible cost. This is negligible unless you are copying a multi-kilobyte struct, but it is something to keep in mind.

This also leads to interesting trivia. Most variables are not allocated to registers, yet most CPU operations must work on registers, which means that in machine code a variable is almost always accessed through a pointer, and operations are surrounded by loads and stores. CISC hides this in the microcode, but look at any MIPS assembly output and the code will be littered with LW and SW, as shown below for adding 2 variables.

; int c = a + b;
    lw      $1, 12($fp) ; load
    lw      $2, 8($fp)  ; load
    addu    $1, $1, $2  ; the actual action
    sw      $1, 4($fp)  ; store

Functions in compiled code are invoked via their memory address which is referred to as function pointers in C-like languages.

Another interesting, although unsafe, possibility: if you have a pointer to a member of a struct and know that member's offset within the struct, you can recover a reference to the entire struct. This has several use cases, such as implementing subtype polymorphism, but that is off topic for now.
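A well-known form of this trick is the container_of macro popularized by the Linux kernel; here is a minimal sketch, with an illustrative struct:

```c
#include <stddef.h>

// Recover the address of the enclosing struct from the address of a
// member, using the member's compile-time offset.
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct widget {
    int id;
    int refcount;
};
```

Given only an int pointer to refcount, container_of(p, struct widget, refcount) yields the struct widget pointer it lives inside.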

Advanced Hardware Memory Models

The above model is a simplistic view. There are a few memory models in the wild, and some nuances that tend to be overlooked, which are important to take note of.

Basic Units of Memory

This seems simple at first as most CPUs have few variations here but this is still important to note.

A bit simply represents a single base-2 digit (0 or 1).

A byte is the basic addressable unit of memory on a system. On almost all modern systems it is 8 bits. In fact, this is mandated by POSIX and Win32, so the exceptions are legacy or embedded systems. Most languages also make this assumption (not C, but most C libraries make it anyway).

A word is the unit of data that a CPU processes natively in one operation. Usually this is the register size or the size of a pointer, which in C is usually uintptr_t. 32-bit systems have a 32-bit word size while 64-bit systems have a 64-bit word size.

Memory Models

The flat-array model of memory is an oversimplification that is good enough for most use cases. But it is still good to know what happens under the hood.

Von Neumann Architecture

The Von Neumann architecture is one where code and data use the same memory bus and therefore the same address space. This is the model described earlier. Most modern non-embedded CPUs use this architecture, with virtual memory on top. Its biggest limitation is the bottleneck: fetching code and data happens over the same bus, not in parallel.

Harvard Architecture

Here the data and code use different memory buses and therefore different address spaces. This is common in embedded systems where the code is in ROM. Self-modifying and JIT code is hard or impossible to implement. Probably the most famous example is the AVR used by Arduino.

Segmented Memory

Mostly obsolete, but popular in the past. Here memory is divided into segments, and each segment has its own addressable space (the pointer addresses within a segment). One way to think of it is that RAM becomes a 2D array, with the segment as the extra dimension. It was used as a hack to deal with small word sizes in the 16-bit era.

Virtual Memory

This is an abstraction found in modern systems. Each process normally gets its own unique view of memory. This is managed by the MMU, which the kernel controls. RAM is divided into pages, and each page is mapped into a process. This effectively sandboxes processes so they cannot access each other's memory.

Cache

Modern CPUs issue many loads and stores, so they have internal memory called the cache, usually in multiple layers. The cache is mostly transparent, but there are caveats, especially for performance. The cache holds copies of recently used sections of memory, which makes memory locality important: objects that are close together in RAM are faster to access than ones spread far apart (fragmented memory). This is relevant for the heap mentioned below.

Lifetimes and Memory Storage

Well, a variable does not (usually) exist from the start of the application to its end. Each variable has a lifetime. One thing that must be clear: memory needs to be allocated at the start of a lifetime and deallocated when it is no longer needed. Accessing a variable through a pointer to deallocated memory is invalid; such a pointer is called a dangling pointer. The referenced memory may already be reused for something else, and therefore the behavior is undefined.

Static Memory

Global variables have a lifetime equal to the entire program, so they can be allocated when the program starts and deallocated when it ends. As long as their size is known at compile time, these variables can (if the compiler allows) be placed in static memory. In compiled languages this memory is laid out in the binary itself (sometimes marked to be auto-allocated at load time).

#include <assert.h>
#include <stdlib.h>

struct foo {
    char var;
};

static struct foo foo; // Global variable that is allocated in static

void bar(){
    foo.var = 'a'; // Will modify the global variable
}

int main(void){
    foo = (struct foo){.var='z'};
    bar(); // Call the function.
    assert(foo.var == 'a'); // Will pass as foo is global and shared across all functions
    return EXIT_SUCCESS;
}

There may also be per-thread copies (thread-local storage) whose lifetime matches the thread's.

Stack Memory

Now you have variables local to a particular programming block, such as a function, or in some cases constructs like if-else or loops. As these blocks can be nested, or repeated due to recursion, you need multiple copies of their variables, unlike static variables. However, the lifetime of these variables equals the runtime of the block. To make life easier, the OS allocates a stack that can be used for this. There are some important things to know about the stack. The stack is a LIFO structure whose top is tracked by the stack pointer. So when deallocation occurs, the stack pointer is simply reset to its value before the block. The following pseudocode gives an idea of how it is done.

static void *sp = &stack; // The whole stack can be thought of as one giant static allocation
                          // (real stacks usually grow downward; this one grows up for simplicity)

T* alloc<typename T>(){
    align(&sp, alignof T); // Make sure the new pointer has the correct alignment
    T* new = sp;           // The new variable lives at the current stack pointer
    sp += sizeof T;        // Advance the stack pointer by the size of the type
    return new;
}

void dealloc<typename T>(T* old){
    sp = old; // reset, freeing this and everything allocated after it
}

Observe the dealloc function. It is fast, but it also frees everything allocated after its argument. While this is fine for local variables, it means that a function's return value must be copied out before the frame is popped, and that returning a pointer to a local variable is invalid. Any variable whose lifetime is unknown, or whose size is unknown at compile time, cannot use the stack.

Another major limitation is that the stack has a size limit, on most OSes in the range of 1-16 MiB. If you exceed it, you get a stack overflow and cannot store more data on the stack (at least not safely, without crashing or ending up somewhere you should not).

The Heap

The last place is the heap. When static and stack are unsuitable, there is the heap. Unlike stack or static memory, the heap is not given freely by the environment: the user must explicitly request memory from the OS (or runtime) and deallocate it. This has a few major advantages, mainly that the user controls both size and lifetime, so it works even when the object's size is unknown at compile time (unless the language does the allocation for the user).

This is especially used when lifetimes are very different and custom. Examples are network sockets or widgets.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

[[nodiscard]] char *bar(){
    char* a = malloc(10); // request 10 bytes of heap from the runtime
    if (!a){
        return nullptr; // malloc can fail; discussed later
    }
    strcpy(a, "hello");
    return a; // Will not free a
}

int main(void){
    auto str = bar(); // Call the function.
    if (!str){
        return EXIT_FAILURE;
    }
    puts(str); // print the string
    free(str); // release the memory
    return EXIT_SUCCESS;
}

Memory Safety

Memory safety is about avoiding bugs where you violate type or lifetime guarantees. A few common issues:

Unsafe Pointer Arithmetic

Doing raw arithmetic can take you out of bounds, either into another variable or into unmapped memory, leading to corruption or segmentation faults.

char a[16] = {0};

// May read another variable's data or cause a segmentation fault
char const b = *(a + 17); // out of bounds (past the end)
char const c = *(a - 1);  // out of bounds (before the start)

Missing Metadata

In low-level languages like C, heap allocations don't retain type metadata, which makes safety checks impossible without external information. The prime example is the size: without it, out-of-bounds errors are easy to commit. Most modern languages handle this with some form of fat pointer (a pointer with data attached to it), such as slices, span (in C++), or plain arrays.

If that is not available, the solution is generally to carry the metadata out of band: make your own fat pointer type, or require the size as a function parameter.

// example pattern
void memzero(void* data, size_t size);
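A minimal hand-rolled fat pointer might look like this (the slice type and function names are illustrative, not a standard API):

```c
#include <stddef.h>

// A fat pointer: the address travels together with the length, so
// every access can be bounds-checked.
struct slice {
    unsigned char *ptr;
    size_t len;
};

// Bounds-checked read: returns -1 instead of reading out of bounds.
int slice_get(struct slice s, size_t i) {
    if (i >= s.len) return -1;
    return s.ptr[i];
}
```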

Null Pointer

This is a generic invalid pointer found in many languages. It cannot be dereferenced, and attempting to do so is in most languages a runtime error (a segmentation fault in C-like languages, an exception or panic in languages that have them).

The usual semantics are that the pointer/reference is not pointing to any variable. If it is possible to receive NULL (that is, the type system provides no compile-time check), check for it before any dereference. A lot of languages have nullish operators (?.), which are useful shortcuts when dealing with null pointers.
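In C the check is explicit; a small sketch (the function name is illustrative):

```c
#include <stddef.h>

// Check for NULL before dereferencing; here NULL is treated as an
// empty string rather than crashing.
size_t safe_strlen(const char *s) {
    if (s == NULL) return 0;
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}
```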

Out of Memory Errors

Most programs assume malloc will not fail, which is mostly true, as operating systems overcommit memory. But allocation can still fail, whether due to the lack of overcommit, genuine memory exhaustion, or other reasons (such as internal fragmentation). This must be handled (malloc returns NULL; certain languages throw an exception). Be very careful and never assume malloc will always return memory.

Lifetime Errors

There are three types of lifetime errors:

  • Use after free: using memory after it's been deallocated. Can corrupt future allocations.

  • Use before initialized: using memory before it's been initialized. Can get garbage data.

  • Memory leak: memory is never deallocated after use, slowly consuming RAM.

In C, manual memory management means you have to:

  • track lifetimes

  • track the ownership

  • store sizes

  • be careful with pointer arithmetic.

Memory Management Approaches

Languages and runtimes use different strategies to manage memory safely and efficiently.

Garbage Collection

Used in Python, JavaScript, Java, etc. The runtime tracks which variables are "still in use" and frees unreferenced ones. This avoids most manual bugs but comes with costs:

  • Garbage Collection adds runtime overhead and pauses.

  • Simple collectors may not detect cycles without extra work (pure reference-counting collectors leak cyclic data).

  • Performance is unpredictable in tight loops or latency-sensitive systems.

But this is automatic, which can simplify code, especially since in some languages (dynamic and prototype-based) an object is more complicated in memory than the simple size-plus-data model mentioned earlier.

A few languages like Go have escape analysis which allows them to determine whether a variable escapes the current function scope. If it does not, it can be safely allocated on the stack even when the syntax appears heap-bound.

Reference Counting

A simpler alternative to Garbage Collection (Sort of the simplest and naive possible implementation). Each object tracks how many references point to it. When the reference count hits zero, it's freed.

  • Easier to reason about.

  • Handles shared ownership.

  • Doesn't clean up cycles (unless using weak references or hybrid collectors).

A classic example is C++'s shared_ptr or Rust's Rc/Arc.

The below pseudocode shows the idea.

struct ref<typename T>{
    T* ptr;
    unsigned ref_count = 1;
};

void assign(ref<T> *self){
    self->ref_count++; // a new reference now shares the data
}

void drop(ref<T> *self){
    self->ref_count--;
    if(0 == self->ref_count){
        free(self->ptr); // Lifetime of the pointer is over
    }
}

Ownership + Lifetimes in Type System

Seen in Rust. The compiler encodes lifetimes and ownership rules directly in types.

  • Compiler ensures memory safety statically.

  • You can't use freed memory or double-free.

  • But this comes at a cost: increased mental overhead, especially in complex cases (e.g. doubly-linked lists, graphs).

Arenas / Region Allocators

Multiple objects are allocated in a chunk and freed together. Great for when lifetimes can be grouped (e.g. AST nodes, temporary buffers).

  • Reduces malloc/free churn.

  • Simplifies lifetime logic.

  • But causes waste if lifetimes are mixed (over-retention).

This is a great technique and is popular in modern C and Zig.

A method I use for trees is to create an object pool (similar to an arena, but for a single type). This simplifies cleaning up the tree when done.
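A minimal sketch of such a pool for tree nodes (fixed capacity, names illustrative): allocation is just bumping an index, and the whole tree is freed by one reset.

```c
#include <stddef.h>

struct node { int value; struct node *left, *right; };

#define POOL_CAP 256
static struct node pool[POOL_CAP];
static size_t pool_used = 0;

// Hand out the next free node, or NULL when the pool is exhausted.
struct node *pool_alloc(void) {
    if (pool_used == POOL_CAP) return NULL;
    return &pool[pool_used++];
}

// Free every node at once; no per-node bookkeeping needed.
void pool_reset(void) {
    pool_used = 0;
}
```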

RAII

RAII can be considered a very simplified, automatic form of resource management (where heap memory is one such resource). There are 2 variants of this: destructors and defer.

In some OOP languages, a destructor is called when the object on the stack is freed. This allows cleaning up other things, including heap memory held by the object. This is a common method in C++ and Rust, especially when combined with reference counting.

Defer follows a similar pattern, but works by hooking code onto the end of the scope, so the cleanup you register runs when the scope exits. Zig uses this approach:

var list = ArrayList(u8).init(allocator);
defer list.deinit();

Now, this mechanism is not specific to memory management: ownership and lifetimes technically apply to other system resources too, like files, devices, and windows. So other languages have similar mechanisms, although they are not relevant here.

Although C lacks defer, it can be emulated with goto (jumping to a cleanup label), which is probably one of the few legitimate uses of that keyword.
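A hedged sketch of the goto-cleanup idiom (function and resources are illustrative): every early exit jumps to the same tail, which releases resources in reverse order of acquisition.

```c
#include <stdio.h>
#include <stdlib.h>

int process(const char *path) {
    int rc = -1;
    char *buf = NULL;
    FILE *f = fopen(path, "rb");
    if (!f) goto done;            // nothing acquired yet

    buf = malloc(4096);
    if (!buf) goto close_file;    // only the file needs cleanup

    /* ... work with f and buf ... */
    rc = 0;

    free(buf);
close_file:
    fclose(f);
done:
    return rc;
}
```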

Free on Exit

This is a valid method, although not recommended. In many runtimes and OSes, exiting the process frees all heap memory too, so you can skip freeing it yourself. This can work if your program is short-lived. Personally, I consider it an anti-pattern unless you are exiting due to an unrecoverable error/exception: the risk is creating real memory leaks once the code grows (the same logic that argues against global variables or singletons).

Handles

This is a mix of arenas and the abstract references from before. The idea is that instead of handing out a pointer to an object, you hand out a handle (the simplest being an id). The actual object is stored in a subsystem's internal array or hash map. This is powerful because ownership stays with the subsystem. Depending on the mechanism, use-after-free of a handle is detectable if the right metadata is kept.

This is powerful but more complicated, and suits longer-lived systems.
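One common refinement is the generational handle (a sketch, with illustrative names): the handle carries the slot's generation, so a stale handle is detected instead of silently reading a recycled slot.

```c
#include <stddef.h>
#include <stdint.h>

struct handle { uint32_t slot; uint32_t gen; };

#define SLOTS 16
static struct { uint32_t gen; int alive; int value; } table[SLOTS];

// Place a value in the first free slot and return its handle.
struct handle subsys_create(int value) {
    for (uint32_t i = 0; i < SLOTS; i++) {
        if (!table[i].alive) {
            table[i].alive = 1;
            table[i].value = value;
            return (struct handle){ i, table[i].gen };
        }
    }
    return (struct handle){ 0, UINT32_MAX }; // table full: an invalid handle
}

// Resolve a handle; returns NULL for freed or stale handles.
int *subsys_get(struct handle h) {
    if (h.slot >= SLOTS || !table[h.slot].alive || table[h.slot].gen != h.gen)
        return NULL;
    return &table[h.slot].value;
}

// Free the slot and bump its generation, invalidating old handles.
void subsys_destroy(struct handle h) {
    if (subsys_get(h) != NULL) {
        table[h.slot].alive = 0;
        table[h.slot].gen++;
    }
}
```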

Implicit References in High-Level Constructs

Just in case you are wondering: if your language does not have pointers, how does all of this work?

Well, most languages implicitly use pointers in ways you may not notice, apart from the occasional pass-by-reference. I will explain why.

A couple of language features need pointers to implicitly carry context, smuggling the function's context in. The common features are:

  • Subtyping Polymorphism: In the form of inheritance or interfaces, a method must receive the correct type (this or self in many languages). This usually works via a virtual table plus a mechanism (either a pointer in the virtual table or some pointer arithmetic) to reach the actual type. In the case of prototypal inheritance (JavaScript and Lua), objects are mutable hash maps and therefore must be stored in RAM.

  • Lambdas: Lambdas and nested functions need to capture local variables. Since such a function is often passed as a callback or returned, the captured variables must outlive the enclosing scope, as in the Go example below.

func sum(x int) func(b int) int {
    return func(b int) int {
        return x + b // x is captured and needs to be stored somewhere
    }
}

  • Coroutines: The backbone of any asynchronous system. The coroutine must be able to suspend itself and therefore save its stack somewhere else while the rest of the program runs. This is again stored in the heap.

Closing Thoughts

A few general principles hold across all strategies:

  • Prefer single ownership if possible. It's easiest to reason about.

  • Use reference counting for shared ownership.

  • Encode data size somewhere, either via the type system or with fat pointers. This helps with bounds checks and memory layout.

  • Respect lifetimes. Use stack, RAII, or arenas when you can.

  • If you're not working on a performance-critical system (e.g. simple GUI or script), garbage collectors are fine. The mental overhead saved is often worth it.

Even in high-level systems, understanding the low-level trade-offs helps avoid performance cliffs and debug nasty bugs faster.