Odin · structs & #soa

A struct, and the one-keyword layout pivot

A struct is named fields laid out one after another in memory. Each field starts at an offset that satisfies its type's alignment, the compiler inserts padding to make that work, and the struct's size_of rounds up to a multiple of its align_of so that every element of an array of them stays aligned. For any field, some bytes are its data, and the bytes before or after it may be padding nobody asked for.

Real offsets & sizes, measured from a compiled Odin program (claims/lessons/12-structs-and-soa/struct-layout). Byte index is top-left of each cell.

Byte map of Mover. Legend: id: u8 · pos: [3]f32 · alive: bool · mass: f64 · padding

Sizes (size_of)

u8      = 1
bool    = 1
[3]f32  = 12
f64     = 8

Mover (struct): size_of = 32, align_of = 8
  32 = 22 bytes of fields + 10 bytes of padding
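To check these numbers yourself, a minimal sketch along these lines should do. The field order of Mover is assumed from the legend above; the package name and print labels are only illustrative.

```odin
package main

import "core:fmt"

// Field order assumed from the legend above.
Mover :: struct {
	id:    u8,
	pos:   [3]f32,
	alive: bool,
	mass:  f64,
}

main :: proc() {
	fmt.println("size_of(Mover)  =", size_of(Mover))  // 32 per the table above
	fmt.println("align_of(Mover) =", align_of(Mover)) // 8 per the table above

	// offset_of(Type, field) reports the byte where each field starts.
	fmt.println("id    at", offset_of(Mover, id))
	fmt.println("pos   at", offset_of(Mover, pos))
	fmt.println("alive at", offset_of(Mover, alive))
	fmt.println("mass  at", offset_of(Mover, mass))
}
```

By the alignment rules, pos should land at byte 4 (3 padding bytes after id) and mass at byte 24 (7 padding bytes after alive), which is where the 10 bytes of padding come from.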

What you're seeing

The struct is just the data

A struct body is a list of fields, and nothing else lives inside it. There is no method, constructor, property, or hidden header attached to Mover; behavior lives in free procedures that take a ^Mover. That's the property the rest of this page leans on: because nothing clings to the type, the compiler is free to reorganize the bytes, which is exactly what Level 2 does. A fresh m: Mover is also fully zero-initialized; every byte above starts at 00, padding included.
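As a rough illustration of "behavior lives in free procedures", here is a sketch with a hypothetical helper, mover_step, that is not part of the lesson. It also checks the zero-initialized fields of a fresh Mover.

```odin
package main

import "core:fmt"

Mover :: struct {
	id:    u8,
	pos:   [3]f32,
	alive: bool,
	mass:  f64,
}

// Hypothetical free procedure: it takes a ^Mover, nothing is attached to the type.
mover_step :: proc(m: ^Mover, vel: [3]f32, dt: f32) {
	m.pos += vel * dt // array arithmetic: each component advances by vel * dt
}

main :: proc() {
	m: Mover // no initializer, so every field starts at zero

	assert(m.id == 0)
	assert(!m.alive)
	assert(m.mass == 0)
	assert(m.pos == [3]f32{0, 0, 0})

	mover_step(&m, {1, 2, 3}, 0.5)
	fmt.println(m.pos) // [0.5, 1, 1.5]
}
```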

Level 1 was what a struct is. Now the payoff Odin builds on top of it: #soa. Write it in front of an array type and the compiler pivots the layout — an array of structs becomes a struct of arrays — while arr[i].field reads exactly the same. Use a Particle :: struct { pos: [3]f32, vel: [3]f32, mass: f32 } and watch where one field lives in each layout.

The reach-for-it case: a hot loop that touches a few fields across many elements — say a position update pos += vel * dt over a million particles. A CPU never fetches one float; it pulls a whole cache line and the next read from inside that line is essentially free. So the question that decides speed is: of the bytes a cache line drags in, how many does this loop actually want?

Why the array-of-structs layout loses here: consecutive particles' pos values sit 28 bytes apart — a whole Particle, with that particle's vel and mass wedged between the two pos values. A loop that only reads pos still hauls every interleaved vel and mass through cache and then ignores them.

Why #soa wins: after the pivot, consecutive pos values sit 12 bytes apart — back to back, nothing between them (12 is exactly size_of([3]f32)). The loop touches only the bytes it needs, packed contiguously; the mass column's cache lines are never fetched at all. Same data, same total size — a better road through it.

The counterfactual: if the loop touched most or all fields of one element at a time (read pos, vel and mass together), the array-of-structs layout would be the right call — one element's fields are already adjacent, and #soa would spread them across separate columns and fetch a line per field. For a handful of elements it makes no measurable difference either way. #soa earns its keep precisely when a hot loop reads a narrow slice of fields over a wide run of elements.
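The same position update can be sketched in both layouts, with the stride between consecutive pos values measured directly. The variable names, the four-element count, and the fill values are assumptions for illustration; the Particle definition is the one given above.

```odin
package main

import "core:fmt"

Particle :: struct {
	pos:  [3]f32,
	vel:  [3]f32,
	mass: f32,
}

N :: 4 // small count, for illustration only

main :: proc() {
	aos: [N]Particle      // array of structs
	soa: #soa [N]Particle // struct of arrays, same element type

	for i in 0..<N {
		aos[i].vel = {1, 0, 0}
		soa[i].vel = {1, 0, 0}
	}

	// The hot loop is written identically for both layouts.
	dt: f32 = 0.5
	for i in 0..<N {
		aos[i].pos += aos[i].vel * dt
		soa[i].pos += soa[i].vel * dt
	}

	// Distance in bytes between pos of element 0 and pos of element 1.
	aos_stride := uintptr(rawptr(&aos[1].pos)) - uintptr(rawptr(&aos[0].pos))
	soa_stride := uintptr(rawptr(&soa.pos[1])) - uintptr(rawptr(&soa.pos[0]))
	fmt.println("aos pos stride:", aos_stride) // 28 per the text above
	fmt.println("soa pos stride:", soa_stride) // 12 per the text above
}
```

In the array-of-structs layout only 12 of every 28 bytes pulled through cache are pos data; in the #soa layout the pos column is pos data end to end.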

Two costs come with all this. First, the padding you saw in Level 1 is real bytes you pay for in every element. Second, #soa is not magic compression — the pivot moves bytes, it never removes them.

The bill on padding: field order decides padding, and padding is paid per element. The Level 1 Mover spent 10 of its 32 bytes on padding. Reorder the same four fields largest-alignment-first (mass, pos, id, alive) and the small fields drop into what was padding: by the same alignment rules the struct works out to 24 bytes, 22 bytes of fields plus 2 bytes of tail padding.

The fix: order fields large-to-small to soak up padding, and reach for #packed only when you must match a byte-exact binary layout (network packet, file header, hardware register). Packed structs can force slow or trapping misaligned loads, so #packed is a tool for the wire, not for speed.
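Both options can be sketched side by side. The type names Mover_Sorted and Mover_Packed are hypothetical, and the sizes in the comments follow from the alignment rules described earlier rather than from the lesson's measurements.

```odin
package main

import "core:fmt"

// Same four fields as Mover, ordered largest alignment first.
Mover_Sorted :: struct {
	mass:  f64,    // offset 0
	pos:   [3]f32, // offset 8
	id:    u8,     // offset 20
	alive: bool,   // offset 21, then 2 bytes of tail padding
}

// Original field order with all padding removed.
// Fields may end up misaligned; intended for byte-exact formats only.
Mover_Packed :: struct #packed {
	id:    u8,
	pos:   [3]f32,
	alive: bool,
	mass:  f64,
}

main :: proc() {
	fmt.println("sorted:", size_of(Mover_Sorted)) // expected 24
	fmt.println("packed:", size_of(Mover_Packed)) // expected 22
}
```

Reordering costs nothing at runtime, which is why it is the first thing to try; #packed trades alignment for an exact byte layout.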

The bill on the pivot: a #soa array holds the same total bytes as the array of structs — both [4]Particle and #soa [4]Particle are 112 bytes. The pivot is a transposition, not a packing: the 28 bytes per particle still exist, just re-grouped by column instead of by element. You buy locality for the hot loop, not smaller memory.
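A quick way to confirm that the pivot moves bytes without removing any is to compare size_of for the two layouts. This is a minimal sketch; the variable names are illustrative.

```odin
package main

import "core:fmt"

Particle :: struct {
	pos:  [3]f32,
	vel:  [3]f32,
	mass: f32,
}

main :: proc() {
	aos: [4]Particle
	soa: #soa [4]Particle

	fmt.println("one Particle:", size_of(Particle)) // 28
	fmt.println("aos total   :", size_of(aos))      // 112 per the text above
	fmt.println("soa total   :", size_of(soa))      // 112 per the text above
}
```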

A plain array has no field columns

Per-field columns are something only an #soa array exposes. On a normal [4]Particle, reaching for aos.pos is a compile error: there's no pos field on the array type, only on the Particle element:

aos: [4]Particle
fmt.println(aos.pos)   // no such field on the array

Error: 'aos' of type '[4]Particle' has no field 'pos'

The column view is a thing the #soa pivot grants; it is not a property of arrays in general.

The last level is the property that makes #soa feel like a superpower rather than a trick: the layout is negotiable without touching the code that uses it. One keyword flips the bytes; not one line at the call sites changes.

One storage, two names. After the pivot, the i-th particle's position is reachable two ways — the row view soa[i].pos (the same indexing you'd write on a plain array) and the column view soa.pos[i]. They are not copies that happen to agree; they resolve to the identical address:

soa: #soa [4]Particle
// the row view and the column view name one storage location
assert(&soa[0].pos == &soa.pos[0])   // holds: same address

The free column views. Because the bytes are now grouped by field, every field name becomes an array view of length N at no cost — soa.pos is a [4][3]f32, soa.mass is a [4]f32. That [4]f32 is already the contiguous run a vectorized loop, a GPU upload, or an audio mixer wants — no copy-out, no strided gather. Filling four particles like the lesson and reading the columns back:

// type soa.pos  : [4][3]f32
// type soa.mass : [4]f32
soa.pos column : [[0.5, 0, 0], [1.5, 0, 0], [2.5, 0, 0], [3.5, 0, 0]]
soa.mass column: [1, 1, 1, 1]
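A sketch that could produce columns like these, and then consume one of them directly, might look as follows. The fill values are inferred from the output above; the summing loop is an added illustration, not part of the lesson.

```odin
package main

import "core:fmt"

Particle :: struct {
	pos:  [3]f32,
	vel:  [3]f32,
	mass: f32,
}

main :: proc() {
	soa: #soa [4]Particle

	// Fill through the row view, one element at a time.
	for i in 0..<4 {
		soa[i].pos  = {f32(i) + 0.5, 0, 0}
		soa[i].mass = 1
	}

	// Read through the column view: soa.mass is an ordinary [4]f32.
	total: f32
	for m in soa.mass {
		total += m
	}

	fmt.println("soa.pos column :", soa.pos)
	fmt.println("soa.mass column:", soa.mass)
	fmt.println("total mass     :", total)
}
```

The loop over soa.mass walks four adjacent floats; no Particle is ever assembled along the way.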

The emergent payoff: in a real codebase you sketch with the array of structs because one element reads naturally; then a profiler points at one hot table, you prepend #soa, and every arr[i].field in the calling code keeps working unchanged while the bytes underneath are now cache-friendly columns. The cost of switching is one keyword and zero edits to the consumers — which is the entire reason to keep behavior out of the struct in the first place. A type that's only data is a type whose shape stays yours to negotiate.
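One way to picture the switch is to keep the layout choice behind a single type alias. The USE_SOA constant and the Particles alias below are hypothetical names for this sketch; in practice the change is usually just prepending #soa to the declaration.

```odin
package main

import "core:fmt"

Particle :: struct {
	pos:  [3]f32,
	vel:  [3]f32,
	mass: f32,
}

N :: 4

// Hypothetical switch: flip to false to get the array-of-structs layout.
USE_SOA :: true

when USE_SOA {
	Particles :: #soa [N]Particle
} else {
	Particles :: [N]Particle
}

main :: proc() {
	ps: Particles

	// Consumer code: nothing below mentions the layout.
	for i in 0..<N {
		ps[i].vel  = {1, 0, 0}
		ps[i].mass = 1
	}

	dt: f32 = 0.5
	for i in 0..<N {
		ps[i].pos += ps[i].vel * dt
	}

	fmt.println(ps[N-1].pos) // [0.5, 0, 0] under either layout
}
```

Only the alias changes between the two builds; the body of main compiles as written in both.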

That's the arc: L1 a struct is named fields with alignment-driven padding → L2 #soa pivots an array's bytes so a hot loop touches only the field it needs, contiguously → L3 the bill is padding-per-element and that the pivot moves bytes (112 = 112) rather than shrinking them → L4 the pivot is one keyword with zero changes to calling code, because the struct is only data.

probes reproduce with odin run · sizes, offsets, strides, column types & the L3 error are real compiler output (claims/lessons/12-structs-and-soa) · perf shown qualitatively only — the measured SoA win sits inside the harness noise band