Last week marked the Mesa 26.1 branch point, and I want to take a moment to look back at what happened on the PanVK front.

Spoiler: it was a busy one.

The landscape

PanVK - the Vulkan driver for Arm Mali GPUs (Valhall and newer) - is a collaborative effort. Collabora has been doing incredible work on the compiler backend and the foundational infrastructure. Arm themselves are actively contributing to the open source Mali GPU stack as well, reviewing patches and pushing driver quality forward. On the Igalia side, my focus this cycle was Vulkan extension coverage. The kind of work that doesn’t make for flashy demos but is absolutely critical for real-world application compatibility - especially for things like DXVK.

Why extensions matter

A Vulkan driver without extensions is like a car without wheels - technically complete, practically useless. Applications (and translation layers like DXVK, vkd3d-proton, and Zink) probe for specific extensions and adjust their behavior accordingly. Missing even one can mean falling back to a slower path or refusing to run entirely.
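
To make the probing concrete, here is a minimal sketch in C of how an application might compare a driver's advertised extension list against the ones it needs. The helper names (has_extension, count_missing) are hypothetical, and the lists are plain string arrays; a real application would fill the advertised list from vkEnumerateDeviceExtensionProperties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears in the advertised extension list. */
static bool has_extension(const char *const *advertised, size_t count,
                          const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(advertised[i], name) == 0)
            return true;
    }
    return false;
}

/* Counts how many required extensions the driver does not advertise.
 * A translation layer would typically refuse to start, or pick a slower
 * fallback path, when this is non-zero. */
static size_t count_missing(const char *const *advertised, size_t n_advertised,
                            const char *const *required, size_t n_required)
{
    size_t missing = 0;
    for (size_t i = 0; i < n_required; i++) {
        if (!has_extension(advertised, n_advertised, required[i]))
            missing++;
    }
    return missing;
}
```

The point of the sketch is that the check is purely name-based: a driver that implements the functionality but does not advertise the extension string looks identical to one that lacks it.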

Three different things drove the extension work this cycle:

  • The Proton stack - extensions consumed by DXVK and vkd3d-proton, the translation layers that make D3D9–12 games run on Vulkan.
  • DDK feature parity - extensions Arm’s binary Mali driver exposes that PanVK didn’t yet, tracked in the DDK feature parity ticket.
  • Catching up on mesamatrix.net - closing the visible gap with the other Mesa Vulkan drivers (RADV, ANV, Turnip).

So I set out to close gaps. Lots of them.

The Proton stack essentials

These are extensions DXVK and vkd3d-proton actually require - not just nice-to-haves on a recommendation list. Each one unblocks something concrete in the D3D-to-Vulkan translation path.

VK_EXT_conditional_rendering (!40452) was probably the most involved piece of work. D3D12 has predicated rendering (SetPredication), and vkd3d-proton uses this extension to implement it efficiently. It wasn’t a simple “flip a bit” situation - I had to add the core state tracking, wrap all draw and dispatch calls with conditional checks, handle inherited state in secondary command buffers, and make sure meta operations (like internal clears and resolves) properly disable conditional rendering so they don’t get accidentally skipped. That ended up being five patches touching draw paths, dispatch, and the secondary command buffer inheritance logic.
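
The state tracking described above can be sketched roughly as follows. This is a simplified model, not PanVK's actual code: the struct and function names are hypothetical, and the predicate is passed in as a plain value, whereas the real extension reads it from a buffer. The semantics modelled are the extension's own: a zero predicate discards the command, unless the inverted flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-command-buffer state; not PanVK's actual structures. */
struct cond_render_state {
    bool active;   /* inside a Begin/EndConditionalRendering block */
    bool inverted; /* models VK_CONDITIONAL_RENDERING_INVERTED_BIT_EXT */
};

/* Decides whether a draw or dispatch should execute.
 * `predicate` stands in for the 32-bit value read from the predicate buffer. */
static bool should_execute(const struct cond_render_state *st,
                           uint32_t predicate)
{
    if (!st->active)
        return true; /* no conditional rendering in effect */

    bool pass = predicate != 0;         /* zero discards the command... */
    return st->inverted ? !pass : pass; /* ...unless the test is inverted */
}

/* Meta operations (internal clears, resolves) must always run, so they
 * save the current state, disable it, and restore it afterwards. */
static struct cond_render_state meta_begin(struct cond_render_state *st)
{
    struct cond_render_state saved = *st;
    st->active = false;
    return saved;
}

static void meta_end(struct cond_render_state *st,
                     struct cond_render_state saved)
{
    *st = saved;
}
```

Wrapping meta operations in a save/disable/restore pair is what keeps an internal clear from being skipped when the application's predicate happens to be zero.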

VK_VALVE_mutable_descriptor_type (!40254) is one of those extensions that exists purely because Valve needed it. In D3D, descriptor types are more fluid than in Vulkan - a descriptor slot might hold a sampled image one frame and a storage buffer the next. vkd3d-proton enables this to avoid expensive descriptor set re-creation when types change. It’s a trivial alias of the already-supported VK_EXT_mutable_descriptor_type, so enabling it was a one-liner.

VK_EXT_memory_budget (!40246) lets applications (and both DXVK and vkd3d-proton) query how much GPU memory is actually available versus how much is in use. Without it, apps are flying blind on memory management, which can lead to over-allocation and stuttering. Getting the heap budget reporting right required hooking into the kernel driver’s memory accounting.
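
From the application side, the budget is typically used to decide whether an allocation is safe before making it. The sketch below is an assumption about how a consumer might do that, not code from DXVK or vkd3d-proton: struct heap_budget is a hypothetical mirror of one heap's heapBudget and heapUsage values from VkPhysicalDeviceMemoryBudgetPropertiesEXT, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of one heap's entry in
 * VkPhysicalDeviceMemoryBudgetPropertiesEXT (heapBudget / heapUsage). */
struct heap_budget {
    uint64_t budget; /* bytes this process can expect to allocate in total */
    uint64_t usage;  /* bytes this process currently has allocated */
};

/* Bytes still available before the budget is exceeded.
 * Usage can exceed the budget if the budget shrinks, so clamp at zero. */
static uint64_t heap_headroom(const struct heap_budget *h)
{
    return h->usage < h->budget ? h->budget - h->usage : 0;
}

/* True if an allocation of `size` bytes stays within the reported budget. */
static bool allocation_fits(const struct heap_budget *h, uint64_t size)
{
    return size <= heap_headroom(h);
}
```

Because the budget already includes what the process has allocated, the headroom is budget minus usage rather than the budget itself.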

VK_EXT_attachment_feedback_loop_layout (!40498) - feedback loops let you read from an attachment that you’re simultaneously rendering to (think screen-space effects that sample the current framebuffer). DXVK uses this in its D3D9 hazard layout path to avoid artifacts in certain games.

VK_EXT_shader_stencil_export (!39944) - allows fragment shaders to write stencil values directly, rather than relying on the fixed-function stencil path. DXVK leans on this in its meta-copy and meta-resolve paths, and vkd3d-proton enables it too. The Panfrost stack already supported everything needed, so the change was literally a one-line advertisement in physical_device.c.

VK_KHR_shader_untyped_pointers (!40457, v9+) - a newer KHR extension that relaxes pointer type requirements in SPIR-V. DXVK calls this out as a dependency for descriptor heaps. Restricted to v9+ because Bifrost has issues with 8-bit vector loads through untyped pointers combined with 16-bit storage. Also needed to lower memcpy derefs before explicit IO lowering.

Catching up to the DDK

The Panfrost project keeps a DDK feature parity ticket tracking everything Arm’s binary Mali driver exposes that PanVK doesn’t yet. Four of those got crossed off this cycle:

  • VK_ARM_scheduling_controls (!40063, CSF only) - an Arm-specific extension for controlling shader core scheduling on Command Stream Frontend (CSF) hardware. I also fixed the per-queue shader core count so CSF group creation uses the right values.
  • VK_EXT_legacy_dithering (!39781) - implements ordered dithering in the blending stage, which some applications expect from legacy APIs. Wired up the existing Panfrost dithering infrastructure (pan_dithered_format_from_pipe_format()) — just plumbing the VK_RENDERING_ENABLE_LEGACY_DITHERING_BIT_EXT flag through the blend descriptor and color attachment internal conversion paths.
  • VK_EXT_rgba10x6_formats (!40653) - a last-minute addition that just squeezed in before the branch point. This required adding the PIPE_FORMAT_X6R10X6G10X6B10X6A10_UNORM format to Mesa’s gallium format table first, then wiring it up in PanVK. Used for 10-bit per channel content in video and HDR scenarios.
  • VK_EXT_astc_decode_mode (!39799) - controls the format used when decoding ASTC compressed textures, allowing apps to choose lower-precision decoding for performance. The Panfrost hardware already supports controlling ASTC decode precision via the Decode Wide plane descriptor field; just needed to parse VkImageViewASTCDecodeModeEXT from the image view pNext chain and set astc.narrow accordingly. v9+ only because the relevant ASTC plane descriptor fields only exist from Valhall onward.
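
Parsing a struct out of a pNext chain, as in the ASTC decode mode case, follows a standard Vulkan pattern: every chained struct begins with sType and pNext, so the driver walks the list until it finds the type it wants. The sketch below uses stand-in types so it compiles without the Vulkan headers; the enum values and names are hypothetical, and the rule for picking narrow decoding is an assumption rather than PanVK's exact logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in enums; the values are hypothetical and do not match the
 * real Vulkan headers. */
enum stype { STYPE_IMAGE_VIEW_CREATE_INFO, STYPE_ASTC_DECODE_MODE, STYPE_OTHER };
enum decode_format { DECODE_RGBA16_SFLOAT, DECODE_RGBA8_UNORM };

/* Common header shared by every struct in the chain. */
struct base_in {
    enum stype sType;
    const struct base_in *pNext;
};

/* Stand-in for VkImageViewASTCDecodeModeEXT. */
struct astc_decode_mode {
    struct base_in base; /* base.sType == STYPE_ASTC_DECODE_MODE */
    enum decode_format decodeMode;
};

/* Walks the chain and returns the first struct with the requested sType,
 * or NULL if the application did not chain one. */
static const struct base_in *find_in_chain(const struct base_in *chain,
                                           enum stype wanted)
{
    for (const struct base_in *s = chain; s != NULL; s = s->pNext) {
        if (s->sType == wanted)
            return s;
    }
    return NULL;
}

/* Assumption: narrow (8-bit) decoding is selected only when the app asks
 * for it explicitly; otherwise the default wide decode is kept. */
static bool wants_narrow_decode(const struct base_in *view_info)
{
    const struct astc_decode_mode *m = (const struct astc_decode_mode *)
        find_in_chain(view_info, STYPE_ASTC_DECODE_MODE);
    return m != NULL && m->decodeMode == DECODE_RGBA8_UNORM;
}
```

Unknown structs in the chain are simply skipped, which is why an extension like this can be added without touching the code that handles other chained structs.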

Catching up on mesamatrix

mesamatrix.net tracks Vulkan extension support across the Mesa drivers. The remaining extensions this cycle were about closing the visible gap with RADV, ANV, and Turnip — extensions that don’t have a single big consumer driving them, but whose absence shows up as red squares on the matrix and as silent fallbacks in apps that probe for them.

  • VK_EXT_color_write_enable (!39913) - per-attachment control over which color channels actually get written. The common Vulkan runtime already handled all the pipeline state and dynamic command plumbing, and panvk’s blend descriptor emission was already consuming color_write_enables, so this was effectively an “advertise the feature” change.
  • VK_EXT_depth_clamp_control (!39925) - lets applications specify a custom depth clamp range instead of always clamping to the viewport’s minDepth/maxDepth. Mali GPUs have native LOW_DEPTH_CLAMP/HIGH_DEPTH_CLAMP registers, so it was a matter of wiring the existing runtime state through to those.
  • VK_EXT_attachment_feedback_loop_dynamic_state (!40498) - the dynamic-state companion to VK_EXT_attachment_feedback_loop_layout above; lets you toggle feedback-loop state per draw call without pipeline rebuilds.
  • VK_EXT_map_memory_placed (!40315) - lets applications control where in their virtual address space GPU memory gets mapped. The work also simplified pan_kmod_bo_mmap() to always map the whole BO, cleaning up the kernel module interface.
  • VK_EXT_shader_atomic_float (!40506) - atomic operations on float values in shaders. The existing axchg instruction is type-agnostic, so no compiler changes were needed; image atomics are already lowered to global atomics. Just had to add R32_FLOAT to the storage-image-atomic format flag.
  • VK_EXT_nested_command_buffer (!40120, v10+) - allows secondary command buffers to call other secondary command buffers. The CSF backend’s cs_call() is a hardware call/return instruction that nests naturally, and the existing CmdExecuteCommands already does the caller/callee state merging. The 8-level hardware call stack, minus one for the kernel ringbuffer call and two reserved for future driver use, leaves maxCommandBufferNestingLevel at 5.
  • VK_EXT_image_view_min_lod (!39938) - allows clamping the minimum LOD at the image view level rather than just the sampler. Mali v6+ has per-texture-descriptor LOD clamp fields independent from the sampler’s, so this just plumbs vk_image_view::min_lod through pan_image_view into the texture descriptor — no shader lowering or descriptor merging needed.
  • VK_EXT_zero_initialize_device_memory (!39658) - guarantees that newly allocated device memory is zeroed. The kernel side already does the heavy lifting — panfrost/panthor use drm_gem_shmem, which serves zeroed pages from the shmem subsystem. And since panvk treats layout transitions as no-ops, VK_IMAGE_LAYOUT_ZERO_INITIALIZED_EXT falls out for free. (Did need one format-table fix: dropping STORAGE_IMAGE support from compressed formats to avoid crashes in the new dEQP tests.)
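
The nesting limit for VK_EXT_nested_command_buffer is simple arithmetic on the hardware call stack, written out here as a sketch. The numbers come from the description above; the constant and function names are hypothetical, and the depth check is only an illustration of how such a limit might be enforced.

```c
#include <assert.h>
#include <stdbool.h>

/* Figures taken from the description above; the names are hypothetical. */
enum {
    HW_CALL_STACK_DEPTH   = 8, /* hardware call stack levels */
    KERNEL_RINGBUF_CALLS  = 1, /* used by the kernel ringbuffer call */
    DRIVER_RESERVED_CALLS = 2, /* reserved for future driver use */
    MAX_NESTING_LEVEL =
        HW_CALL_STACK_DEPTH - KERNEL_RINGBUF_CALLS - DRIVER_RESERVED_CALLS,
};

/* True if a chain of `depth` nested calls stays within the advertised
 * limit. */
static bool nesting_allowed(unsigned depth)
{
    return depth <= (unsigned)MAX_NESTING_LEVEL;
}
```

Deriving the limit from named constants rather than hard-coding 5 means it would follow automatically if the number of reserved levels ever changed.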

By the numbers

That’s 18 extensions across 17 merge requests - ranging from single-patch additions to multi-patch series like conditional rendering. Collectively they represent a meaningful shift in what PanVK can claim to support: more of the Proton stack working out of the box, four more checkboxes against the DDK, and fewer red squares on the mesamatrix.

What’s next

The extension sprint isn’t over - there are still gaps to fill, and each one removed makes PanVK more viable for real workloads. But 26.1 was a good milestone. The driver is getting to the point where you can throw a DXVK game at it and have a reasonable expectation that it just works.

Back to it. ⚡