Compute Shader入门

Number of views 44

早期的显卡以非常固定的方式进行渲染,甚至光照算法都是在专用硬件上实现的。而现代显卡是高度并行的,基于此,现代显卡本身并不内置3D的相关特性,它只是读取数据,以并行的方式通过自定义的算法对结果进行对应输出。Compute Shader就是基于上述理论诞生的。Compute Shader对数据的处理也有维度的概念,最多有3个维度。

untitleddesigne1606325295800.png

假设我们需要处理576个长度的数据,在现代GPU中处理这些对象时会对其进行分组,即Work Group,可以把工作组看作是小集合的线程。每个线程都是并行运行的。以WGSL为例,在 WGSL 中静态定义工作组的大小。工作组的大小是三维的,但默认为 1,因此 @workgroup_size(1) 等价于 @workgroup_size(1, 1, 1)。

如果我们将工作组定义为 @workgroup_size(3, 4, 2),那么我们就定义了 3 * 4 * 2 个线程,换句话说,我们定义了一个包含 24 个线程的工作组。假设我们要对这些数据各自进行翻倍。我们可以通过如下Compute Shader定义:

@group(0) @binding(0) var<storage, read_write> data: array<f32>;
 
@compute @workgroup_size(1) fn computeSomething(
  @builtin(global_invocation_id) id: vec3<u32>
) {
  let i = id.x;
  data[i] = data[i] * 2.0;
}

工作组内的每个线程对应一个invocation id,它用于唯一确定线程的位置,每个线程是计算着色器的最小执行单元,称为一个 调用(Invocation)。Invocation又分为Local InvocationGlobal Invocation。

Local Invocation

标识线程在工作组内的位置,通常为3维坐标,取值范围从(0, 0, 0)到 (localSize_{x-1}, localSize_{y-1}, localSize_{z-1}),

上面说到我们假设定义每个工作组的大小为 (3, 4, 2),则Local Invocation ID 范围是 (0~2, 0~3, 0~1)。如下图所示:

image1745984071312.pngLocal Invocation ID在不同着色器语言中都有其内置的属性,如下表:

着色器语言 内置变量/属性
GLSL gl_LocalInvocationID
HLSL SV_GroupThreadID
MSL thread_position_in_threadgroup
WGSL @builtin(local_invocation_id)

Global Invocation

它标识线程在整个计算任务中的全局位置,它的计算公式如下:

$$
GlobalInvocationID = WorkGroupID * WorkGroupSize + LocalInvocationID
$$

  • WorkGroupID:当前工作组在全局中的坐标(如 (2, 3, 0))。
  • WorkGroupSize:工作组大小(如 (3, 4, 2)

如果我们通过代码调用 pass.dispatchWorkgroups(4, 3, 2),那么意思就是,执行一个包含 24 个数目的工作组(WorkGroupNum),即总共执行 4 * 3 * 2 次(即 24 次工作组,每个工作组24个线程),总计 576 个线程。如下图WorkGroupNum为(4, 3, 2),它的坐标轴即是WorkGroupID,紫色部分的WorkGroupID即是(3, 0, 0)。

image1745984759811.png

Global Invocation ID在不同着色语言中的内置情况如下:

着色器语言 内置变量/属性
GLSL gl_GlobalInvocationID
HLSL SV_DispatchThreadID
WGSL @builtin(global_invocation_id)
Metal 通常通过参数传递或调度信息获取

Local Invocation Index

虽然工作组的线程通常以三维的方式进行组织,但是某些场景下需要转换为一维的方式进行数据访问:

  • 访问一维共享内存数组:共享内存(Shared Memory)通常声明为一维数组,需将线程的三维坐标映射到一维数组的索引下标。
  • 简化逻辑:一维索引更便于处理循环、归约操作(如求和、求最大值)等。

它的转换结果即是求取Local Invocation Index的过程(注意与Local Invocation ID做区分),转换公式如下:

rowSize = workgroup_size.x;
sliceSize = rowSize * workgroup_size.y;
local_invocation_index = 
    local_invocation_id.x +
    local_invocation_id.y * rowSize +
    local_invocation_id.z * sliceSize;

看上去有点懵,回到上面Local Invocation ID的图示,如果我们要求取下图中白块的Local Invocation Index:

image1745997148636.png

根据公式可得:

rowSize = 3;
sliceSize = 3 * 4 = 12;
LocalInvocationIndex = 
    2 +
    0 * 3+
    1 * 12 = 14;

Local Invocation Index的最后结果为14,是正确的么?我们标记下索引:

image1746001236620.png发现结果是没有任何问题的。其实这个算法与我们在屏幕中要得到对应坐标像素值的算法是一致的。唯一不同的是增加了一个维度信息,即z坐标。

这是Local Invocation Index在各个着色器语言中的内置情况:

着色器语言 内置变量/属性
GLSL gl_LocalInvocationIndex
HLSL SV_GroupIndex
WGSL @builtin(local_invocation_index)
Metal 无直接对应,需自行计算

Work Group Index

Work Group Index也称为Global Work Group Index,我们知道了Local Invocation Index的求取方式,同样的对于求取Local Invocation Index的公式也适用于Work Group Index的求取。公式如下:

rowSize = workGroupNum.x
sliceSize = rowSize * workGroupNum.y
workGroupIndex = workGroupID.x + workdGroupID.y*rowSize + workGroupID.z*sliceSize

目前各个着色器语言没有直接提供内置的属性对Work Group Index进行访问。

Global Invocation Index

我们知道了Work Group Index后,就可以求取Global Invocation Index了,公式如下:

globalInvocationIndex = workGroupIndex * (workGroupSize.x * workGroupSize.y * workGroupSize.z)
                        + LocalInvocationIndex

如果我们已经知道了Global Invocation ID,我们也可以通过上述在求取Local Invocation Index或 Work Group Index的方式来求取Global Invocation Index的值:

rowSize = workGroupNum.x * workGroupSize.x
sliceSize = rowSize * workGroupNum.y * workGroupSize.y
globalInvocationIndex = globalInvocationID.x + globalInvocationID.y*rowSize + globalInvocationID.z*sliceSize

同样的,目前各个着色器语言没有直接提供内置的属性对Work Group Index进行访问。

下面我们可以试着使用上述概念,对Storage Buffer进行写入:

const dispatchCount = [4, 3, 2];
const workgroupSize = [2, 3, 4];

// 计算数组所有元素的乘积
const arrayProd = arr => arr.reduce((a, b) => a * b);

const numThreadsPerWorkgroup = arrayProd(workgroupSize);

const code = `
// 注意!: vec3u 会被填充到4字节对齐
@group(0) @binding(0) var<storage, read_write> workgroupResult: array<vec3u>;
@group(0) @binding(1) var<storage, read_write> localResult: array<vec3u>;
@group(0) @binding(2) var<storage, read_write> globalResult: array<vec3u>;

@compute @workgroup_size(${workgroupSize}) fn computeSomething(
    @builtin(workgroup_id) workgroup_id : vec3<u32>,
    @builtin(local_invocation_id) local_invocation_id : vec3<u32>,
    @builtin(global_invocation_id) global_invocation_id : vec3<u32>,
    @builtin(local_invocation_index) local_invocation_index: u32,
    @builtin(num_workgroups) num_workgroups: vec3<u32>
) {
  // workgroup_index 的作用类似于 local_invocation_index,
  // 但它是针对工作组(workgroup)而非工作组内的线程。
  // 因为没有内置属性,我们需要手动计算它。

  let workgroup_index =  
     workgroup_id.x +
     workgroup_id.y * num_workgroups.x +
     workgroup_id.z * num_workgroups.x * num_workgroups.y;

  // global_invocation_index 的作用类似于 local_invocation_index,
  // 但它会跨所有分派的工作组进行线性编号。
  // 因为没有内置属性,我们需要手动计算它。

  let global_invocation_index =
     workgroup_index * ${numThreadsPerWorkgroup} +
     local_invocation_index;

  // 现在我们可以将这些内置属性写入对应的存储缓冲区中。
  workgroupResult[global_invocation_index] = workgroup_id;
  localResult[global_invocation_index] = local_invocation_id;
  globalResult[global_invocation_index] = global_invocation_id;
`;


我们使用了 JavaScript 的模板字符串,以便可以从 JavaScript 变量 workgroupSize 中动态设置工作组大小。但最终,这个值会被硬编码到着色器代码中。

既然我们已经有了着色器,接下来可以创建 3 个缓冲区来存储这些结果。

const numWorkgroups = arrayProd(dispatchCount);
const numResults = numWorkgroups * numThreadsPerWorkgroup;
const size = numResults * 4 * 4;  // vec3f * u32
 
let usage = GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC;
const workgroupBuffer = device.createBuffer({size, usage});
const localBuffer = device.createBuffer({size, usage});
const globalBuffer = device.createBuffer({size, usage});

由于我们无法将存储缓冲区(storage buffers)直接映射到CPU即 JavaScript的处理逻辑中,因此我们需要创建一些可映射的缓冲区(mappable buffers)来存储数据。我们可以通过以下步骤操作:

  • 将存储缓冲区中的结果复制到这些可映射的结果缓冲区(mappable result buffers)中;
  • 然后通过 JavaScript 读取这些结果缓冲区的数据。
 usage = GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST;
 const workgroupReadBuffer = device.createBuffer({size, usage});
 const localReadBuffer = device.createBuffer({size, usage});
 const globalReadBuffer = device.createBuffer({size, usage});

我们创建一个bindgroup来绑定所有的Storage Buffers:

const bindGroup = device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [
      { binding: 0, resource: { buffer: workgroupBuffer }},
      { binding: 1, resource: { buffer: localBuffer }},
      { binding: 2, resource: { buffer: globalBuffer }},
    ],
  });

接着我们创建一个编码器(encoder)和一个计算通道编码器(compute pass encoder),然后添加运行计算着色器的命令:

// Encode commands to do the computation
const encoder = device.createCommandEncoder({ label: 'compute builtin encoder' });
const pass = encoder.beginComputePass({ label: 'compute builtin pass' });
 
pass.setPipeline(pipeline);
pass.setBindGroup(0, bindGroup);
pass.dispatchWorkgroups(...dispatchCount);
pass.end();

我们需要将结果从Storage Buffer复制到可映射的结果缓冲区。

  encoder.copyBufferToBuffer(workgroupBuffer, 0, workgroupReadBuffer, 0, size);
  encoder.copyBufferToBuffer(localBuffer, 0, localReadBuffer, 0, size);
  encoder.copyBufferToBuffer(globalBuffer, 0, globalReadBuffer, 0, size);

然后结束编码器并提交命令缓冲区。

  // Finish encoding and submit the commands
  const commandBuffer = encoder.finish();
  device.queue.submit([commandBuffer]);

为了读取结果,我们通过映射缓冲区操作,并在它们准备就绪后,获取其内容的类型化数组视图(TypedArray Views)。

  // Read the results
   await Promise.all([
    workgroupReadBuffer.mapAsync(GPUMapMode.READ),
    localReadBuffer.mapAsync(GPUMapMode.READ),
    globalReadBuffer.mapAsync(GPUMapMode.READ),
  ]);
 
  const workgroup = new Uint32Array(workgroupReadBuffer.getMappedRange());
  const local = new Uint32Array(localReadBuffer.getMappedRange());
  const global = new Uint32Array(globalReadBuffer.getMappedRange());

之后,我们通过从ComputeShader拿到的数据,做额外的操作,目前我们直接打印数据:

  const get3 = (arr, i) => {
    const off = i * 4;
    return `${arr[off]}, ${arr[off + 1]}, ${arr[off + 2]}`;
  };
 
  for (let i = 0; i < numResults; ++i) {
    if (i % numThreadsPerWorkgroup === 0) {
      log(`\
 ---------------------------------------
 global                 local     global   dispatch: ${i / numThreadsPerWorkgroup}
 invoc.    workgroup    invoc.    invoc.
 index     id           id        id
 ---------------------------------------`);
    }
    log(` ${i.toString().padStart(3)}:      ${get3(workgroup, i)}      ${get3(local, i)}   ${get3(global, i)}`)
  }
}
 
function log(...args) {
  const elem = document.createElement('pre');
  elem.textContent = args.join(' ');
  document.body.appendChild(elem);
}

这是打印结果:

---------------------------------------
global                 local     global   dispatch: 0
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
  0:      0, 0, 0      0, 0, 0   0, 0, 0
  1:      0, 0, 0      1, 0, 0   1, 0, 0
  2:      0, 0, 0      0, 1, 0   0, 1, 0
  3:      0, 0, 0      1, 1, 0   1, 1, 0
  4:      0, 0, 0      0, 2, 0   0, 2, 0
  5:      0, 0, 0      1, 2, 0   1, 2, 0
  6:      0, 0, 0      0, 0, 1   0, 0, 1
  7:      0, 0, 0      1, 0, 1   1, 0, 1
  8:      0, 0, 0      0, 1, 1   0, 1, 1
  9:      0, 0, 0      1, 1, 1   1, 1, 1
 10:      0, 0, 0      0, 2, 1   0, 2, 1
 11:      0, 0, 0      1, 2, 1   1, 2, 1
 12:      0, 0, 0      0, 0, 2   0, 0, 2
 13:      0, 0, 0      1, 0, 2   1, 0, 2
 14:      0, 0, 0      0, 1, 2   0, 1, 2
 15:      0, 0, 0      1, 1, 2   1, 1, 2
 16:      0, 0, 0      0, 2, 2   0, 2, 2
 17:      0, 0, 0      1, 2, 2   1, 2, 2
 18:      0, 0, 0      0, 0, 3   0, 0, 3
 19:      0, 0, 0      1, 0, 3   1, 0, 3
 20:      0, 0, 0      0, 1, 3   0, 1, 3
 21:      0, 0, 0      1, 1, 3   1, 1, 3
 22:      0, 0, 0      0, 2, 3   0, 2, 3
 23:      0, 0, 0      1, 2, 3   1, 2, 3
---------------------------------------
global                 local     global   dispatch: 1
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
 24:      1, 0, 0      0, 0, 0   2, 0, 0
 25:      1, 0, 0      1, 0, 0   3, 0, 0
 26:      1, 0, 0      0, 1, 0   2, 1, 0
 27:      1, 0, 0      1, 1, 0   3, 1, 0
 28:      1, 0, 0      0, 2, 0   2, 2, 0
 29:      1, 0, 0      1, 2, 0   3, 2, 0
 30:      1, 0, 0      0, 0, 1   2, 0, 1
 31:      1, 0, 0      1, 0, 1   3, 0, 1
 32:      1, 0, 0      0, 1, 1   2, 1, 1
 33:      1, 0, 0      1, 1, 1   3, 1, 1
 34:      1, 0, 0      0, 2, 1   2, 2, 1
 35:      1, 0, 0      1, 2, 1   3, 2, 1
 36:      1, 0, 0      0, 0, 2   2, 0, 2
 37:      1, 0, 0      1, 0, 2   3, 0, 2
 38:      1, 0, 0      0, 1, 2   2, 1, 2
 39:      1, 0, 0      1, 1, 2   3, 1, 2
 40:      1, 0, 0      0, 2, 2   2, 2, 2
 41:      1, 0, 0      1, 2, 2   3, 2, 2
 42:      1, 0, 0      0, 0, 3   2, 0, 3
 43:      1, 0, 0      1, 0, 3   3, 0, 3
 44:      1, 0, 0      0, 1, 3   2, 1, 3
 45:      1, 0, 0      1, 1, 3   3, 1, 3
 46:      1, 0, 0      0, 2, 3   2, 2, 3
 47:      1, 0, 0      1, 2, 3   3, 2, 3
---------------------------------------
global                 local     global   dispatch: 2
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
 48:      2, 0, 0      0, 0, 0   4, 0, 0
 49:      2, 0, 0      1, 0, 0   5, 0, 0
 50:      2, 0, 0      0, 1, 0   4, 1, 0
 51:      2, 0, 0      1, 1, 0   5, 1, 0
 52:      2, 0, 0      0, 2, 0   4, 2, 0
 53:      2, 0, 0      1, 2, 0   5, 2, 0
 54:      2, 0, 0      0, 0, 1   4, 0, 1
 55:      2, 0, 0      1, 0, 1   5, 0, 1
 56:      2, 0, 0      0, 1, 1   4, 1, 1
 57:      2, 0, 0      1, 1, 1   5, 1, 1
 58:      2, 0, 0      0, 2, 1   4, 2, 1
 59:      2, 0, 0      1, 2, 1   5, 2, 1
 60:      2, 0, 0      0, 0, 2   4, 0, 2
 61:      2, 0, 0      1, 0, 2   5, 0, 2
 62:      2, 0, 0      0, 1, 2   4, 1, 2
 63:      2, 0, 0      1, 1, 2   5, 1, 2
 64:      2, 0, 0      0, 2, 2   4, 2, 2
 65:      2, 0, 0      1, 2, 2   5, 2, 2
 66:      2, 0, 0      0, 0, 3   4, 0, 3
 67:      2, 0, 0      1, 0, 3   5, 0, 3
 68:      2, 0, 0      0, 1, 3   4, 1, 3
 69:      2, 0, 0      1, 1, 3   5, 1, 3
 70:      2, 0, 0      0, 2, 3   4, 2, 3
 71:      2, 0, 0      1, 2, 3   5, 2, 3
---------------------------------------
global                 local     global   dispatch: 3
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
 72:      3, 0, 0      0, 0, 0   6, 0, 0
 73:      3, 0, 0      1, 0, 0   7, 0, 0
 74:      3, 0, 0      0, 1, 0   6, 1, 0
 75:      3, 0, 0      1, 1, 0   7, 1, 0
 76:      3, 0, 0      0, 2, 0   6, 2, 0
 77:      3, 0, 0      1, 2, 0   7, 2, 0
 78:      3, 0, 0      0, 0, 1   6, 0, 1
 79:      3, 0, 0      1, 0, 1   7, 0, 1
 80:      3, 0, 0      0, 1, 1   6, 1, 1
 81:      3, 0, 0      1, 1, 1   7, 1, 1
 82:      3, 0, 0      0, 2, 1   6, 2, 1
 83:      3, 0, 0      1, 2, 1   7, 2, 1
 84:      3, 0, 0      0, 0, 2   6, 0, 2
 85:      3, 0, 0      1, 0, 2   7, 0, 2
 86:      3, 0, 0      0, 1, 2   6, 1, 2
 87:      3, 0, 0      1, 1, 2   7, 1, 2
 88:      3, 0, 0      0, 2, 2   6, 2, 2
 89:      3, 0, 0      1, 2, 2   7, 2, 2
 90:      3, 0, 0      0, 0, 3   6, 0, 3
 91:      3, 0, 0      1, 0, 3   7, 0, 3
 92:      3, 0, 0      0, 1, 3   6, 1, 3
 93:      3, 0, 0      1, 1, 3   7, 1, 3
 94:      3, 0, 0      0, 2, 3   6, 2, 3
 95:      3, 0, 0      1, 2, 3   7, 2, 3
---------------------------------------
global                 local     global   dispatch: 4
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
 96:      0, 1, 0      0, 0, 0   0, 3, 0
 97:      0, 1, 0      1, 0, 0   1, 3, 0
 98:      0, 1, 0      0, 1, 0   0, 4, 0
 99:      0, 1, 0      1, 1, 0   1, 4, 0
100:      0, 1, 0      0, 2, 0   0, 5, 0
101:      0, 1, 0      1, 2, 0   1, 5, 0
102:      0, 1, 0      0, 0, 1   0, 3, 1
103:      0, 1, 0      1, 0, 1   1, 3, 1
104:      0, 1, 0      0, 1, 1   0, 4, 1
105:      0, 1, 0      1, 1, 1   1, 4, 1
106:      0, 1, 0      0, 2, 1   0, 5, 1
107:      0, 1, 0      1, 2, 1   1, 5, 1
108:      0, 1, 0      0, 0, 2   0, 3, 2
109:      0, 1, 0      1, 0, 2   1, 3, 2
110:      0, 1, 0      0, 1, 2   0, 4, 2
111:      0, 1, 0      1, 1, 2   1, 4, 2
112:      0, 1, 0      0, 2, 2   0, 5, 2
113:      0, 1, 0      1, 2, 2   1, 5, 2
114:      0, 1, 0      0, 0, 3   0, 3, 3
115:      0, 1, 0      1, 0, 3   1, 3, 3
116:      0, 1, 0      0, 1, 3   0, 4, 3
117:      0, 1, 0      1, 1, 3   1, 4, 3
118:      0, 1, 0      0, 2, 3   0, 5, 3
119:      0, 1, 0      1, 2, 3   1, 5, 3
---------------------------------------
global                 local     global   dispatch: 5
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
120:      1, 1, 0      0, 0, 0   2, 3, 0
121:      1, 1, 0      1, 0, 0   3, 3, 0
122:      1, 1, 0      0, 1, 0   2, 4, 0
123:      1, 1, 0      1, 1, 0   3, 4, 0
124:      1, 1, 0      0, 2, 0   2, 5, 0
125:      1, 1, 0      1, 2, 0   3, 5, 0
126:      1, 1, 0      0, 0, 1   2, 3, 1
127:      1, 1, 0      1, 0, 1   3, 3, 1
128:      1, 1, 0      0, 1, 1   2, 4, 1
129:      1, 1, 0      1, 1, 1   3, 4, 1
130:      1, 1, 0      0, 2, 1   2, 5, 1
131:      1, 1, 0      1, 2, 1   3, 5, 1
132:      1, 1, 0      0, 0, 2   2, 3, 2
133:      1, 1, 0      1, 0, 2   3, 3, 2
134:      1, 1, 0      0, 1, 2   2, 4, 2
135:      1, 1, 0      1, 1, 2   3, 4, 2
136:      1, 1, 0      0, 2, 2   2, 5, 2
137:      1, 1, 0      1, 2, 2   3, 5, 2
138:      1, 1, 0      0, 0, 3   2, 3, 3
139:      1, 1, 0      1, 0, 3   3, 3, 3
140:      1, 1, 0      0, 1, 3   2, 4, 3
141:      1, 1, 0      1, 1, 3   3, 4, 3
142:      1, 1, 0      0, 2, 3   2, 5, 3
143:      1, 1, 0      1, 2, 3   3, 5, 3
---------------------------------------
global                 local     global   dispatch: 6
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
144:      2, 1, 0      0, 0, 0   4, 3, 0
145:      2, 1, 0      1, 0, 0   5, 3, 0
146:      2, 1, 0      0, 1, 0   4, 4, 0
147:      2, 1, 0      1, 1, 0   5, 4, 0
148:      2, 1, 0      0, 2, 0   4, 5, 0
149:      2, 1, 0      1, 2, 0   5, 5, 0
150:      2, 1, 0      0, 0, 1   4, 3, 1
151:      2, 1, 0      1, 0, 1   5, 3, 1
152:      2, 1, 0      0, 1, 1   4, 4, 1
153:      2, 1, 0      1, 1, 1   5, 4, 1
154:      2, 1, 0      0, 2, 1   4, 5, 1
155:      2, 1, 0      1, 2, 1   5, 5, 1
156:      2, 1, 0      0, 0, 2   4, 3, 2
157:      2, 1, 0      1, 0, 2   5, 3, 2
158:      2, 1, 0      0, 1, 2   4, 4, 2
159:      2, 1, 0      1, 1, 2   5, 4, 2
160:      2, 1, 0      0, 2, 2   4, 5, 2
161:      2, 1, 0      1, 2, 2   5, 5, 2
162:      2, 1, 0      0, 0, 3   4, 3, 3
163:      2, 1, 0      1, 0, 3   5, 3, 3
164:      2, 1, 0      0, 1, 3   4, 4, 3
165:      2, 1, 0      1, 1, 3   5, 4, 3
166:      2, 1, 0      0, 2, 3   4, 5, 3
167:      2, 1, 0      1, 2, 3   5, 5, 3
---------------------------------------
global                 local     global   dispatch: 7
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
168:      3, 1, 0      0, 0, 0   6, 3, 0
169:      3, 1, 0      1, 0, 0   7, 3, 0
170:      3, 1, 0      0, 1, 0   6, 4, 0
171:      3, 1, 0      1, 1, 0   7, 4, 0
172:      3, 1, 0      0, 2, 0   6, 5, 0
173:      3, 1, 0      1, 2, 0   7, 5, 0
174:      3, 1, 0      0, 0, 1   6, 3, 1
175:      3, 1, 0      1, 0, 1   7, 3, 1
176:      3, 1, 0      0, 1, 1   6, 4, 1
177:      3, 1, 0      1, 1, 1   7, 4, 1
178:      3, 1, 0      0, 2, 1   6, 5, 1
179:      3, 1, 0      1, 2, 1   7, 5, 1
180:      3, 1, 0      0, 0, 2   6, 3, 2
181:      3, 1, 0      1, 0, 2   7, 3, 2
182:      3, 1, 0      0, 1, 2   6, 4, 2
183:      3, 1, 0      1, 1, 2   7, 4, 2
184:      3, 1, 0      0, 2, 2   6, 5, 2
185:      3, 1, 0      1, 2, 2   7, 5, 2
186:      3, 1, 0      0, 0, 3   6, 3, 3
187:      3, 1, 0      1, 0, 3   7, 3, 3
188:      3, 1, 0      0, 1, 3   6, 4, 3
189:      3, 1, 0      1, 1, 3   7, 4, 3
190:      3, 1, 0      0, 2, 3   6, 5, 3
191:      3, 1, 0      1, 2, 3   7, 5, 3
---------------------------------------
global                 local     global   dispatch: 8
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
192:      0, 2, 0      0, 0, 0   0, 6, 0
193:      0, 2, 0      1, 0, 0   1, 6, 0
194:      0, 2, 0      0, 1, 0   0, 7, 0
195:      0, 2, 0      1, 1, 0   1, 7, 0
196:      0, 2, 0      0, 2, 0   0, 8, 0
197:      0, 2, 0      1, 2, 0   1, 8, 0
198:      0, 2, 0      0, 0, 1   0, 6, 1
199:      0, 2, 0      1, 0, 1   1, 6, 1
200:      0, 2, 0      0, 1, 1   0, 7, 1
201:      0, 2, 0      1, 1, 1   1, 7, 1
202:      0, 2, 0      0, 2, 1   0, 8, 1
203:      0, 2, 0      1, 2, 1   1, 8, 1
204:      0, 2, 0      0, 0, 2   0, 6, 2
205:      0, 2, 0      1, 0, 2   1, 6, 2
206:      0, 2, 0      0, 1, 2   0, 7, 2
207:      0, 2, 0      1, 1, 2   1, 7, 2
208:      0, 2, 0      0, 2, 2   0, 8, 2
209:      0, 2, 0      1, 2, 2   1, 8, 2
210:      0, 2, 0      0, 0, 3   0, 6, 3
211:      0, 2, 0      1, 0, 3   1, 6, 3
212:      0, 2, 0      0, 1, 3   0, 7, 3
213:      0, 2, 0      1, 1, 3   1, 7, 3
214:      0, 2, 0      0, 2, 3   0, 8, 3
215:      0, 2, 0      1, 2, 3   1, 8, 3
---------------------------------------
global                 local     global   dispatch: 9
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
216:      1, 2, 0      0, 0, 0   2, 6, 0
217:      1, 2, 0      1, 0, 0   3, 6, 0
218:      1, 2, 0      0, 1, 0   2, 7, 0
219:      1, 2, 0      1, 1, 0   3, 7, 0
220:      1, 2, 0      0, 2, 0   2, 8, 0
221:      1, 2, 0      1, 2, 0   3, 8, 0
222:      1, 2, 0      0, 0, 1   2, 6, 1
223:      1, 2, 0      1, 0, 1   3, 6, 1
224:      1, 2, 0      0, 1, 1   2, 7, 1
225:      1, 2, 0      1, 1, 1   3, 7, 1
226:      1, 2, 0      0, 2, 1   2, 8, 1
227:      1, 2, 0      1, 2, 1   3, 8, 1
228:      1, 2, 0      0, 0, 2   2, 6, 2
229:      1, 2, 0      1, 0, 2   3, 6, 2
230:      1, 2, 0      0, 1, 2   2, 7, 2
231:      1, 2, 0      1, 1, 2   3, 7, 2
232:      1, 2, 0      0, 2, 2   2, 8, 2
233:      1, 2, 0      1, 2, 2   3, 8, 2
234:      1, 2, 0      0, 0, 3   2, 6, 3
235:      1, 2, 0      1, 0, 3   3, 6, 3
236:      1, 2, 0      0, 1, 3   2, 7, 3
237:      1, 2, 0      1, 1, 3   3, 7, 3
238:      1, 2, 0      0, 2, 3   2, 8, 3
239:      1, 2, 0      1, 2, 3   3, 8, 3
---------------------------------------
global                 local     global   dispatch: 10
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
240:      2, 2, 0      0, 0, 0   4, 6, 0
241:      2, 2, 0      1, 0, 0   5, 6, 0
242:      2, 2, 0      0, 1, 0   4, 7, 0
243:      2, 2, 0      1, 1, 0   5, 7, 0
244:      2, 2, 0      0, 2, 0   4, 8, 0
245:      2, 2, 0      1, 2, 0   5, 8, 0
246:      2, 2, 0      0, 0, 1   4, 6, 1
247:      2, 2, 0      1, 0, 1   5, 6, 1
248:      2, 2, 0      0, 1, 1   4, 7, 1
249:      2, 2, 0      1, 1, 1   5, 7, 1
250:      2, 2, 0      0, 2, 1   4, 8, 1
251:      2, 2, 0      1, 2, 1   5, 8, 1
252:      2, 2, 0      0, 0, 2   4, 6, 2
253:      2, 2, 0      1, 0, 2   5, 6, 2
254:      2, 2, 0      0, 1, 2   4, 7, 2
255:      2, 2, 0      1, 1, 2   5, 7, 2
256:      2, 2, 0      0, 2, 2   4, 8, 2
257:      2, 2, 0      1, 2, 2   5, 8, 2
258:      2, 2, 0      0, 0, 3   4, 6, 3
259:      2, 2, 0      1, 0, 3   5, 6, 3
260:      2, 2, 0      0, 1, 3   4, 7, 3
261:      2, 2, 0      1, 1, 3   5, 7, 3
262:      2, 2, 0      0, 2, 3   4, 8, 3
263:      2, 2, 0      1, 2, 3   5, 8, 3
---------------------------------------
global                 local     global   dispatch: 11
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
264:      3, 2, 0      0, 0, 0   6, 6, 0
265:      3, 2, 0      1, 0, 0   7, 6, 0
266:      3, 2, 0      0, 1, 0   6, 7, 0
267:      3, 2, 0      1, 1, 0   7, 7, 0
268:      3, 2, 0      0, 2, 0   6, 8, 0
269:      3, 2, 0      1, 2, 0   7, 8, 0
270:      3, 2, 0      0, 0, 1   6, 6, 1
271:      3, 2, 0      1, 0, 1   7, 6, 1
272:      3, 2, 0      0, 1, 1   6, 7, 1
273:      3, 2, 0      1, 1, 1   7, 7, 1
274:      3, 2, 0      0, 2, 1   6, 8, 1
275:      3, 2, 0      1, 2, 1   7, 8, 1
276:      3, 2, 0      0, 0, 2   6, 6, 2
277:      3, 2, 0      1, 0, 2   7, 6, 2
278:      3, 2, 0      0, 1, 2   6, 7, 2
279:      3, 2, 0      1, 1, 2   7, 7, 2
280:      3, 2, 0      0, 2, 2   6, 8, 2
281:      3, 2, 0      1, 2, 2   7, 8, 2
282:      3, 2, 0      0, 0, 3   6, 6, 3
283:      3, 2, 0      1, 0, 3   7, 6, 3
284:      3, 2, 0      0, 1, 3   6, 7, 3
285:      3, 2, 0      1, 1, 3   7, 7, 3
286:      3, 2, 0      0, 2, 3   6, 8, 3
287:      3, 2, 0      1, 2, 3   7, 8, 3
---------------------------------------
global                 local     global   dispatch: 12
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
288:      0, 0, 1      0, 0, 0   0, 0, 4
289:      0, 0, 1      1, 0, 0   1, 0, 4
290:      0, 0, 1      0, 1, 0   0, 1, 4
291:      0, 0, 1      1, 1, 0   1, 1, 4
292:      0, 0, 1      0, 2, 0   0, 2, 4
293:      0, 0, 1      1, 2, 0   1, 2, 4
294:      0, 0, 1      0, 0, 1   0, 0, 5
295:      0, 0, 1      1, 0, 1   1, 0, 5
296:      0, 0, 1      0, 1, 1   0, 1, 5
297:      0, 0, 1      1, 1, 1   1, 1, 5
298:      0, 0, 1      0, 2, 1   0, 2, 5
299:      0, 0, 1      1, 2, 1   1, 2, 5
300:      0, 0, 1      0, 0, 2   0, 0, 6
301:      0, 0, 1      1, 0, 2   1, 0, 6
302:      0, 0, 1      0, 1, 2   0, 1, 6
303:      0, 0, 1      1, 1, 2   1, 1, 6
304:      0, 0, 1      0, 2, 2   0, 2, 6
305:      0, 0, 1      1, 2, 2   1, 2, 6
306:      0, 0, 1      0, 0, 3   0, 0, 7
307:      0, 0, 1      1, 0, 3   1, 0, 7
308:      0, 0, 1      0, 1, 3   0, 1, 7
309:      0, 0, 1      1, 1, 3   1, 1, 7
310:      0, 0, 1      0, 2, 3   0, 2, 7
311:      0, 0, 1      1, 2, 3   1, 2, 7
---------------------------------------
global                 local     global   dispatch: 13
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
312:      1, 0, 1      0, 0, 0   2, 0, 4
313:      1, 0, 1      1, 0, 0   3, 0, 4
314:      1, 0, 1      0, 1, 0   2, 1, 4
315:      1, 0, 1      1, 1, 0   3, 1, 4
316:      1, 0, 1      0, 2, 0   2, 2, 4
317:      1, 0, 1      1, 2, 0   3, 2, 4
318:      1, 0, 1      0, 0, 1   2, 0, 5
319:      1, 0, 1      1, 0, 1   3, 0, 5
320:      1, 0, 1      0, 1, 1   2, 1, 5
321:      1, 0, 1      1, 1, 1   3, 1, 5
322:      1, 0, 1      0, 2, 1   2, 2, 5
323:      1, 0, 1      1, 2, 1   3, 2, 5
324:      1, 0, 1      0, 0, 2   2, 0, 6
325:      1, 0, 1      1, 0, 2   3, 0, 6
326:      1, 0, 1      0, 1, 2   2, 1, 6
327:      1, 0, 1      1, 1, 2   3, 1, 6
328:      1, 0, 1      0, 2, 2   2, 2, 6
329:      1, 0, 1      1, 2, 2   3, 2, 6
330:      1, 0, 1      0, 0, 3   2, 0, 7
331:      1, 0, 1      1, 0, 3   3, 0, 7
332:      1, 0, 1      0, 1, 3   2, 1, 7
333:      1, 0, 1      1, 1, 3   3, 1, 7
334:      1, 0, 1      0, 2, 3   2, 2, 7
335:      1, 0, 1      1, 2, 3   3, 2, 7
---------------------------------------
global                 local     global   dispatch: 14
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
336:      2, 0, 1      0, 0, 0   4, 0, 4
337:      2, 0, 1      1, 0, 0   5, 0, 4
338:      2, 0, 1      0, 1, 0   4, 1, 4
339:      2, 0, 1      1, 1, 0   5, 1, 4
340:      2, 0, 1      0, 2, 0   4, 2, 4
341:      2, 0, 1      1, 2, 0   5, 2, 4
342:      2, 0, 1      0, 0, 1   4, 0, 5
343:      2, 0, 1      1, 0, 1   5, 0, 5
344:      2, 0, 1      0, 1, 1   4, 1, 5
345:      2, 0, 1      1, 1, 1   5, 1, 5
346:      2, 0, 1      0, 2, 1   4, 2, 5
347:      2, 0, 1      1, 2, 1   5, 2, 5
348:      2, 0, 1      0, 0, 2   4, 0, 6
349:      2, 0, 1      1, 0, 2   5, 0, 6
350:      2, 0, 1      0, 1, 2   4, 1, 6
351:      2, 0, 1      1, 1, 2   5, 1, 6
352:      2, 0, 1      0, 2, 2   4, 2, 6
353:      2, 0, 1      1, 2, 2   5, 2, 6
354:      2, 0, 1      0, 0, 3   4, 0, 7
355:      2, 0, 1      1, 0, 3   5, 0, 7
356:      2, 0, 1      0, 1, 3   4, 1, 7
357:      2, 0, 1      1, 1, 3   5, 1, 7
358:      2, 0, 1      0, 2, 3   4, 2, 7
359:      2, 0, 1      1, 2, 3   5, 2, 7
---------------------------------------
global                 local     global   dispatch: 15
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
360:      3, 0, 1      0, 0, 0   6, 0, 4
361:      3, 0, 1      1, 0, 0   7, 0, 4
362:      3, 0, 1      0, 1, 0   6, 1, 4
363:      3, 0, 1      1, 1, 0   7, 1, 4
364:      3, 0, 1      0, 2, 0   6, 2, 4
365:      3, 0, 1      1, 2, 0   7, 2, 4
366:      3, 0, 1      0, 0, 1   6, 0, 5
367:      3, 0, 1      1, 0, 1   7, 0, 5
368:      3, 0, 1      0, 1, 1   6, 1, 5
369:      3, 0, 1      1, 1, 1   7, 1, 5
370:      3, 0, 1      0, 2, 1   6, 2, 5
371:      3, 0, 1      1, 2, 1   7, 2, 5
372:      3, 0, 1      0, 0, 2   6, 0, 6
373:      3, 0, 1      1, 0, 2   7, 0, 6
374:      3, 0, 1      0, 1, 2   6, 1, 6
375:      3, 0, 1      1, 1, 2   7, 1, 6
376:      3, 0, 1      0, 2, 2   6, 2, 6
377:      3, 0, 1      1, 2, 2   7, 2, 6
378:      3, 0, 1      0, 0, 3   6, 0, 7
379:      3, 0, 1      1, 0, 3   7, 0, 7
380:      3, 0, 1      0, 1, 3   6, 1, 7
381:      3, 0, 1      1, 1, 3   7, 1, 7
382:      3, 0, 1      0, 2, 3   6, 2, 7
383:      3, 0, 1      1, 2, 3   7, 2, 7
---------------------------------------
global                 local     global   dispatch: 16
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
384:      0, 1, 1      0, 0, 0   0, 3, 4
385:      0, 1, 1      1, 0, 0   1, 3, 4
386:      0, 1, 1      0, 1, 0   0, 4, 4
387:      0, 1, 1      1, 1, 0   1, 4, 4
388:      0, 1, 1      0, 2, 0   0, 5, 4
389:      0, 1, 1      1, 2, 0   1, 5, 4
390:      0, 1, 1      0, 0, 1   0, 3, 5
391:      0, 1, 1      1, 0, 1   1, 3, 5
392:      0, 1, 1      0, 1, 1   0, 4, 5
393:      0, 1, 1      1, 1, 1   1, 4, 5
394:      0, 1, 1      0, 2, 1   0, 5, 5
395:      0, 1, 1      1, 2, 1   1, 5, 5
396:      0, 1, 1      0, 0, 2   0, 3, 6
397:      0, 1, 1      1, 0, 2   1, 3, 6
398:      0, 1, 1      0, 1, 2   0, 4, 6
399:      0, 1, 1      1, 1, 2   1, 4, 6
400:      0, 1, 1      0, 2, 2   0, 5, 6
401:      0, 1, 1      1, 2, 2   1, 5, 6
402:      0, 1, 1      0, 0, 3   0, 3, 7
403:      0, 1, 1      1, 0, 3   1, 3, 7
404:      0, 1, 1      0, 1, 3   0, 4, 7
405:      0, 1, 1      1, 1, 3   1, 4, 7
406:      0, 1, 1      0, 2, 3   0, 5, 7
407:      0, 1, 1      1, 2, 3   1, 5, 7
---------------------------------------
global                 local     global   dispatch: 17
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
408:      1, 1, 1      0, 0, 0   2, 3, 4
409:      1, 1, 1      1, 0, 0   3, 3, 4
410:      1, 1, 1      0, 1, 0   2, 4, 4
411:      1, 1, 1      1, 1, 0   3, 4, 4
412:      1, 1, 1      0, 2, 0   2, 5, 4
413:      1, 1, 1      1, 2, 0   3, 5, 4
414:      1, 1, 1      0, 0, 1   2, 3, 5
415:      1, 1, 1      1, 0, 1   3, 3, 5
416:      1, 1, 1      0, 1, 1   2, 4, 5
417:      1, 1, 1      1, 1, 1   3, 4, 5
418:      1, 1, 1      0, 2, 1   2, 5, 5
419:      1, 1, 1      1, 2, 1   3, 5, 5
420:      1, 1, 1      0, 0, 2   2, 3, 6
421:      1, 1, 1      1, 0, 2   3, 3, 6
422:      1, 1, 1      0, 1, 2   2, 4, 6
423:      1, 1, 1      1, 1, 2   3, 4, 6
424:      1, 1, 1      0, 2, 2   2, 5, 6
425:      1, 1, 1      1, 2, 2   3, 5, 6
426:      1, 1, 1      0, 0, 3   2, 3, 7
427:      1, 1, 1      1, 0, 3   3, 3, 7
428:      1, 1, 1      0, 1, 3   2, 4, 7
429:      1, 1, 1      1, 1, 3   3, 4, 7
430:      1, 1, 1      0, 2, 3   2, 5, 7
431:      1, 1, 1      1, 2, 3   3, 5, 7
---------------------------------------
global                 local     global   dispatch: 18
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
432:      2, 1, 1      0, 0, 0   4, 3, 4
433:      2, 1, 1      1, 0, 0   5, 3, 4
434:      2, 1, 1      0, 1, 0   4, 4, 4
435:      2, 1, 1      1, 1, 0   5, 4, 4
436:      2, 1, 1      0, 2, 0   4, 5, 4
437:      2, 1, 1      1, 2, 0   5, 5, 4
438:      2, 1, 1      0, 0, 1   4, 3, 5
439:      2, 1, 1      1, 0, 1   5, 3, 5
440:      2, 1, 1      0, 1, 1   4, 4, 5
441:      2, 1, 1      1, 1, 1   5, 4, 5
442:      2, 1, 1      0, 2, 1   4, 5, 5
443:      2, 1, 1      1, 2, 1   5, 5, 5
444:      2, 1, 1      0, 0, 2   4, 3, 6
445:      2, 1, 1      1, 0, 2   5, 3, 6
446:      2, 1, 1      0, 1, 2   4, 4, 6
447:      2, 1, 1      1, 1, 2   5, 4, 6
448:      2, 1, 1      0, 2, 2   4, 5, 6
449:      2, 1, 1      1, 2, 2   5, 5, 6
450:      2, 1, 1      0, 0, 3   4, 3, 7
451:      2, 1, 1      1, 0, 3   5, 3, 7
452:      2, 1, 1      0, 1, 3   4, 4, 7
453:      2, 1, 1      1, 1, 3   5, 4, 7
454:      2, 1, 1      0, 2, 3   4, 5, 7
455:      2, 1, 1      1, 2, 3   5, 5, 7
---------------------------------------
global                 local     global   dispatch: 19
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
456:      3, 1, 1      0, 0, 0   6, 3, 4
457:      3, 1, 1      1, 0, 0   7, 3, 4
458:      3, 1, 1      0, 1, 0   6, 4, 4
459:      3, 1, 1      1, 1, 0   7, 4, 4
460:      3, 1, 1      0, 2, 0   6, 5, 4
461:      3, 1, 1      1, 2, 0   7, 5, 4
462:      3, 1, 1      0, 0, 1   6, 3, 5
463:      3, 1, 1      1, 0, 1   7, 3, 5
464:      3, 1, 1      0, 1, 1   6, 4, 5
465:      3, 1, 1      1, 1, 1   7, 4, 5
466:      3, 1, 1      0, 2, 1   6, 5, 5
467:      3, 1, 1      1, 2, 1   7, 5, 5
468:      3, 1, 1      0, 0, 2   6, 3, 6
469:      3, 1, 1      1, 0, 2   7, 3, 6
470:      3, 1, 1      0, 1, 2   6, 4, 6
471:      3, 1, 1      1, 1, 2   7, 4, 6
472:      3, 1, 1      0, 2, 2   6, 5, 6
473:      3, 1, 1      1, 2, 2   7, 5, 6
474:      3, 1, 1      0, 0, 3   6, 3, 7
475:      3, 1, 1      1, 0, 3   7, 3, 7
476:      3, 1, 1      0, 1, 3   6, 4, 7
477:      3, 1, 1      1, 1, 3   7, 4, 7
478:      3, 1, 1      0, 2, 3   6, 5, 7
479:      3, 1, 1      1, 2, 3   7, 5, 7
---------------------------------------
global                 local     global   dispatch: 20
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
480:      0, 2, 1      0, 0, 0   0, 6, 4
481:      0, 2, 1      1, 0, 0   1, 6, 4
482:      0, 2, 1      0, 1, 0   0, 7, 4
483:      0, 2, 1      1, 1, 0   1, 7, 4
484:      0, 2, 1      0, 2, 0   0, 8, 4
485:      0, 2, 1      1, 2, 0   1, 8, 4
486:      0, 2, 1      0, 0, 1   0, 6, 5
487:      0, 2, 1      1, 0, 1   1, 6, 5
488:      0, 2, 1      0, 1, 1   0, 7, 5
489:      0, 2, 1      1, 1, 1   1, 7, 5
490:      0, 2, 1      0, 2, 1   0, 8, 5
491:      0, 2, 1      1, 2, 1   1, 8, 5
492:      0, 2, 1      0, 0, 2   0, 6, 6
493:      0, 2, 1      1, 0, 2   1, 6, 6
494:      0, 2, 1      0, 1, 2   0, 7, 6
495:      0, 2, 1      1, 1, 2   1, 7, 6
496:      0, 2, 1      0, 2, 2   0, 8, 6
497:      0, 2, 1      1, 2, 2   1, 8, 6
498:      0, 2, 1      0, 0, 3   0, 6, 7
499:      0, 2, 1      1, 0, 3   1, 6, 7
500:      0, 2, 1      0, 1, 3   0, 7, 7
501:      0, 2, 1      1, 1, 3   1, 7, 7
502:      0, 2, 1      0, 2, 3   0, 8, 7
503:      0, 2, 1      1, 2, 3   1, 8, 7
---------------------------------------
global                 local     global   dispatch: 21
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
504:      1, 2, 1      0, 0, 0   2, 6, 4
505:      1, 2, 1      1, 0, 0   3, 6, 4
506:      1, 2, 1      0, 1, 0   2, 7, 4
507:      1, 2, 1      1, 1, 0   3, 7, 4
508:      1, 2, 1      0, 2, 0   2, 8, 4
509:      1, 2, 1      1, 2, 0   3, 8, 4
510:      1, 2, 1      0, 0, 1   2, 6, 5
511:      1, 2, 1      1, 0, 1   3, 6, 5
512:      1, 2, 1      0, 1, 1   2, 7, 5
513:      1, 2, 1      1, 1, 1   3, 7, 5
514:      1, 2, 1      0, 2, 1   2, 8, 5
515:      1, 2, 1      1, 2, 1   3, 8, 5
516:      1, 2, 1      0, 0, 2   2, 6, 6
517:      1, 2, 1      1, 0, 2   3, 6, 6
518:      1, 2, 1      0, 1, 2   2, 7, 6
519:      1, 2, 1      1, 1, 2   3, 7, 6
520:      1, 2, 1      0, 2, 2   2, 8, 6
521:      1, 2, 1      1, 2, 2   3, 8, 6
522:      1, 2, 1      0, 0, 3   2, 6, 7
523:      1, 2, 1      1, 0, 3   3, 6, 7
524:      1, 2, 1      0, 1, 3   2, 7, 7
525:      1, 2, 1      1, 1, 3   3, 7, 7
526:      1, 2, 1      0, 2, 3   2, 8, 7
527:      1, 2, 1      1, 2, 3   3, 8, 7
---------------------------------------
global                 local     global   dispatch: 22
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
528:      2, 2, 1      0, 0, 0   4, 6, 4
529:      2, 2, 1      1, 0, 0   5, 6, 4
530:      2, 2, 1      0, 1, 0   4, 7, 4
531:      2, 2, 1      1, 1, 0   5, 7, 4
532:      2, 2, 1      0, 2, 0   4, 8, 4
533:      2, 2, 1      1, 2, 0   5, 8, 4
534:      2, 2, 1      0, 0, 1   4, 6, 5
535:      2, 2, 1      1, 0, 1   5, 6, 5
536:      2, 2, 1      0, 1, 1   4, 7, 5
537:      2, 2, 1      1, 1, 1   5, 7, 5
538:      2, 2, 1      0, 2, 1   4, 8, 5
539:      2, 2, 1      1, 2, 1   5, 8, 5
540:      2, 2, 1      0, 0, 2   4, 6, 6
541:      2, 2, 1      1, 0, 2   5, 6, 6
542:      2, 2, 1      0, 1, 2   4, 7, 6
543:      2, 2, 1      1, 1, 2   5, 7, 6
544:      2, 2, 1      0, 2, 2   4, 8, 6
545:      2, 2, 1      1, 2, 2   5, 8, 6
546:      2, 2, 1      0, 0, 3   4, 6, 7
547:      2, 2, 1      1, 0, 3   5, 6, 7
548:      2, 2, 1      0, 1, 3   4, 7, 7
549:      2, 2, 1      1, 1, 3   5, 7, 7
550:      2, 2, 1      0, 2, 3   4, 8, 7
551:      2, 2, 1      1, 2, 3   5, 8, 7
---------------------------------------
global                 local     global   dispatch: 23
invoc.    workgroup    invoc.    invoc.
index     id           id        id
---------------------------------------
552:      3, 2, 1      0, 0, 0   6, 6, 4
553:      3, 2, 1      1, 0, 0   7, 6, 4
554:      3, 2, 1      0, 1, 0   6, 7, 4
555:      3, 2, 1      1, 1, 0   7, 7, 4
556:      3, 2, 1      0, 2, 0   6, 8, 4
557:      3, 2, 1      1, 2, 0   7, 8, 4
558:      3, 2, 1      0, 0, 1   6, 6, 5
559:      3, 2, 1      1, 0, 1   7, 6, 5
560:      3, 2, 1      0, 1, 1   6, 7, 5
561:      3, 2, 1      1, 1, 1   7, 7, 5
562:      3, 2, 1      0, 2, 1   6, 8, 5
563:      3, 2, 1      1, 2, 1   7, 8, 5
564:      3, 2, 1      0, 0, 2   6, 6, 6
565:      3, 2, 1      1, 0, 2   7, 6, 6
566:      3, 2, 1      0, 1, 2   6, 7, 6
567:      3, 2, 1      1, 1, 2   7, 7, 6
568:      3, 2, 1      0, 2, 2   6, 8, 6
569:      3, 2, 1      1, 2, 2   7, 8, 6
570:      3, 2, 1      0, 0, 3   6, 6, 7
571:      3, 2, 1      1, 0, 3   7, 6, 7
572:      3, 2, 1      0, 1, 3   6, 7, 7
573:      3, 2, 1      1, 1, 3   7, 7, 7
574:      3, 2, 1      0, 2, 3   6, 8, 7
575:      3, 2, 1      1, 2, 3   7, 8, 7

在对 pass.dispatchWorkgroups 的调用中,这些内置属性(builtins)通常是每个线程唯一的动态输入。因此,要高效地设计计算着色器函数,就需要明确如何利用这些以 ..._id 结尾的内置属性作为输入(global_invocation_id、local_invocation_id、workgroup_id),来实现我们期望的功能。

WorkGroup尺寸

WorkGroup应该设置成什么尺寸?这个问题经常被提出:为什么不总是使用 @workgroup_size(1, 1, 1),这样通过 pass.dispatchWorkgroups 的参数就可以更简单地决定运行多少次迭代。

原因是工作组内的多个线程比单个线程的调度更快。

首先,工作组中的线程通常以同步方式运行(lockstep),因此运行 16 个线程的速度与运行 1 个线程一样快。

WebGPU 的默认限制如下:

maxComputeInvocationsPerWorkgroup: 256
maxComputeWorkgroupSizeX: 256
maxComputeWorkgroupSizeY: 256
maxComputeWorkgroupSizeZ: 64

如你所见,第一个限制 maxComputeInvocationsPerWorkgroup 表示 @workgroup_size 的三个参数相乘后不能超过 256。换句话说:

@workgroup_size(256, 1, 1)   // 合法(256 * 1 * 1 = 256)
@workgroup_size(128, 2, 1)   // 合法(128 * 2 * 1 = 256)
@workgroup_size(16, 16, 1)   // 合法(16 * 16 * 1 = 256)
@workgroup_size(16, 16, 2)   // 非法(16 * 16 * 2 = 512 > 256)

不过最佳尺寸依赖GPU能力,WebGPU 无法提供这些信息。WebGPU 的一般建议是选择工作组大小为 64,除非有特别的理由选择其他尺寸。大多数 GPU 可以高效地以同步方式运行 64 个线程。如果选择了一个更大的数字,而 GPU 无法以快速路径处理它,它会选择较慢的路径;相反,如果你选择的数字低于 GPU 的能力,则可能无法获得最大性能。

快速路径(Fast Path):

GPU硬件通常针对某些特定的工作组大小进行了优化(例如,NVIDIA的warp大小为32,AMD的wavefront大小为64)。如果工作组大小是这些优化值的倍数(例如64、128),GPU可以高效地并行执行线程,充分利用硬件资源(如寄存器、共享内存、计算单元)。

慢速路径(Slow Path):

如果工作组大小超过了GPU的最优处理能力(例如,你选择了256线程,但GPU的每个计算单元只能高效处理128线程),GPU可能无法将线程均匀分配到硬件资源上。此时,GPU会采取“降级策略”,例如:
分拆工作组:将大工作组拆分成多个小工作组,导致额外的调度开销。
资源溢出:线程所需的寄存器或共享内存超过硬件限制,被迫使用更慢的全局内存(寄存器溢出到内存)。
并行度降低:硬件无法同时处理所有线程,导致线程串行化或等待,性能下降。

Compute Shaders中的竞态条件

在 WebGPU 中,一个常见的错误是未处理竞态条件(race conditions)。

竞态条件是指多个线程同时运行,并且它们实际上在争夺谁先完成或谁最后完成的情况。

假设有这样一个计算着色器:

@group(0) @binding(0) var<storage, read_write> result: array<f32>;
 
@compute @workgroup_size(32) fn computeSomething(
    @builtin(local_invocation_id) local_invocation_id : vec3<u32>,
) {
  result[0] = local_invocation_id.x;
`;

如果看不懂,下面是相同的JavaScript逻辑:

const result = [];
for (let i = 0; i < 32; ++i) {
  result[0] = i;
}

在 JavaScript 的情况下,代码运行后,result[0] 的值显然是 31。但在计算着色器的情况下,所有 32 次着色器迭代是并行运行的。最终 result[0] 的值取决于哪个线程最后完成写入操作。而哪个线程会最后运行是未定义的(undefined)。

来自规范(spec)的说明:

WebGPU 不提供以下任何保证:
1.不同工作组的调用是否并发执行。
也就是说,你不能假设一次有多个工作组同时执行。

2.一旦某个工作组的调用开始执行,其他工作组是否被阻塞。
也就是说,你不能假设每次只有一个工作组执行。
在一个工作组执行期间,implementation可能会选择同时执行其他工作组,或者执行其他已排队但未被阻塞的工作。

3.某个特定工作组的调用是否在另一个工作组的调用之前开始执行。
也就是说,你不能假设工作组是按特定顺序启动的。
0 Answers