Path of Exile calculators

I recently developed a web-based Path of Exile calculator so that it is more practical to estimate the number of divine orbs required to get at least a certain set of rolls.

It can be found on its own page.

The JavaScript source code is not obfuscated so it is easier to reuse it if you want to.

The Future of WebAssembly

I was happy to read about what is planned for WebAssembly.

It is a really good thing to see what came after asm.js going so far. If we are transpiling to JS anyway, might as well compile to something which runs faster.

Lastly, I’d like to point out that the author has this blog which has some other well-made cartoons explaining web concepts.

Finding the Maximum of Four Integers

Inspired by an analysis of the source code of Rogue, I decided to find out how modern compilers encode selecting the maximum of four integers.

int max(int a, int b, int c, int d) {
  return a > b ? (a > c ? (a > d ? a : d) : (c > d ? c : d))
               : (b > c ? (b > d ? b : d) : (c > d ? c : d));

It is important to note that this is isolated code generation and a modern compiler could probably do better if more context was available.

Using Clang 5.0, I get the following (with optimizations enabled, of course).

 0:  39 f7                	cmp    edi, esi
 2:  0f 4d f7             	cmovge esi, edi
 5:  39 d6                	cmp    esi, edx
 7:  0f 4c f2             	cmovl  esi, edx
 a:  39 ce                	cmp    esi, ecx
 c:  0f 4c f1             	cmovl  esi, ecx
 f:  89 f0                	mov    eax, esi
11:  c3                   	ret

I will rename the code according to the calling convention used here.

a -> edi
b -> esi
c -> edx
d -> ecx

The result follows with some comments on what the code is doing.

cmp    a, b
cmovge b, a    # If a >= b, assign a to b.
               # Now b has max(a, b).
cmp    b, c
cmovl  b, c    # If max(a, b) < c, assign c to b.
               # Now b has max(max(a, b), c).
cmp    b, d
cmovl  b, d    # If max(max(a, b), c) < d, assign d to b.
               # Now b has max(max(max(a, b), c), d).
mov    eax, b  # Return b through eax.

I don’t think it can be done in less code than this.

As one might expect, using C++ and std::max, we get almost identical x86.

Examining Compiler Output 2

In this post I present a short comparison between the machine code generated by GCC 7.2 and Clang 5.0.0 for the evaluation of a floating point polynomial.

For both compilers, only -O2 was used, but -O3 produced the same code.

float evaluate(float a, float b) {
    return (a - b + 1.0f) * (a - b) * (a - b - 1.0f);


evaluate(float, float):
  subss   xmm0, xmm1
  movss   xmm2, DWORD PTR .LC0[rip]
  movaps  xmm1, xmm0
  addss   xmm1, xmm2
  mulss   xmm1, xmm0
  subss   xmm0, xmm2
  mulss   xmm0, xmm1
  .long 1065353216


  .long 1065353216 # float 1
  .long 3212836864 # float -1
evaluate(float, float): # @evaluate(float, float)
  subss   xmm0, xmm1
  movss   xmm1, dword ptr [rip + .LCPI0_0]
  addss   xmm1, xmm0
  mulss   xmm1, xmm0
  addss   xmm0, dword ptr [rip + .LCPI0_1]
  mulss   xmm0, xmm1

Explanation (for the Clang output)

xmm0 = a
xmm1 = b
xmm0 = a - b
xmm1 = 1.0f
xmm1 = a - b + 1.0f
xmm1 = (a - b + 1.0f) * (a - b)
xmm0 = a - b - 1.0f
xmm0 = (a - b - 1.0f) * (a - b + 1.0f) * (a - b)

GCC seems to prefer movaps over movss, even though movss is sufficient in this case. A reason for doing so is that using movaps avoid stalls from partial updates to XMM registers. Clang doesn’t generate movaps, but uses two constants and only addss for them rather than only having one and using subss to subtract one from a register.

After benchmarking these alternatives, they had roughly the same throughput.

Examining Compiler Output 1

In this post I present a short comparison between the machine code generated by Clang 5.0.0 for two different for loops.

Sum up to N

The first function sums from 1 to N and returns the result.


int sumUpTo(int n) {
  int x = 0;
  for (int i = 1; i <= n; i++) {
    x += i;
  return x;


  test  edi, edi
  jle   .LBB0_1
  lea   eax, [rdi - 1]
  lea   ecx, [rdi - 2]
  imul  rcx, rax
  shr   rcx
  lea   eax, [rcx + 2*rdi - 1]
  xor   eax, eax

As you can see, Clang used a closed formula to evaluate the expression!

What should it be

  (1 + n)(n) / 2
= (n^2 + n) / 2

What Clang does

  (n - 1)(n - 2) / 2 + (2n - 1)
= (n^2 - 3n + 2) / 2 + (4n - 2) / 2
= (n^2 + n) / 2

It is important to note that the reason why a compiler would rather evaluate this convoluted expression is that it will not overflow like the first expression would.

Sum up to N multiplying by 2


int sumTwiceUpTo(int n) {
  int x = 0;
  for (int i = 1; i <= n; i++) {
    x += 2 * i;
  return x;


  test  edi, edi
  jle   .LBB1_1
  lea   eax, [rdi - 1]
  lea   ecx, [rdi - 2]
  imul  ecx, eax
  and   ecx, -2
  lea   eax, [rcx + 4*rdi - 2]
  xor   eax, eax

What is is

  (1 + n)(n)
= (n^2 + n)

What Clang does

  ((n - 1)(n - 2) & 111...0) + 4n - 2
= (n - 1)(n - 2) + 4n - 2
= n^2 + n

Again, a closed formula!

It is interesting to note that the and is completely useless here: it is forcing the product (which is always even) into a even number. Also, it requires the multiplication result to be evaluated, so it could add measurable delay.