2019-03-17
I recently developed a web-based Path of Exile calculator so that it is more practical to estimate the number of divine orbs required to get at least a certain set of rolls.

It can be found on its own page.

The JavaScript source code is not obfuscated so it is easier to reuse it if you want to.

2018-10-23
I was happy to read about
what is planned for WebAssembly.

It is a really good thing to see what came after `asm.js`

going so far. If we
are transpiling to JS anyway, might as well compile to something which runs
faster.

Lastly, I’d like to point out that the author has
this blog
which has some other well-made cartoons explaining web concepts.

2018-02-26
Inspired by an analysis of the source code of
Rogue, I decided to find
out how modern compilers encode selecting the maximum of four integers.

```
int max(int a, int b, int c, int d) {
return a > b ? (a > c ? (a > d ? a : d) : (c > d ? c : d))
: (b > c ? (b > d ? b : d) : (c > d ? c : d));
}
```

It is important to note that this is isolated code generation and a modern
compiler could probably do better if more context was available.

Using Clang 5.0, I get the following (with optimizations enabled, of course).

```
0: 39 f7 cmp edi, esi
2: 0f 4d f7 cmovge esi, edi
5: 39 d6 cmp esi, edx
7: 0f 4c f2 cmovl esi, edx
a: 39 ce cmp esi, ecx
c: 0f 4c f1 cmovl esi, ecx
f: 89 f0 mov eax, esi
11: c3 ret
```

I will rename the code according to the calling convention used here.

```
a -> edi
b -> esi
c -> edx
d -> ecx
```

The result follows with some comments on what the code is doing.

```
cmp a, b
cmovge b, a # If a >= b, assign a to b.
# Now b has max(a, b).
cmp b, c
cmovl b, c # If max(a, b) < c, assign c to b.
# Now b has max(max(a, b), c).
cmp b, d
cmovl b, d # If max(max(a, b), c) < d, assign d to b.
# Now b has max(max(max(a, b), c), d).
mov eax, b # Return b through eax.
ret
```

I don’t think it can be done in less code than this.

As one might expect, using C++ and std::max, we get almost identical x86.

2018-01-04
In this post I present a short comparison between the machine code generated by
GCC 7.2 and Clang 5.0.0 for the evaluation of a floating point polynomial.

For both compilers, only -O2 was used, but -O3 produced the same code.

```
float evaluate(float a, float b) {
return (a - b + 1.0f) * (a - b) * (a - b - 1.0f);
}
```

## GCC

```
evaluate(float, float):
subss xmm0, xmm1
movss xmm2, DWORD PTR .LC0[rip]
movaps xmm1, xmm0
addss xmm1, xmm2
mulss xmm1, xmm0
subss xmm0, xmm2
mulss xmm0, xmm1
ret
.LC0:
.long 1065353216
```

## Clang

```
.LCPI0_0:
.long 1065353216 # float 1
.LCPI0_1:
.long 3212836864 # float -1
evaluate(float, float): # @evaluate(float, float)
subss xmm0, xmm1
movss xmm1, dword ptr [rip + .LCPI0_0]
addss xmm1, xmm0
mulss xmm1, xmm0
addss xmm0, dword ptr [rip + .LCPI0_1]
mulss xmm0, xmm1
ret
```

### Explanation (for the Clang output)

```
xmm0 = a
xmm1 = b
xmm0 = a - b
xmm1 = 1.0f
xmm1 = a - b + 1.0f
xmm1 = (a - b + 1.0f) * (a - b)
xmm0 = a - b - 1.0f
xmm0 = (a - b - 1.0f) * (a - b + 1.0f) * (a - b)
```

GCC seems to prefer movaps over movss, even though movss is sufficient in this
case. A reason for doing so is that using movaps avoid stalls from partial
updates to XMM registers. Clang doesn’t generate movaps, but uses two constants
and only addss for them rather than only having one and using subss to subtract
one from a register.

After benchmarking these alternatives, they had roughly the same throughput.

2017-11-30
In this post I present a short comparison between the machine code generated by
Clang 5.0.0 for two different for loops.

## Sum up to N

The first function sums from 1 to N and returns the result.

### C++

```
int sumUpTo(int n) {
int x = 0;
for (int i = 1; i <= n; i++) {
x += i;
}
return x;
}
```

### Output

```
sumUpTo(int):
test edi, edi
jle .LBB0_1
lea eax, [rdi - 1]
lea ecx, [rdi - 2]
imul rcx, rax
shr rcx
lea eax, [rcx + 2*rdi - 1]
ret
.LBB0_1:
xor eax, eax
ret
```

As you can see, Clang used a closed formula to evaluate the expression!

### What should it be

```
(1 + n)(n) / 2
= (n^2 + n) / 2
```

### What Clang does

```
(n - 1)(n - 2) / 2 + (2n - 1)
= (n^2 - 3n + 2) / 2 + (4n - 2) / 2
= (n^2 + n) / 2
```

It is important to note that the reason why a compiler would rather evaluate
this convoluted expression is that it will not overflow like the first
expression would.

## Sum up to N multiplying by 2

### C++

```
int sumTwiceUpTo(int n) {
int x = 0;
for (int i = 1; i <= n; i++) {
x += 2 * i;
}
return x;
}
```

### Output

```
sumTwiceUpTo(int):
test edi, edi
jle .LBB1_1
lea eax, [rdi - 1]
lea ecx, [rdi - 2]
imul ecx, eax
and ecx, -2
lea eax, [rcx + 4*rdi - 2]
ret
.LBB1_1:
xor eax, eax
ret
```

### What is is

### What Clang does

```
((n - 1)(n - 2) & 111...0) + 4n - 2
= (n - 1)(n - 2) + 4n - 2
= n^2 + n
```

Again, a closed formula!

It is interesting to note that the and is completely useless here: it is forcing
the product (which is always even) into a even number. Also, it requires the
multiplication result to be evaluated, so it could add measurable delay.