Concealed Squares

This is my solution to Project Euler problem 206. Although a brute-force solution is easy to write, this approach uses some mathematical analysis to find the answer in much less time. The problem statement follows.

Find the unique positive integer whose square has the form 1_2_3_4_5_6_7_8_9_0, where each “_” is a single digit.

It can be shown that the answer lies in the range R.

R = [floor(sqrt(1020304050607080900)), ceiling(sqrt(1929394959697989990))] = [1010101010, 1389026624]

R has 378,925,615 integers, which is far too many to test quickly. Fortunately, one can reduce it by noticing that the square of the integer is divisible by 10, which implies that the integer itself is divisible by 10. This leaves us with the possible solution set P.

P = {1010101010, 1010101020, ..., 1389026620}

The largest possible square is 1929394959697989990, which needs 61 bits, so it already fits in an unsigned 64-bit integer.

Furthermore, because we know that the answer is a multiple of 10, we also know that its square is a multiple of 100 and therefore ends in two zeros; the digit before the final 0 must itself be 0. This allows for another optimization: divide all numbers in P by 10 and find the one whose square has the form 1_2_3_4_5_6_7_8_9. These smaller numbers keep all the arithmetic comfortably within 64-bit words.

P has 37,892,562 elements, but we do not need to check all of them. If x² ends in 9, then x ends in 3 or 7, which eliminates 80% of the values we have to test.
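Here is a minimal C++ sketch of that search (a hypothetical reconstruction, not the implementation timed below):

```cpp
#include <cstdint>
#include <cstdio>

// Scan candidates n in [101010101, 138902662] ending in 3 or 7 and test
// whether n*n matches the pattern 1_2_3_4_5_6_7_8_9.

// True if s has 17 digits of the form 1?2?3?4?5?6?7?8?9, checked from the
// least significant digit upwards.
static bool matches(std::uint64_t s) {
    for (std::uint64_t digit = 9; digit >= 1; --digit) {
        if (s % 10 != digit) return false;
        s /= 100;  // drop the matched digit and the unconstrained one after it
    }
    return s == 0;
}

int main() {
    // Step so that n always ends in 3 or 7 (x² ends in 9 only in those cases).
    for (std::uint64_t n = 101010103; n <= 138902662; n += (n % 10 == 3) ? 4 : 6) {
        if (matches(n * n)) {
            // Append the 0 we removed when dividing the candidates by 10.
            std::printf("%llu0\n", static_cast<unsigned long long>(n));
            return 0;
        }
    }
    return 0;
}
```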

Using Clang with the -O3 flag, my C++ implementation of the algorithm finishes in about 120 ms, which is good enough for this problem.

Solving Tic-Tac-Toe Using Racket

I decided to implement a perfect AI (using Minimax) for Tic-tac-toe. I had not decided, though, whether to use JavaScript or Racket, two languages I have been using quite a bit recently and find quite interesting. As the post title gives away, I went with a functional programming solution in Racket. It is open source and available on GitHub; see the repository. There is also a fairly decent GUI-based game that allows you to play against the AI.

GUI-based Tic-tac-toe game

When writing this I decided to use vectors instead of lists, expecting better performance. However, the result is quite slow. Even though the current implementation is good enough for real-time gameplay, it could be much more efficient for a decision tree as simple as Tic-tac-toe’s. Using lists might have been a better choice, as some parts of the application now have to convert vectors to lists. Additionally, slicing and merging vectors seems to generate a lot of work for the garbage collector. Finally, as the biggest vector holds only nine elements, the constant-time random access that vectors provide is likely not making much of a difference.

Being an automation enthusiast, I also set up continuous integration and delivery for this project. Every push and pull request is built and tested by Travis CI; you can find the build history of the project there. On tags, Travis also prepares a Linux-only standalone distribution and deploys it to GitHub releases (see the releases page). In the future I may make Windows and Mac OS X standalone distributions available as well. You should be able to play the standalone game on Linux even without having Racket installed.

Alpha-beta pruning is still left to be implemented, which should significantly improve performance. Even though this is a work in progress, the game works and the AI is indeed perfect. It runs on any operating system with a Racket 6 distribution installed.

Site Statistics

This is my first variable post. By variable post I mean a post that might change at every recompilation of the website.

I had this idea when thinking about getting a word count of the project. Instead of just computing the word count for myself, I decided to make it into an always up-to-date blog post.

Basic statistics

Word count: 35,141
Post count: 48
Page count: 70
Mean word count per post: 245
Mean word count per page: 333

Extrapolated statistics

The average English reader would take 118 minutes to read the whole website, whose content is equivalent in length to 176 songs or 7 short stories.

CPU Cache Basics

This post covers fundamental cache concepts such as locality of reference, mappings, and why caching is important.

Caching basics

Caching is used to improve CPU performance. Main memory is far from the CPU and operates at a much lower frequency. Storing the data and code that the processor is most likely to need next close to it, in a smaller but faster memory, improves overall performance.

It is worth noting that some designs split the cache into I-cache (instruction cache) and D-cache (data cache).

Locality of reference

Caching is based on locality of reference: most access patterns exhibit temporal locality (data that has been requested recently is likely to be requested again soon) and spatial locality (data physically near recently accessed data is more likely to be accessed).
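As a hypothetical illustration (the matrix size is arbitrary), the loop order alone determines how much spatial locality a matrix traversal has:

```cpp
#include <cstdio>
#include <vector>

// The same matrix summed in two loop orders. The row-major loop walks memory
// sequentially (good spatial locality); the column-major loop jumps N elements
// per access, touching a different cache line almost every time.
constexpr int N = 1024;

long long sum_row_major(const std::vector<int>& m) {
    long long sum = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            sum += m[i * N + j];   // consecutive addresses
    return sum;
}

long long sum_column_major(const std::vector<int>& m) {
    long long sum = 0;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            sum += m[i * N + j];   // stride of N ints between accesses
    return sum;
}

int main() {
    std::vector<int> m(N * N, 1);
    std::printf("%lld %lld\n", sum_row_major(m), sum_column_major(m));
    return 0;
}
```

On most machines the row-major version is noticeably faster, simply because every cache line it fetches is fully used before being evicted.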

Notes on locality of reference

In some situations, assuming temporal locality works against us. For instance, when playing media files, keeping already played memory blocks in the cache is usually a waste of space.

While temporal locality implies spatial locality (reused data always presents some physical pattern), spatial locality does not imply temporal locality. Take, for instance, the different functions of a business application: the user may never request more than two of the three hundred available operations, rendering most of the caching that could be performed based on spatial patterns useless.

Many applications have very different memory access patterns for code and data. Take, for instance, a media player, which has excellent temporal locality for code (the code that renders a frame is reused constantly) but terrible temporal locality for data (the bytes that make up a frame are hardly ever useful again). This explains why some designers split the cache into I-cache and D-cache, as mentioned above.

Tag RAM

When the CPU requests a byte from a particular block of RAM, it needs to determine whether that block is in the cache and, if it is, where. For this reason, there is a piece of memory called a tag associated with each frame of the cache, which tells the CPU which memory block that frame currently holds. This memory needs to be very fast in order to keep the overhead of cache lookups low.

Mappings

Essentially, there are three ways to use tags to map RAM blocks to cache frames in the CPU cache.

Fully Associative Mapping

This is arguably the simplest possible mapping scheme. This mapping allows for any memory block to occupy any cache frame. This simplicity comes at a price: in order to determine whether or not a memory block is in the cache, the CPU has to search all of the cache frames.

Direct Mapping

In direct mapping, each cache frame can only contain a memory block from a subset of all memory blocks. The memory is evenly divided among the available cache frames, in such a manner that if the CPU needs to fetch a certain memory block there is only one cache frame it needs to check. This method solves the problem of lengthy cache scans of fully associative mapping at the expense of not allowing memory blocks that are mapped to the same cache frame to be in the cache at the same time, which can severely hinder performance in some scenarios.
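As a hypothetical sketch (the block size and frame count are made up), this is how an address decomposes into offset, index, and tag in a direct-mapped cache:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical parameters: 64-byte blocks, 1024 frames (a 64 KiB direct-mapped cache).
constexpr std::uint64_t kBlockSize = 64;
constexpr std::uint64_t kNumFrames = 1024;

int main() {
    std::uint64_t address = 0x12345678;

    std::uint64_t block  = address / kBlockSize;   // which memory block the byte belongs to
    std::uint64_t offset = address % kBlockSize;   // byte position inside that block
    std::uint64_t index  = block % kNumFrames;     // the only frame that may hold the block
    std::uint64_t tag    = block / kNumFrames;     // stored in tag RAM to identify the block

    std::printf("offset=%llu index=%llu tag=%llu\n",
                static_cast<unsigned long long>(offset),
                static_cast<unsigned long long>(index),
                static_cast<unsigned long long>(tag));
    return 0;
}
```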

N-Way Set Associative Mapping

The third type of cache mapping tries to lessen the conflicts introduced by direct mapping and may be seen as a mixture of the two methods above. In set associative mapping, the cache frames are grouped into sets of N frames each, and each set is responsible for a subset of main memory. This way, the same memory block has more than one frame it can be assigned to, and the chance of it evicting another relevant block is reduced. For the sake of visualization, one can imagine set associative mapping as multiple smaller fully associative caches.

This is the most commonly used form of mapping.
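For comparison, a hypothetical 4-way set associative version of the earlier sketch only changes the divisor: the block now selects a set of four frames rather than a single frame.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical 4-way set associative variant: the same 64-byte blocks and
// 1024 frames, now grouped into 256 sets of 4 ways each.
constexpr std::uint64_t kBlockSize = 64;
constexpr std::uint64_t kWays      = 4;
constexpr std::uint64_t kNumSets   = 1024 / kWays;

int main() {
    std::uint64_t address = 0x12345678;
    std::uint64_t block   = address / kBlockSize;
    std::uint64_t set     = block % kNumSets;   // any of the set's 4 ways may hold the block
    std::uint64_t tag     = block / kNumSets;   // compared against the tags of all 4 ways

    std::printf("set=%llu tag=%llu\n",
                static_cast<unsigned long long>(set),
                static_cast<unsigned long long>(tag));
    return 0;
}
```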

Cache misses

A cache miss occurs when the data requested by the CPU is not found in the cache. There are essentially three types of cache misses:

Compulsory miss

This is the miss that happens when the sought-after data has never been loaded into the cache. This type of miss is also known as a cold miss.

Conflict miss

A conflict miss occurs when the CPU tries to find a memory block in the corresponding cache frame but the frame is already filled with other memory blocks.

Capacity miss

This last type of miss happens when the block containing the needed data was already evicted from the cache because the cache could not hold the whole working set.

Writing policies

So far only reading from the cache has been discussed. However, data is also written, as most programs store the results of their computations.

There are essentially two writing policies for caches, and different combinations of them may be used. Write-through is the simplest and the safest from a multithreaded point of view: every write is propagated through the cache levels all the way to main memory. The other policy is write-back, which only writes the changes to the level below once the cache block is evicted. This improves performance, but it makes guaranteeing proper synchronization much more complex.
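A toy sketch (the ToyCache type and its counters are invented for illustration, not how real hardware is implemented) of the difference in traffic to the level below:

```cpp
#include <cstdio>

// Toy model: count how many times each policy writes to the level below
// while the CPU stores to the same cached block repeatedly.
struct ToyCache {
    bool dirty = false;
    int writes_to_next_level = 0;

    void write_through() { ++writes_to_next_level; }  // every store goes below
    void write_back() { dirty = true; }               // only mark the block dirty
    void evict() {
        if (dirty) { ++writes_to_next_level; dirty = false; }
    }
};

int main() {
    ToyCache wt, wb;
    for (int i = 0; i < 1000; ++i) { wt.write_through(); wb.write_back(); }
    wb.evict();  // the single write happens when the block leaves the cache
    std::printf("write-through: %d, write-back: %d\n",
                wt.writes_to_next_level, wb.writes_to_next_level);  // 1000 vs 1
    return 0;
}
```

After a thousand stores to the same block, the write-through policy has written to the level below a thousand times, while write-back performs a single write when the block is evicted.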

Dungeon and the BSD 3-Clause License

Last weekend I was determined to solve Dungeon’s licensing issues. However, after seeing how much work it would be to figure out from the Git history in which years each file was modified, I decided to take a simpler route that I believe will make Dungeon an even better project. I changed to the BSD 3-Clause license, which seems to allow a single license notice for the whole project, so individual files no longer need their own preambles.

From the GNU website:

You should put a notice at the start of each source file, stating what license it carries, in order to avoid risk of the code’s getting disconnected from its license.

Which is a good point. However, it does not really apply to Dungeon: most of the code is very project-specific and wouldn’t be of much use elsewhere. Also, if you want to copy a generic solution, such as our CircularList, and never give anything back, I am OK with it and hope you make good use of my code.

The mentioned page also states that

This has nothing to do with the specifics of the GNU GPL. It is true for any free license.

As I said, I don’t see enough reason to do this, as I am already using a permissive license. One could argue that the no-warranty clause should always be present, but I don’t think it is an important issue right now: holding me responsible for damage caused by my software would require acknowledging that the code came from the repository, which carries a BSD 3-Clause license notice.

I’ve come to like permissive licenses a lot more in the last two years. I think that the more open and free the code is, the more value it has, as more people may benefit from it. And this is more easily achieved by using permissive (non-copyleft) licenses, such as the BSD licenses.

In the end, I just hope this is for the best of my species and that everyone can benefit from, or at least not be harmed by, the license change.