njk.onl / platy's blog /RSS feed

Spherehash : A (roughly) even-density location hashing scheme

I've been thinking up an app which involves storing geographical information for small areas on the Earth. So I've been thinking about a scheme to enumerate roughly-equal areas on the Earth in a way that is easily converted to and from coordinates. As the reading and writing will generally be done to cells in one area at a time, cells close to each other should be stored close to each other in the data structure to speed up reads. So the index for this enumeration should preserve locality. This post is a proposal for an index for equal-area (on a spherical model of the earth) cells, which could also be used as a hash for points on the earth.

Towards the end there are examples and a tool to try it out.

Introduction

Geohashing schemes that I have seen either model the earth using a cylindrical projection, which leads to nice squares at the equator and thin strips at higher latitudes, this means that cells in higher latitudes are significantly smaller and could use quite a bit more data than information about an equivalent area around the equator. I feel that this deficiency applies to my use case more than the usual use cases of these geocoding schemes, but I haven't done the work to prove that. It also feels inelegant to so unequally use the hash space. The other geocoding schemes I gave seen (eg. Geodesic grid) are very geometrically complex, they're used for things like weather modelling or biodiversity statistics, where even division of the surface is vastly more important than keeping computational complexity down.

This scheme aims at an even area in each cell, and tries to keep the cells an even shape without incurring much computational complexity. Variation in the area between cells will be due to inaccuracies in modelling the Earth as a sphere, this could be improved by an ellipsoid model, but I'll just investigate the spherical model first. Outside of that modelling inaccuracy the even area per cell is guaranteed by splitting the area in half at extra bit of accuracy.

Algorithm

The hash would be binary, with the most significant bit distinguishing northern and southern hemispheres and each further bit splitting the space in half by area. Each split will be either north-south or east-west, depending on the shape of the parent. An east-west split is simple and only needs to split the longitude range of the parent in half, a north-south split however needs to find a latitude within the parent latitude range which divides the area of the parent in half, more on this below. Unlike a quad-tree which alternates between splitting the space north-south and east-west, a decision would be made at each level on which way to split based on comparing the lengths of the sides of the quad, choosing the option which moves makes the children more square, this means that near the equator, there will be more of an even split of divides but near the poles there will be almost exclusively splitting north-south.

Two formulae define how the sphere is divided up into cells, the initial cell is the whole earth, and for each bit of precision added to the hash, Formula 1 is used to decide which direction to split the cell in half.

Formula 1: split direction decision

if (north lat - south lat) <
   (east long - west long) * cos(max(abs(south lat), abs(north lat))
  split in to east-west children
else
  split into north-south children

Formula 2: find the latitude for a split between north and south child

This finds at which latitude a cell can be split so that the two divided cells both have the same area.

division latitude = arcsin ((sin(north lat) + sin(south lat)) / 2)

Cell shapes

Although the area of the cells are regular, the shapes are not so regular, but they follow a pattern. A good shape would have a low maximum diameter, the lowest maximum diameter is a circle, D = 2 * sqrt(A/pi), a square is pretty good, D = sqrt(2 * A). Let's look at the kind of shapes that cells form under this scheme.

I use regexes here to represent the binary representation of the cell's spherehash representation, the shapes should apply for any number of bits used in the hash.

The division scheme means that the first parallel contains only one segment, and each following parallel until the equator can have at most twice as many segments. So the initial rings are poorly divided, but after a few they start dividing well.

The graphic below demonstrates the first 5 bits of the scheme, each one with an extra division of all cells, and so twice as many cells as the previous.

0 0 00 01 10 11 000 001 010 011 100 101 110 111 0000 0001 0010 0011 0100 0101 0110 0111 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111

Behrmann projection by Luboš Matásek - Own work, CC BY-SA 3.0, Link

The tool below allows you to see how the area covered is halved for each bit of extra accuracy calculated, showing the produced spherehash represented as binary and hex fractions. The box shows the current area, blue meaning it is either the northern or southern half of the parent, red meaning it is the western or eastern half. The grey crossing lines show the next 2 stages of of the current box, this will be a plus once the area is small enough that the meridians are close to parallel with each other, which will be quicker to reach near the equator than near the poles.

Coords

0 bit spherehash :
as hex fraction :

Some good demonstrative coords (click to use):

Conclusion

This seems simple enough to implement and effective for my use case, but it's further work to build a tree suitable to store the data addressed by these hashes and then project that data onto an actual map. Whether it would make a material difference to the project I have in mind remains to be seen.

If you've seen any similar hashing schemes I didn't mention, or have any comments or potential uses for this then please let me know : platy@njk.onl.