Spherehash : A (roughly) even-density location hashing scheme
I've been thinking up an app which involves storing geographical information for small areas on the Earth. So I've been thinking about a scheme to enumerate roughly-equal areas on the Earth in a way that is easily converted to and from coordinates. As the reading and writing will generally be done to cells in one area at a time, cells close to each other should be stored close to each other in the data structure to speed up reads. So the index for this enumeration should preserve locality. This post is a proposal for an index for equal-area (on a spherical model of the earth) cells, which could also be used as a hash for points on the earth.
Towards the end there are examples and a tool to try it out.
Introduction
Geohashing schemes that I have seen either model the earth using a cylindrical projection, which leads to nice squares at the equator and thin strips at higher latitudes, this means that cells in higher latitudes are significantly smaller and could use quite a bit more data than information about an equivalent area around the equator. I feel that this deficiency applies to my use case more than the usual use cases of these geocoding schemes, but I haven't done the work to prove that. It also feels inelegant to so unequally use the hash space. The other geocoding schemes I gave seen (eg. Geodesic grid) are very geometrically complex, they're used for things like weather modelling or biodiversity statistics, where even division of the surface is vastly more important than keeping computational complexity down.
This scheme aims at an even area in each cell, and tries to keep the cells an even shape without incurring much computational complexity. Variation in the area between cells will be due to inaccuracies in modelling the Earth as a sphere, this could be improved by an ellipsoid model, but I'll just investigate the spherical model first. Outside of that modelling inaccuracy the even area per cell is guaranteed by splitting the area in half at extra bit of accuracy.
Algorithm
The hash would be binary, with the most significant bit distinguishing northern and southern hemispheres and each further bit splitting the space in half by area. Each split will be either north-south or east-west, depending on the shape of the parent. An east-west split is simple and only needs to split the longitude range of the parent in half, a north-south split however needs to find a latitude within the parent latitude range which divides the area of the parent in half, more on this below. Unlike a quad-tree which alternates between splitting the space north-south and east-west, a decision would be made at each level on which way to split based on comparing the lengths of the sides of the quad, choosing the option which moves makes the children more square, this means that near the equator, there will be more of an even split of divides but near the poles there will be almost exclusively splitting north-south.
Two formulae define how the sphere is divided up into cells, the initial cell is the whole earth, and for each bit of precision added to the hash, Formula 1 is used to decide which direction to split the cell in half.
Formula 1: split direction decision
if (north lat - south lat) < (east long - west long) * cos(max(abs(south lat), abs(north lat)) split in to east-west children else split into north-south children
Formula 2: find the latitude for a split between north and south child
This finds at which latitude a cell can be split so that the two divided cells both have the same area.
division latitude = arcsin ((sin(north lat) + sin(south lat)) / 2)
Cell shapes
Although the area of the cells are regular, the shapes are not so regular, but they follow a pattern. A good shape would have a low maximum diameter, the lowest maximum diameter is a circle, D = 2 * sqrt(A/pi)
, a square is pretty good, D = sqrt(2 * A)
. Let's look at the kind of shapes that cells form under this scheme.
I use regexes here to represent the binary representation of the cell's spherehash representation, the shapes should apply for any number of bits used in the hash.
0+
and1+
: The south and north pole cells form circles. the first bit represents the hemisphere and each following bit reduces the area covered in half by reducing the latitude covered, leading to a reduced polar cap.0+1
and1+0
: The cells immediately around the pole are where this scheme really struggles, the second cell from each pole forms a ring around the pole cell, making the distinction between them fairly pointless.0+1.
and1+0.
: The next two form two half rings around the second cell ring, producing an even larger maximum diameter.0+1..
and1+0..
: The next 4 are starting to produce more reasonable shapes.
The division scheme means that the first parallel contains only one segment, and each following parallel until the equator can have at most twice as many segments. So the initial rings are poorly divided, but after a few they start dividing well.
The graphic below demonstrates the first 5 bits of the scheme, each one with an extra division of all cells, and so twice as many cells as the previous.
Behrmann projection by Luboš Matásek - Own work, CC BY-SA 3.0, Link
The tool below allows you to see how the area covered is halved for each bit of extra accuracy calculated, showing the produced spherehash represented as binary and hex fractions. The box shows the current area, blue meaning it is either the northern or southern half of the parent, red meaning it is the western or eastern half. The grey crossing lines show the next 2 stages of of the current box, this will be a plus once the area is small enough that the meridians are close to parallel with each other, which will be quicker to reach near the equator than near the poles.
0 bit spherehash :
as hex fraction :
Some good demonstrative coords (click to use):
- Longyearbyen, the capital of Svalbard is one of the northernmost towns in the world. That it's spherehash starts with 6
1
's means it is within the northernmost2-6
of the world, but not within the northernmost2-7
.
11111101000110011010001011011011NNNNNNSEWWWNESWENSEWSWNWNESENWNE - Quitto, the capital of Ecuador is one of the closest cities to the equator, after just six bits the hash becomes a regular interleaving of the 2 different dimensions.
01010011101010101011000101001000SNWEWWNENWNWNWNWNWNESWSESESWNWSW - The south pole, it's not visible on the projection used in this map, the hash is an endless string of 0's, representing how far south it is with increasing accuracy, no bits need to be wasted on the longitude.
11111111111111111111111111111111SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS - Your location
Conclusion
This seems simple enough to implement and effective for my use case, but it's further work to build a tree suitable to store the data addressed by these hashes and then project that data onto an actual map. Whether it would make a material difference to the project I have in mind remains to be seen.
If you've seen any similar hashing schemes I didn't mention, or have any comments or potential uses for this then please let me know : platy@njk.onl.