Constituency Boundary Mapping Project
I've been working on a project that sent me off into the world of large data sets and Google Earth KML data files. The aim was to produce a web page widget that showed a Parliamentary Constituency on Google Maps. This has been done many times and the boundary 2010 data can be downloaded from sites such as
MapIt.uk (this is the Cities of London and Westminster Constituency). Here is my solution:
Use the mouse wheel to zoom in and out.
Use the left mouse button to pan and drag.
But I wanted to show neightbours as well since if you are trying to work out which constituency you live in, say, and you're just across the border, then these individual maps don't give you that information. When I downloaded all 650 constituencies I discovered that each file was very large and totaled some 129Mb. This is way too much to show in a browser and is asking for a lot of transmission data even for one constituency. Merely drawing all neighbours didn't seem elegant either, since the data is polygonal and they overlap where they touch - doubling the number of points required.
Sample Reduction
So I opened up the KML files and had a poke around. The first thing that struck me was the co-ordinate data was being stored to eleven decimal places! This seems to be the norm among mappers who merely output every digit a floating point can store, rather than consider what is necessary. What is necessary for my purposes was five decimal places - which gives a resolution of around a meter. More than good enough for boundary data. So there was a big saving there.
The next thing that struck me was how detailed the KML boundary data was. If you zoom in, especially on a river boundary, every curve and nuance of the line is faithfully recorded. No wonder the files were so big. Obviously the primary source of this data has come from high quality land surveys - great if you are in a land dispute or want to dig for a gas main. But overkill for users browsing polygons on the tens of kilometer scale.
I needed a method of extracting points from the polygons while maintaining the general integrity of the shape. After several attempts on my own I switched to the
Ramer-Douglas-Peucker algorithm. I wrote a C program to read in the KML files and build polygons in memory. The RDP uses a determinate based on distance below a threshold. This gave me a reduced polygon of evenly spaced coordinates but it cut corners and left lots of points on straight lines. I changed the algorithm to consider angles instead, on the theory that a very wide angle near 180° would be so close to a straight line it could be removed. This didn't work very well with needle shaped triangles. What I settled on was determining the area of the triangle formed by each set of three points.
My selection process depended on trial and error to get the right threshold value but the sampling process was very successful. I got the polygons down to around 8% of their original number.
Coastal Clean up
The next task was to work out which points where coastal and which were inland. MS Access database was quickest at this. The distinction is useful since coastal points cannot be points that a neightbour will also use. Then I could assign up to three neighbours to each point and get a set of polygon points that was neightbour - aware. (Interesting I discovered that there are 8 points where four constituencies meet. I'll get these mapped at a later date!) For each constituency it was a matter of extracting the whole polygon for that constituency and then the neighbours all-bar the segment that was a neighbour of the central polygon. This is more complex than it sounds and one or two constituencies have the odd coordinate that is wrong and I need to touch this up.
Finally I needed to combine the geometry of the target constituency with it's neightbours in a single KML file. These average 47k - not bad! But this is skewed by three massive Scottish constituencies containing hundreds of islands. In fact, most coastal constituencies contain way to many points covering the coastline - which is not important for the purpose of Parliamentary boundaries. I have created a point editor and have begun the process of simplifying the coastal points. A line a mile or two off the coast would dramatically reduce the coordinate density further, but it can't be an automated process.
Re-use of this widget
I would like to release the KML data in it's entirety which I have cleaned up the coastal data. Also, the election in 2015 will include boundary modifications so I'll have to process the whole lot again in a year's time. I am happy for web masters to use the code on their websites, but please a) credit myself and b) consider a PayPal donation - this was a not inconsiderable effort. Get in touch and I'll be happy to discuss this.