Spatial

Finding nearby cities using SQL Server

A scenario arose recently that required me to be able to find all of the cities within a given radius from a geographic center-point (latitude / longitude).

At first I was tempted to utilize an existing mapping service to handle this for me, but most of those services impose limits on the number of requests that you can send in a given time period. In this instance, I needed something that was not going to be subject to those limits.

There are a lot of different ways to solve this problem. In this post, I am going to cover how to do so using SQL Server 2012+.

The first step is to locate a list of cities and their latitude/longitude. A good source for this is GeoNames.org. There are a wide variety of data downloads available. For my purposes, I am going to grab Cities5000.zip. This archive contains a list of all of the cities with a population of 5,000 or more.

There is a lot of really good documentation about what the download contains, but for ease of reference, I'll reproduce it here:

geonameid         : integer id of record in geonames database
name              : name of geographical point (utf8) varchar(200)
asciiname         : name of geographical point in plain ascii characters, varchar(200)
alternatenames    : alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)
latitude          : latitude in decimal degrees (wgs84)
longitude         : longitude in decimal degrees (wgs84)
feature class     : see http://www.geonames.org/export/codes.html, char(1)
feature code      : see http://www.geonames.org/export/codes.html, varchar(10)
country code      : ISO-3166 2-letter country code, 2 characters
cc2               : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters
admin1 code       : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)
admin2 code       : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) 
admin3 code       : code for third level administrative division, varchar(20)
admin4 code       : code for fourth level administrative division, varchar(20)
population        : bigint (8 byte int) 
elevation         : in meters, integer
dem               : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.
timezone          : the iana timezone id (see file timeZone.txt) varchar(40)
modification date : date of last modification in yyyy-MM-dd format

You can pull all or part of the data into SQL. I am going to pull in all the columns. I don't need all of them at the moment, but they could be useful later on.

The following will create a table for the city data and use bulk import to populate it.
Note: this assumes that you have unzipped the data to c:\temp

First create the table

CREATE TABLE Cities (
geonameid       int	NOT NULL PRIMARY KEY,
[name]          nvarchar(200) NOT NULL,              
asciiname       nvarchar(200) NOT NULL,
alternatenames  nvarchar(max),  
latitude        numeric(18,15),  
longitude       numeric(18,15), 
feature_class   char(1), 
feature_code	varchar(10),      
country_code    char(2),  
cc2             nvarchar(200),  
admin1_code     nvarchar(20),  
admin2_code     nvarchar(80),  
admin3_code     nvarchar(20),  
admin4_code     nvarchar(20),  
[population]	decimal,
elevation       int,  
dem             int,  
timezone        nvarchar(40),  
modification_date_tmp	nvarchar(50)
)

Now, import the data

BULK INSERT Cities FROM 'c:\temp\cities5000.txt'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
);

Ok. We have our raw data. Now we need to let SQL Server know how to plot each of these cities on a geographic plane. To do this, we are going to use the geography spatial data type that is available in SQL Server. geography is actually a .Net CLR data type that is specifically tailored to work with latitude and longitude. You can read more about geography here

ALTER TABLE Cities 
ADD Point AS CONVERT([geography],
                     CASE WHEN [Latitude]<>(0) 
                          AND [Longitude]<>(0) 
                     THEN Geography::Point([Latitude],[Longitude],(4326))  
                     END,(0))

Let's break down what is happening above.

First, I'm adding a Point as a column to Cities. In this instance, I'm creating a computed column. As Latitude or Longitude is updated, this will automatically update the value for Point. If you would prefer, you can create Point as a geography column and run an update statement to do this calculation once. (just remember to re-run update if you re-import the data!).

I'm using CONVERT to turn the data into a geography type.

I'm wrapping the data in a CASE statement to skip the conversion if Latitude and Longitude aren't set properly.

Finally, I'm converting the Latitude and Longitude into a geographical point using Geography::Point(). The :: is SQL Server syntax for calling the Point method on the geography data type. Think of it like a static method on a class.

In this case, Point takes a latitude, a longitude and something called a SRID (Spatial Reference Identifier). In this case, I'm using 4326 which gives us the standard -180:180/-90:90 that you're probably used to seeing.

If you open SQL Server Management Studio (SSMS) and select all the records from cities, you should see a new tab called "Spatial Results". Clicking that will give you something similar this.

First 5,000 cities

Great!

Now then, what if we want to use this data to find all of the cities within 5 miles of Manhattan?

First, let's grab the Point from Manhattan.

SELECT Name, Point FROM cities WHERE name = 'Manhattan' AND admin1_code = 'NY'

The result should resemble the following:

Manhattan	0xE6100000010C475A2A6F47644440A4703D0AD77D52C0

That hex string on the right is the text representation of Manhattan's center-point.

To draw a 25 mile circle around that center-point we are going to use another built-in method called STBuffer.

STBuffer takes a single argument of Distance which is defined in meters. Since we are trying to work in miles, we are going to have to do a conversion.

SELECT Name, Point, Point.STBuffer(5 * 1609.344) as SearchArea
FROM cities 
WHERE name = 'Manhattan' 
AND admin1_code = 'NY'

The 1609.344 in the query above is the approximate meters per mile.

This time if you look at the Spatial Results (and select SearchArea from the dropdown on the right) you'll see something like this:

Manhattan 25 mile radius

Now for the final step.

DECLARE @SearchArea GEOGRAPHY

SELECT @SearchArea = Point.STBuffer(5*1609.344)
FROM cities 
WHERE name = 'Manhattan' 
AND admin1_code = 'NY'

SELECT Name, Point
FROM Cities
WHERE [point].STIntersects(@SearchArea) = 1 

In the query above, I have saved the search area we defined to a variable so that it is easier to use later on.

The heavy lifting here is being done by STIntersects. Like STBuffer, STIntersects is a method available off of the geography type. In this case, it takes another geography type as its argument. STIntersects will determine if the two geography instances cross over one another.

It is important to notice that this ellipse from the previous step is solid and not outlined. Since our goal is to find everything contained inside of the radius, if we only had the outline of a circle, then the only place that it would intersect would be along the exact outside of the circle. In all likelihood we would not get any results at all!

What did we get for all of our hard effort? Here are all the cities (with a population of 5,000 or more) within 5 miles of Manhattan's center-point.

Cliffside Park
Edgewater
Fairview
Fort Lee
Guttenberg
Hoboken
North Bergen
Palisades Park
Ridgefield
Secaucus
Union City
Weehawken
West New York
Long Island City
Manhattan

This barely scratches the surface of what is available as part of the Geography/Geometry functionality inside of SQL Server. For a more complete list of all of the OGC (STXXXX) methods refer to the MSDN article

SQL Server Spatial