Quick look into geocoding

Published Oct. 12, 2018
Updated Oct. 18, 2021

Geocoding is the process of assigning geographic coordinates to a description of a location. The most common form is address geocoding, which assigns geographic coordinates (latitude and longitude) to a street or postal address. Reverse geocoding does the opposite: it transforms a pair of geographic coordinates into an address.

Map pointing to the ASU Library Map and Geospatial Hub

Geocoding begins when data in text or tabular form is compared to a reference data table that includes defined map coordinates. When the input data is matched to the reference data, the corresponding map coordinates are assigned to the input data. Reference data is typically based on a segmented street centerline layer that contains information on house number ranges. Geographic coordinates are then interpolated from the estimated location where the address number falls on the segment. For example, if a road segment contains the address range 100 – 119 and runs west to east, and the address number is 109, then the geographic location would be roughly 50% of the way along the segment (9/19, or about 47%), on the odd side of the street.

Addresses are interpolated based on where they fall in the address range.
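
The interpolation described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular geocoder is implemented: the function names are hypothetical, the segment is treated as a straight line between two endpoints, and a single address range is used for the whole segment (real reference data often stores separate ranges for each side of the street).

```python
def interpolate_fraction(house_number, range_start, range_end):
    """Return how far along the segment (0.0 to 1.0) a house number falls."""
    if range_start == range_end:
        return 0.5  # assumption: a one-address range is placed at the midpoint
    low, high = sorted((range_start, range_end))
    if not low <= house_number <= high:
        raise ValueError("house number is outside the segment's address range")
    return (house_number - range_start) / (range_end - range_start)


def interpolate_point(house_number, range_start, range_end, start_xy, end_xy):
    """Linearly interpolate a coordinate between the segment's endpoints."""
    t = interpolate_fraction(house_number, range_start, range_end)
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return (x, y)


def street_side(house_number):
    """Odd and even house numbers sit on opposite sides of the street."""
    return "odd" if house_number % 2 else "even"


# The example from the text: address 109 on a segment with range 100-119.
fraction = interpolate_fraction(109, 100, 119)  # 9/19, a little under halfway
side = street_side(109)                         # "odd"
```

The endpoint coordinates passed to interpolate_point would come from the street centerline layer; any values used to try it out are placeholders.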

The quality and accuracy of geocoded data depend on understanding the reference data, the method used to produce matches, and the accuracy of each match once it is found.

Knowing the format required by the geocoder you intend to use is critical to getting exact matches. Accuracy depends on the nature of the input data, including its format and “cleanliness”. For example, input data that includes misspellings, special characters (such as “ \ % # ?) and abbreviations often results in inaccuracy and mismatches. Be prepared to refine and re-geocode your data, as errors or typos may be found during the process. Finally, it is important to perform a quality control check on your geocoding results by comparing the address locations against other data sources, such as street basemaps.
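
As a rough illustration of the cleanup step, the sketch below strips the special characters listed above, collapses extra whitespace and expands a few abbreviations. The abbreviation table and the function name are hypothetical, and whether a geocoder prefers expanded or abbreviated street types is an assumption to check against its documentation. The sample address is made up.

```python
import re

# Hypothetical abbreviation table; replace it with the forms your geocoder expects.
ABBREVIATIONS = {"st": "Street", "ave": "Avenue", "rd": "Road", "dr": "Drive"}


def clean_address(raw):
    """Remove special characters, collapse whitespace, expand abbreviations."""
    # Replace the special characters named in the text with spaces.
    text = re.sub(r'["“”\\%#?]', " ", raw)
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    words = []
    for word in text.split(" "):
        key = word.rstrip(".").lower()
        words.append(ABBREVIATIONS.get(key, word))
    return " ".join(words)


cleaned = clean_address("  123 N  Main St.? ")
```

A simple lookup like this cannot tell "St" for Street from "St" for Saint, so the cleaned output still deserves a manual spot check before geocoding.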

Geocoding can be done on a case-by-case basis or in batches using an open-source or commercial geocoding service. Many online batch geocoding services are free up to a pre-set level, while others charge a fee. As geocoding has become more valuable, batch geocoding large numbers of addresses has become costly, and batch geocoding services have become more limited.

ESRI’s World Geocoding Service is available to ASU faculty, staff and students. However, the service uses 40 credits per 1,000 addresses, which are pulled from the ASU credit pool. Currently, each user in the ASU community can geocode up to approximately 25,000 addresses, although doing so would use up all of that user's credits. An alternative to a pre-configured geocoding service is to create your own address locator. Each address locator is based on a locator style, which defines the reference data used and the rules for address formatting and parsing.
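
The credit arithmetic is easy to check before starting a large job. The sketch below uses the rate quoted above (40 credits per 1,000 addresses) and the 1,000-credit allotment mentioned later in this post; the function names are hypothetical and the numbers should be confirmed against current ASU and vendor terms.

```python
CREDITS_PER_1000_ADDRESSES = 40  # rate quoted in this post


def credits_needed(n_addresses):
    """Credits consumed by geocoding n_addresses at the quoted rate."""
    return n_addresses * CREDITS_PER_1000_ADDRESSES / 1000


def max_addresses(credits_available):
    """Largest number of addresses the available credits can cover."""
    return int(credits_available * 1000 / CREDITS_PER_1000_ADDRESSES)


full_allotment = max_addresses(1000)  # 25,000 addresses
one_batch = credits_needed(5000)      # 200 credits
```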

Recently, Erica Quintana, a Policy Analyst in the Morrison Institute for Public Policy, created composite address locators for all of Arizona using the Census TIGER files. She has agreed to share them with the ASU community. The address locators and supporting files can be accessed by submitting a quick service request through the Map and Geospatial Hub's Service Request form. (We respond to all requests within two business days, typically much sooner.)

Aside from the address locators available through ASU, there are many geocoding service options. Here are a few to consider:

Using ESRI’s World Geocoding Service for batch geocoding consumes credits. ASU provides students and staff with 1,000 credits, allowing up to 25,000 batch geocodes. Beyond this amount, consider creating your own address locator, as this service consumes a large number of credits.
 
The Census batch online geocoder can process up to 10,000 addresses at a time.
 
Texas A&M offers a free geocoding service for noncommercial research projects that need to geocode U.S. addresses. Uploads are limited to 2,500 addresses at a time. Please consult their privacy, security & terms to determine if the service aligns with your research project's requirements.
 
As of July 2018, Google uses a pay-as-you-go model. For 0 – 100,000 addresses, the cost is 0.005 USD per address (5.00 USD per 1,000). There is no longer a daily limit.
 
Pelias is a modular, open-source geocoder built on top of Elasticsearch for fast and accurate global search. It also offers the option to run your own instance of Pelias using its Docker setup.
 
Build your own address locator:
Two useful software platforms for creating address locators are ArcGIS and QGIS.
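
When comparing the options above, it can help to estimate how many uploads a job will take and what it might cost. The sketch below is a planning aid only: the batch limits and the Google price are the figures quoted in this post and may have changed, and the dictionary keys and function names are hypothetical.

```python
# Upload limits as quoted in this post; check each provider's current terms.
BATCH_LIMITS = {"census": 10_000, "texas_a_m": 2_500}

# Price quoted in this post for the 0 - 100,000 address tier.
GOOGLE_USD_PER_ADDRESS = 0.005
GOOGLE_TIER_MAX = 100_000


def split_into_batches(addresses, batch_size):
    """Split a list of addresses into chunks no larger than batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [addresses[i:i + batch_size]
            for i in range(0, len(addresses), batch_size)]


def google_cost_usd(n_addresses):
    """Estimated cost at the quoted rate; only valid within the quoted tier."""
    if n_addresses > GOOGLE_TIER_MAX:
        raise ValueError("the quoted price covers up to 100,000 addresses only")
    return n_addresses * GOOGLE_USD_PER_ADDRESS


# Placeholder data standing in for a real address list.
addresses = [f"address {i}" for i in range(25_000)]
census_uploads = split_into_batches(addresses, BATCH_LIMITS["census"])
```

With 25,000 addresses this gives three uploads for the Census geocoder (10,000, 10,000 and 5,000), or ten uploads at the Texas A&M limit.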

These are just a few of the geocoding methods out there to get you started. Once you get going, the sky is the limit. Happy geocoding!

by Jill Sherwood