Geocode with Python.

How to Convert physical addresses to Geographic locations → Latitude and Longitude

Datasets are rarely complete and often require pre-processing. Imagine some datasets have only an address column without latitude and longitude columns to represent your data geographically. In that case, you need to convert your data into a geographic format. The process of converting addresses to geographic information — Latitude and Longitude — to map their locations is called Geocoding.

Geocoding is the computational process of transforming a physical address description to a location on the Earth’s surface (spatial representation in numerical coordinates) — Wikipedia

In this tutorial, I will show you how to perform geocoding in Python with the help of Geopy and Geopandas Libraries. Let us install these libraries with Pip if you have already Anaconda environment setup.

pip install geopandas
pip install geopy

If you do not want to install libraries and directly interact with the accompanied Jupyter notebook of this tutorial, there are Github link with MyBinder at the bottom of this article. This is a containerised environment that will allow you to experiment with this tutorial directly on the web without any installations. The dataset is also included in this environment so there is no need to download the dataset for this tutorial.

Geocoding Single Address

To geolocate a single address, you can use Geopy python library. Geopy has different Geocoding services that you can choose from, including Google Maps, ArcGIS, AzureMaps, Bing, etc. Some of them require API keys, while others do not need.

 

Geopy

As our first example, we use Nominatim Geocoding service, which is built on top of OpenStreetMap data. Let us Geocode a single address, the Eifel tower in Paris.

locator = Nominatim(user_agent=”myGeocoder”)
location = locator.geocode(“Champ de Mars, Paris, France”)

We create locator that holds the Geocoding service, Nominatim. Then we pass the locator we created to geocode any address, in this example, the Eifel tower address.

print(“Latitude = {}, Longitude = {}”.format(location.latitude, location.longitude))

Now, we can print out the coordinates of the location we have created.

Latitude = 48.85614465, Longitude = 2.29782039332223

Try some different addresses of your own. In the next section, we will cover how to geocode many addresses from Pandas Dataframe.

Geocoding addresses from Pandas

Let us read the dataset for this tutorial. We use an example of Store addresses dataset for this tutorial. The CSV file is available in this link.

addresses.csv
Dropbox is a free service that lets you bring your photos, docs, and videos anywhere and share them easily. Never email…
www.dropbox.com
 

Download the CSV file and read it in Pandas.

df = pd.read_csv(“addresses.csv”)
df.head()

The following table provides the first five rows of the DataFrame table. As you can see, there are no latitude and longitude columns to map the data.

 

Ddataframe

We concatenate address columns into one that is appropriate for geocoding. For example, the first address is:

Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden

We can join address columns in pandas like this to create an address column for the geocoding:

Once we create the address column, we can start geocoding as below code snippet.

  • #1 — We first delay our Geocoding 1 second between each address. This is convenient when you are Geocoding a large number of physical addresses as the Geocoding service provider can deny access to the service.
  • #2 — Create a df['location'] column by applying geocode we created.
  • #3 — Third, we can create latitude, longitude, and altitude as a single tuple column.
  • #4 — Finally, We split latitude, longitude, and altitude columns into three separate columns.

The above code produces a Dataframe with latitude and longitude columns that you can map with any Geographic visualisation tool of your choice. Let us look at the first few raws of our DataFrame, but first, we will clean out the unwanted columns.

df = df.drop([‘Address1’, ‘Address3’, ‘Address4’, ‘Address5’,’Telefon’, ‘ADDRESS’, ‘location’, ‘point’], axis=1)df.head()
 

cleaned table with latitude and longitude

I will use Folium to map out the points we created but feel free to use any other Geovisualization tool of your choice. First, we display the locations as a circle map with Folium. The map produced below shows the geocoded addresses as circles.

 

Map

Or if you prefer a dark background with an aggregated cluster of points, you can do the following: 

Below is a dark background map with Clustered points map in Folium.

 

Clustered map

Conclusion

Geocoding is a critical task in many location tasks that require coordinate systems. In this article, we have seen how to do geocoding in Python. There are a lot of other services that provide either free or paid geocoding services that you can experiment within GeoPy. I find Google Maps geocoding services more powerful than the Openstreetmap services we have used in this tutorial, but it requires an API key.

To interact and experiment with this tutorial without any installation, I created a Binder. Go this GitHub repository and click on launch binder.

shakasom/geocoding
You can’t perform that action at this time. You signed in with another tab or window. You signed out in another tab or…
github.com
 

Or directly to the Jupyter notebook Binder link here:

GitHub: shakasom/geocoding/master
Click to run this interactive environment. From the Binder Project: Reproducible, sharable, interactive computing…
mybinder.org
 

Credits: Abdishakur