Day 6: Codematic's Hackathon with Google Cloud

Geocoding Integration and Efficiency Focus


Fortune Precious is a software engineer skilled in TypeScript, C++, Python, Rust, Java and C#. He has interests in products, data and research.

Day 6 was a demanding but highly rewarding deep dive into Google Maps Platform integration and building the initial geolocation dataset. Success hinged on careful resource management and on solving persistent data structuring puzzles.

Lessons Learned

Handling Unstructured Data Tables

  • Roadblock: I faced recurring issues in the PDF processing service when tables lacked a clear "State" column, forcing me to programmatically assign a state. Additionally, identifying the true header row (the "leader") was difficult as the table structure varied slightly across documents.

  • Solution: I implemented a more robust check that scans the first few rows for keywords (like 'STATE' or 'FEEDER') to confirm the header row dynamically. For tables missing the State column, I used the disco_states argument passed to the function to insert the first listed state as a new column on every row. This ensures every row carries a state before the geocoding stage.
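
The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the function names, the five-row scan window, and the rule that a header must have at least two non-empty cells are all assumptions made for the example. Only the keywords and the disco_states argument come from the post.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Keywords named in the post; extend as needed for other table layouts.
HEADER_KEYWORDS = {"STATE", "FEEDER"}


def _words(cell: object) -> Set[str]:
    """Upper-cased words of a cell; empty set for None."""
    return set(str(cell).upper().split()) if cell is not None else set()


def find_header_row(rows: Sequence[Sequence[object]], max_scan: int = 5) -> Optional[int]:
    """Return the index of the first scanned row that looks like a header."""
    for index, row in enumerate(rows[:max_scan]):
        non_empty = [c for c in row if c is not None and str(c).strip()]
        # Assumption: title rows are a single merged cell, so skip them.
        if len(non_empty) < 2:
            continue
        if any(_words(cell) & HEADER_KEYWORDS for cell in non_empty):
            return index
    return None


def ensure_state_column(
    rows: Sequence[Sequence[object]], disco_states: Sequence[str]
) -> Tuple[List[str], List[list]]:
    """Return (header, data_rows), prepending a STATE column if it is missing."""
    header_index = find_header_row(rows)
    if header_index is None:
        raise ValueError("no header row found in the scanned rows")
    header = ["" if c is None else str(c).strip().upper() for c in rows[header_index]]
    data = [list(row) for row in rows[header_index + 1:]]
    if any("STATE" in _words(cell) for cell in header):
        return header, data
    if not disco_states:
        raise ValueError("table has no State column and disco_states is empty")
    default_state = disco_states[0]  # first listed state, as in the post
    return ["STATE"] + header, [[default_state] + row for row in data]
```

Using the first listed state is only safe when a document really covers a single state; a table spanning several states would need a per-row rule instead.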

The True Cost of "Free" APIs

  • Roadblock: Initially, I interpreted the Google Maps Platform (GMP) free tier as a simple 10,000 requests per API.

  • Solution: I learned the free tier is actually a $200 USD monthly credit applied across all GMP APIs. This realization shifted my strategy: rather than testing widely, I chose to focus development solely on Lagos State (due to its high economic relevance), which provides a high-value, contained dataset that minimizes test consumption against the limited credit. This targeted approach is essential for conserving resources during development.
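
To make the budgeting concrete, here is a small sketch of how a run could be narrowed to one state and checked against the credit before any request is sent. The function names and the "state" row key are hypothetical, and the per-1,000-request price is a value the caller must supply from the current GMP price list; the figure used in the usage lines is a placeholder, not a quoted rate.

```python
from typing import Dict, Iterable, List


def rows_for_state(rows: Iterable[Dict[str, str]], state: str) -> List[Dict[str, str]]:
    """Keep only the rows whose 'state' value matches, ignoring case and spaces."""
    target = state.strip().lower()
    return [row for row in rows if row.get("state", "").strip().lower() == target]


def estimated_cost_usd(request_count: int, price_per_1000_usd: float) -> float:
    """Cost of request_count calls at a caller-supplied price per 1,000 calls."""
    return request_count / 1000 * price_per_1000_usd


def fits_in_credit(
    request_count: int,
    price_per_1000_usd: float,
    monthly_credit_usd: float = 200.0,  # the credit figure from the post
    already_spent_usd: float = 0.0,
) -> bool:
    """True if the planned run stays within the remaining monthly credit."""
    planned = estimated_cost_usd(request_count, price_per_1000_usd)
    return already_spent_usd + planned <= monthly_credit_usd


# Usage with a placeholder price (look up the real rate before relying on this).
all_rows = [{"state": "Lagos"}, {"state": "Ogun"}, {"state": " lagos "}]
lagos_rows = rows_for_state(all_rows, "Lagos")
can_run = fits_in_credit(len(lagos_rows), price_per_1000_usd=5.0)
```

Because the credit is shared across all GMP APIs, already_spent_usd should include usage from every API on the project, not just geocoding.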

Daily Accomplishments

  • Google Maps Platform Setup: Set up the Google Maps Platform project, created a dedicated Google Maps API key, and confirmed the usage/billing structure (the $200 monthly credit).

  • Targeted Data Generation Strategy: Implemented a new data processing flow focusing only on Lagos State to conserve API requests.

  • Geocode Generation Function: Created a reliable Python function that performs in-app API calls using the requests module to generate latitude, longitude, and bounds for each feeder point.

  • Address Formatting: Validated the optimal address string format for the Geocoding API by testing on Google's provided map form. The format street/town, state, country was adopted for building the clean address used in the geocoding request.

  • Initial Data Success: Tested the coordinate generation process on 100 rows of Lagos data. The generated coordinates, bounds, and clean address strings were stored in the database.
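
For readers who want to see the shape of such a function, here is a minimal sketch of the address building and geocoding steps. It is an illustration under stated assumptions rather than the project's code: the helper names are hypothetical, "Nigeria" as the default country is inferred from the data described in this post, and requests is imported inside geocode so the pure helpers can be used without the dependency installed. The endpoint and the response fields (status, results, geometry.location, geometry.bounds) follow the Geocoding API's JSON format.

```python
from typing import Any, Dict, Optional

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_clean_address(street_or_town: str, state: str, country: str = "Nigeria") -> str:
    """Join the parts in the 'street/town, state, country' format, skipping blanks."""
    parts = [part.strip() for part in (street_or_town, state, country)]
    return ", ".join(part for part in parts if part)


def parse_geocode_response(payload: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Extract lat, lng and bounds from the first result, or None if no match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    geometry = payload["results"][0]["geometry"]
    return {
        "lat": geometry["location"]["lat"],
        "lng": geometry["location"]["lng"],
        # 'bounds' is optional in the response, so this may be None.
        "bounds": geometry.get("bounds"),
    }


def geocode(address: str, api_key: str, timeout: float = 10.0) -> Optional[Dict[str, Any]]:
    """Call the Geocoding API once and return the parsed result."""
    import requests  # third-party; imported here to keep the helpers dependency-free

    response = requests.get(
        GEOCODE_URL,
        params={"address": address, "key": api_key},
        timeout=timeout,
    )
    response.raise_for_status()
    return parse_geocode_response(response.json())
```

Keeping the parsing separate from the network call means the response handling can be tested with saved payloads, which costs no API credit.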

Tips for Fellow Hackers

  • Pre-validate Geocoding Strings: Do not guess your address format! Before writing a single line of API call code, use the Google Maps Geocoding demo/sandbox page to test various address string permutations. This instantly tells you what format yields the highest accuracy for your specific geographic data (street/town, state, country works best for my Nigerian data).

  • Be Smart with API Credits: If your hackathon uses paid external APIs with credits (like GMP), ALWAYS develop against a small, representative subset of your data (e.g., one state or one business unit). This lets you iterate quickly without burning through your quota on failed development tests.

  • Treat Artifact Registry as a Private Docker Hub: If you're new to Google Cloud Artifact Registry, think of it as a specialized, private repository for your software assets, including Docker images. It provides secure, fast storage within Google Cloud, so you don't need a public repository (like Docker Hub) when deploying to Cloud Run. It's the standard, secure way to manage your containers on GCP.

Next Steps: Containerization and Deployment

The immediate focus is to transition the application from local development to a scalable, cloud-native environment, allowing full-scale testing of the Lagos data.

  • Containerization: Containerize the FastAPI application using Docker. This involves creating a Dockerfile that packages the application code, dependencies, and necessary configurations.

  • Artifact Management (Artifact Registry): Use Google Cloud Artifact Registry to securely store and manage the final Docker image. This acts as a private, highly integrated registry for all your container images, ensuring they are readily available for deployment to other Google Cloud services like Cloud Run.

  • Deployment (Cloud Run): Deploy the containerized application to Google Cloud Run. Cloud Run is a fully managed compute platform that runs stateless containers invoked via web requests. It will automatically scale to handle the bulk geocoding tasks and then scale down to zero when idle, optimizing costs.

  • Full-Scale Testing: Run the geocoding service for all remaining feeder rows in Lagos within the stable Cloud Run environment. This addresses the network issues experienced during local testing and allows for reliable data processing.
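
Even in a stable environment, a bulk run benefits from tolerating transient network failures. The sketch below shows one possible shape for that, not something already built: geocode_all, its parameters, and the retry counts are hypothetical, and geocode_one stands for whatever function performs a single lookup. It retries on OSError, which covers Python's built-in connection and timeout errors as well as the exceptions raised by requests.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional


def geocode_all(
    addresses: Iterable[str],
    geocode_one: Callable[[str], Any],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[Any]]:
    """Geocode each distinct address once, retrying failures with backoff."""
    results: Dict[str, Optional[Any]] = {}
    for address in addresses:
        if address in results:
            continue  # duplicates would only spend extra API credit
        for attempt in range(max_attempts):
            try:
                results[address] = geocode_one(address)
                break
            except OSError:
                if attempt == max_attempts - 1:
                    results[address] = None  # give up; record the miss
                else:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** attempt)
    return results
```

Addresses that end up as None can be collected and re-run later, so one bad connection does not force the whole batch to restart.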