# EAMCO Address Checker
**eamco_address_checker** is a robust and resilient FastAPI microservice designed for batch verification and correction of customer addresses. It leverages the Nominatim geocoding service and an intelligent, ReAct-inspired agent to ensure address data quality.

The service is designed to be triggered as a scheduled job (e.g., via cron) to process addresses in batches, making it ideal for maintaining data hygiene in large databases without disrupting real-time operations.

[![Language](https://img.shields.io/badge/Language-Python%203.11-blue)](https://www.python.org/)
[![Framework](https://img.shields.io/badge/Framework-FastAPI-green)](https://fastapi.tiangolo.com/)
[![Database](https://img.shields.io/badge/Database-PostgreSQL-blue)](https://www.postgresql.org/)

---
## Core Features
- **Batch Address Verification**: Geocodes customer addresses in configurable batches to find their precise latitude and longitude.
- **Fuzzy Matching Correction**: If an address fails to geocode, the agent attempts to correct misspellings by comparing it against a local database of known street names for that town.
- **Street Data Population**: Includes an endpoint to fetch and store all street names for a given town/state from OpenStreetMap, building the reference data needed for corrections.
- **Resilient Agentic Workflow**: An `AddressVerificationAgent` follows a Think-Act-Observe cycle, ensuring that failures on individual records do not halt the entire batch.
- **Rate Limiting**: Throttles requests to stay within the Nominatim API's rate limit (1 request/second), so the service is not blocked.
- **Environment-based Configuration**: Easily configured for different environments (development, production) using `.env` files.
- **Containerized**: Comes with Dockerfiles for easy deployment.
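
The rate-limiting feature can be illustrated with a minimal sketch. This is not the service's actual implementation: the class name `MinIntervalLimiter` and its parameters are hypothetical, and the only detail taken from this README is the 1 request/second limit. The idea is to record when the last request was made and sleep for whatever remains of the interval before the next one.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Hypothetical helper: keeps consecutive calls at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float = 1.0,  # 1 request/second, as stated for Nominatim above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record the time after sleeping so the next interval starts from here.
        self._last_call = self._clock()
        return delay
```

Calling `limiter.wait()` immediately before each geocoding request is enough to space requests out. The `clock` and `sleep` arguments are injectable only so the behavior can be tested without real delays.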
## How It Works
Each batch run follows a four-stage workflow:
1. **Trigger**: A `POST` request to `/verify-addresses` kicks off a batch run.
2. **Plan**: The agent queries the database for records that have not been verified or were marked as incorrect.
3. **Execute**: For each address, the agent performs the following steps:
   a. **Attempt Geocoding**: Tries to get a location from Nominatim.
   b. **Fuzzy Match on Failure**: If the initial attempt fails, it uses `rapidfuzz` to find the closest matching street name from the `street_reference` table.
   c. **Retry Geocoding**: If a confident match is found, it retries geocoding with the corrected address.
   d. **Update Record**: The database record is updated with the latitude/longitude and a `correct_address` flag.
4. **Reflect**: The service returns a detailed summary of the batch run, including how many addresses were processed, updated, corrected, or failed.
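
The workflow above can be sketched in a few lines of self-contained Python. This is an illustration under stated assumptions, not the code in `app/agent.py`: the names `verify_batch`, `correct_street`, and `BatchSummary` are hypothetical, records are plain dictionaries standing in for database rows, the geocoder is any callable that returns coordinates or `None`, and the standard library's `difflib` stands in for `rapidfuzz` so the sketch has no third-party dependencies. The similarity cutoff of 0.8 is likewise an assumption. Note how an exception on one record is counted as a failure and the loop continues with the next record.

```python
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]
Geocoder = Callable[[str], Optional[Coordinates]]


@dataclass
class BatchSummary:
    processed: int = 0
    updated: int = 0
    corrected: int = 0
    failed: int = 0


def format_address(record: Dict, street: str) -> str:
    return f"{record['number']} {street}, {record['town']}, {record['state']}"


def correct_street(street: str, known_streets: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known street name, or None if nothing is similar enough."""
    by_lower = {name.lower(): name for name in known_streets}
    matches = get_close_matches(street.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


def verify_batch(records: List[Dict], known_streets: List[str], geocode: Geocoder) -> BatchSummary:
    summary = BatchSummary()
    for record in records:
        summary.processed += 1
        try:
            street = record["street"]
            # Step a: first geocoding attempt with the address as stored.
            location = geocode(format_address(record, street))
            corrected = False
            if location is None:
                # Steps b and c: fuzzy-match the street name, then retry.
                candidate = correct_street(street, known_streets)
                if candidate is not None and candidate != street:
                    location = geocode(format_address(record, candidate))
                    corrected = location is not None
            if location is None:
                record["correct_address"] = False
                summary.failed += 1
                continue
            # Step d: store the coordinates and flag the record as verified.
            record["latitude"], record["longitude"] = location
            record["correct_address"] = True
            summary.updated += 1
            summary.corrected += int(corrected)
        except Exception:
            # A failure on one record must not halt the rest of the batch.
            summary.failed += 1
    return summary
```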
---
## API Endpoints
### Health
- `GET /health`
  - **Description**: Checks the service status and database connectivity.
  - **Response**: `{"status": "healthy", "db_connected": true}`
### Verification
- `POST /verify-addresses`
  - **Description**: Triggers a synchronous batch job to verify a new set of addresses. The batch size is determined by the `BATCH_SIZE` environment variable.
  - **Response**: A detailed JSON object with statistics of the batch run.
- `POST /reset-verifications`
  - **Description**: **Use with caution.** This endpoint resets the verification status (`correct_address`, `verified_at`, etc.) for all customer records, making them eligible for re-verification.
  - **Response**: A confirmation with the number of records reset.
### Street Data
- `POST /streets/{town}/{state}`
  - **Description**: Fetches all named streets for a given town and state from the OpenStreetMap Overpass API and stores them in the `street_reference` table. This is essential for the fuzzy matching feature.
  - **Example**: `curl -X POST http://localhost:8000/streets/Boston/MA`
  - **Response**: A summary of streets added or updated.
- `GET /streets/{town}/{state}`
  - **Description**: Returns the number of reference streets stored locally for a given town and state.
  - **Example**: `curl http://localhost:8000/streets/Boston/MA`
  - **Response**: A JSON object with the street count.
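
For scripted use (for example, from the scheduled job mentioned in the introduction), the endpoints above can be called with the Python standard library alone. The sketch below is one possible client, not part of the project: the helper names `streets_url`, `build_request`, `call`, and `run_scheduled_job` are hypothetical, and the base URL assumes the local development address used in the `curl` examples. Town and state are percent-encoded so that names containing spaces produce valid paths. Nothing is sent until `call` or `run_scheduled_job` is invoked.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: local development address


def streets_url(town: str, state: str, base_url: str = BASE_URL) -> str:
    """Build the /streets/{town}/{state} URL with percent-encoded path segments."""
    return f"{base_url}/streets/{quote(town, safe='')}/{quote(state, safe='')}"


def build_request(method: str, url: str) -> Request:
    """Create a request object without sending it."""
    return Request(url, method=method)


def call(method: str, url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urlopen(build_request(method, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_scheduled_job(base_url: str = BASE_URL) -> dict:
    """Check health first, then trigger one verification batch."""
    health = call("GET", f"{base_url}/health")
    if not health.get("db_connected"):
        raise RuntimeError("Service reports no database connection")
    # The batch runs synchronously, so a long batch may need a larger timeout.
    return call("POST", f"{base_url}/verify-addresses", timeout=3600.0)
```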
---
## Getting Started
### Prerequisites
- Python 3.10+
- PostgreSQL database
- A User-Agent string that identifies your application to the OpenStreetMap Nominatim service (set in your configuration)
### Installation
1. **Clone the repository:**
```bash
git clone http://192.168.1.204:3017/Eamco/eamco_address_checker.git
cd eamco_address_checker
```
2. **Create a virtual environment and install dependencies:**
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
3. **Configure your environment:**
   - Copy `.env.example` to `.env.local`:
     ```bash
     cp .env.example .env.local
     ```
   - Edit `.env.local` with your database credentials and other settings. Pay special attention to the `POSTGRES_DBNAME` and `CURRENT_SETTINGS` variables.
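
As a rough illustration of environment-based configuration, the sketch below reads the variables named in this README (`POSTGRES_DBNAME`, `CURRENT_SETTINGS`, `BATCH_SIZE`) and fails fast when a required one is missing. It is not the contents of `app/config.py`: the `Settings` class, the `load_settings` function, the choice of which variables are required, and the default batch size of 100 are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    postgres_dbname: str
    current_settings: str
    batch_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on bad input."""
    required = ("POSTGRES_DBNAME", "CURRENT_SETTINGS")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")

    # Hypothetical default: the real service may use a different value.
    batch_size = int(env.get("BATCH_SIZE", "100"))
    if batch_size <= 0:
        raise ValueError("BATCH_SIZE must be a positive integer")

    return Settings(
        postgres_dbname=env["POSTGRES_DBNAME"],
        current_settings=env["CURRENT_SETTINGS"],
        batch_size=batch_size,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.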
### Running the Service
#### For Development
You can run the service directly with Uvicorn. The `--reload` flag restarts the server whenever source files change.
```bash
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
The API will be available at `http://localhost:8000`, and interactive documentation can be found at `http://localhost:8000/docs`.
#### Using Docker
The project includes Dockerfiles for containerized deployment.
- **Build the image:**
```bash
docker build -t eamco_address_checker .
```
- **Run the container:**
```bash
docker run -d -p 8000:8000 --env-file .env.local --name address-checker eamco_address_checker
```
---
## Project Structure
```
eamco_address_checker/
├── app/
│ ├── __init__.py
│ ├── agent.py # The core ReAct-style verification agent
│ ├── config.py # Application configuration from environment variables
│ ├── main.py # FastAPI application, endpoints, and startup logic
│ ├── models.py # SQLAlchemy ORM models
│ ├── streets.py # Logic for fetching and correcting street names
│ └── tools.py # Modular tools used by the agent (geocoding, validation, etc.)
├── .env.example # Example environment variables
├── Dockerfile # Dockerfile for production builds
├── requirements.txt # Python dependencies
└── README.md # This file
```