Edwin Eames af9c2f99e7 feat: initial commit for oil price scraper service
FastAPI-based scraper for commodity ticker prices (HO, CL, RB futures)
and competitor oil pricing from NewEnglandOil. Includes cron-driven
scraping, PostgreSQL storage, and REST endpoints for price retrieval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:57:44 -05:00


# eamco_scraper
FastAPI microservice for scraping heating oil prices from New England Oil and storing historical pricing data.
## Overview
This service scrapes oil company pricing data from the New England Oil website (Zone 10 - Central Massachusetts) and stores it in a PostgreSQL database for historical tracking and trend analysis.
## Features
- **Web Scraping**: Automated scraping of oil prices using BeautifulSoup4
- **Historical Tracking**: Stores all price records (no updates, only inserts) for trend analysis
- **Cron-Friendly**: Single GET request triggers scrape and storage
- **Health Checks**: Built-in health endpoint for monitoring
- **Docker Ready**: Production and development Docker configurations
## API Endpoints
### `GET /health`
Health check endpoint with database connectivity status.
**Response:**
```json
{
  "status": "healthy",
  "db_connected": true
}
```
### `GET /scraper/newenglandoil/latestprice`
Triggers a scrape of New England Oil Zone 10 prices, stores the results in the database, and returns them.
**Response:**
```json
{
  "status": "success",
  "message": "Successfully scraped and stored 30 prices",
  "prices_scraped": 30,
  "prices_stored": 30,
  "scrape_timestamp": "2026-02-07T22:00:00",
  "prices": [
    {
      "company_name": "AUBURN OIL",
      "town": "Auburn",
      "price_decimal": 2.599,
      "scrape_date": "2026-02-07",
      "zone": "zone10"
    }
  ]
}
```
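A consumer of this endpoint will usually want typed records rather than raw JSON. The following is a minimal sketch, assuming the response shape shown above; the `CompanyPrice` dataclass and `parse_prices` helper are hypothetical names for illustration and are not part of the service.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class CompanyPrice:
    """Hypothetical client-side record mirroring one entry of "prices"."""
    company_name: str
    town: str
    price_decimal: Decimal
    scrape_date: date
    zone: str


def parse_prices(body: str) -> list[CompanyPrice]:
    """Parse a latestprice response body into typed records."""
    # parse_float=Decimal keeps values such as 2.599 exact instead of
    # routing them through a binary float.
    payload = json.loads(body, parse_float=Decimal)
    if payload.get("status") != "success":
        raise ValueError(f"scrape failed: {payload.get('message')}")
    return [
        CompanyPrice(
            company_name=item["company_name"],
            town=item["town"],
            price_decimal=Decimal(item["price_decimal"]),
            scrape_date=date.fromisoformat(item["scrape_date"]),
            zone=item["zone"],
        )
        for item in payload["prices"]
    ]


# Sample body trimmed from the response documented above.
sample = (
    '{"status": "success", "prices": [{"company_name": "AUBURN OIL", '
    '"town": "Auburn", "price_decimal": 2.599, '
    '"scrape_date": "2026-02-07", "zone": "zone10"}]}'
)
prices = parse_prices(sample)
print(prices[0].company_name, prices[0].price_decimal)
```

Parsing prices as `Decimal` matches the `DECIMAL(6,3)` column type and avoids rounding surprises when comparing prices across days.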
## Database Schema
### `company_prices` Table
| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| company_name | VARCHAR(255) | Oil company name |
| town | VARCHAR(100) | Town/city |
| price_decimal | DECIMAL(6,3) | Price per gallon |
| scrape_date | DATE | Date price was listed |
| zone | VARCHAR(50) | Geographic zone (default: zone10) |
| created_at | TIMESTAMP | Record creation timestamp |
**Indexes:**
- `idx_company_prices_company` on `company_name`
- `idx_company_prices_scrape_date` on `scrape_date`
- `idx_company_prices_zone` on `zone`
- `idx_company_prices_company_date` on `(company_name, scrape_date)`
- `idx_company_prices_zone_date` on `(zone, scrape_date)`
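The insert-only design means a price history is just an ordered read over one company's rows. The sketch below illustrates that pattern using an in-memory SQLite database as a stand-in for PostgreSQL, so the column types are approximations of the table above. The `price_history` helper is hypothetical, and the 2026-02-06 row is made-up sample data.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company_prices (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        company_name TEXT,
        town TEXT,
        price_decimal NUMERIC,
        scrape_date TEXT,
        zone TEXT DEFAULT 'zone10',
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)
# Composite index from the list above; it serves per-company history reads.
conn.execute(
    "CREATE INDEX idx_company_prices_company_date "
    "ON company_prices (company_name, scrape_date)"
)

# Each scrape appends new rows; existing rows are never updated.
conn.executemany(
    "INSERT INTO company_prices (company_name, town, price_decimal, scrape_date) "
    "VALUES (?, ?, ?, ?)",
    [
        ("AUBURN OIL", "Auburn", 2.649, "2026-02-06"),  # made-up sample row
        ("AUBURN OIL", "Auburn", 2.599, "2026-02-07"),
    ],
)


def price_history(connection, company_name):
    """Return (scrape_date, price) pairs for one company, oldest first."""
    cursor = connection.execute(
        "SELECT scrape_date, price_decimal FROM company_prices "
        "WHERE company_name = ? ORDER BY scrape_date",
        (company_name,),
    )
    return cursor.fetchall()


for scrape_date, price in price_history(conn, "AUBURN OIL"):
    print(scrape_date, price)
```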
## Development
### Local Setup
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file, then edit .env.local with your database credentials
cp .env.example .env.local

# Run the application
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```
### Docker Local
```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.local.yml up scraper_local
```
Access at: http://localhost:9619
## Production
### Docker Production
```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.prod.yml up -d scraper_prod
```
Access at: http://192.168.1.204:9519
## Cron Integration
Add to Unraid cron or system crontab:
```bash
# Scrape prices daily at 6 AM
0 6 * * * curl -s http://192.168.1.204:9519/scraper/newenglandoil/latestprice > /dev/null 2>&1
```
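The `curl` line discards the response, so a failed scrape goes unnoticed. As one possible alternative, the sketch below is a small script that cron could run instead: it calls the endpoint, checks the `status` field, and exits non-zero on failure. The script name `trigger_scrape.py`, the helper names, and the 60-second timeout are assumptions for illustration; the endpoint URL is passed as the first argument.

```python
import json
import sys
import urllib.request


def summarize(body: str) -> tuple[bool, str]:
    """Return (ok, one-line summary) for a latestprice response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "response was not valid JSON"
    if not isinstance(payload, dict):
        return False, "response was not a JSON object"
    ok = payload.get("status") == "success"
    summary = (
        f"{payload.get('status')}: "
        f"scraped={payload.get('prices_scraped')} "
        f"stored={payload.get('prices_stored')}"
    )
    return ok, summary


def main(url: str, timeout: float = 60.0) -> int:
    """Trigger the scrape and return a process exit code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8")
    except OSError as exc:  # URLError and socket timeouts subclass OSError
        print(f"request failed: {exc}", file=sys.stderr)
        return 1
    ok, summary = summarize(body)
    print(summary, file=sys.stdout if ok else sys.stderr)
    return 0 if ok else 1


# Usage (hypothetical): python trigger_scrape.py <latestprice endpoint URL>
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Because failures are written to stderr with a non-zero exit code, cron's mail or log handling can surface them without extra tooling.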
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| MODE | Application mode (LOCAL/PRODUCTION) | LOCAL |
| POSTGRES_USERNAME | Database username | postgres |
| POSTGRES_PW | Database password | password |
| POSTGRES_SERVER | Database server | 192.168.1.204 |
| POSTGRES_PORT | Database port | 5432 |
| POSTGRES_DBNAME | Database name | eamco |
| LOG_LEVEL | Logging level | INFO |
| SCRAPER_DELAY | Delay between requests (seconds) | 2.0 |
| SCRAPER_TIMEOUT | Request timeout (seconds) | 10 |
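To illustrate how these variables and defaults fit together, here is a minimal sketch of a settings loader using only the standard library. The `Settings` class, `load_settings` function, and the `postgresql://` URL layout are assumptions for illustration; the service's actual configuration code is not shown in this README and may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables in the table above."""
    mode: str
    postgres_username: str
    postgres_pw: str
    postgres_server: str
    postgres_port: int
    postgres_dbname: str
    log_level: str
    scraper_delay: float
    scraper_timeout: float

    @property
    def database_url(self) -> str:
        # Percent-encode credentials so characters like "@" stay valid.
        user = quote_plus(self.postgres_username)
        password = quote_plus(self.postgres_pw)
        return (
            f"postgresql://{user}:{password}"
            f"@{self.postgres_server}:{self.postgres_port}/{self.postgres_dbname}"
        )


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (os.environ by default), applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        mode=env.get("MODE", "LOCAL"),
        postgres_username=env.get("POSTGRES_USERNAME", "postgres"),
        postgres_pw=env.get("POSTGRES_PW", "password"),
        postgres_server=env.get("POSTGRES_SERVER", "192.168.1.204"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_dbname=env.get("POSTGRES_DBNAME", "eamco"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        scraper_delay=float(env.get("SCRAPER_DELAY", "2.0")),
        scraper_timeout=float(env.get("SCRAPER_TIMEOUT", "10")),
    )


# With an empty mapping, every value falls back to its documented default.
defaults = load_settings({})
print(defaults.mode, defaults.database_url)
```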
## Architecture
- **Framework**: FastAPI 0.109+
- **Database**: PostgreSQL 15+ with SQLAlchemy 2.0
- **Scraping**: BeautifulSoup4 + lxml + requests
- **Server**: Uvicorn with 2 workers (production)
## Ports
- **Local Development** (Docker): 9619; running Uvicorn directly as in Local Setup uses 8000
- **Production**: 9519
## Future Enhancements
- Frontend display on Home page (table or cards)
- Price change alerts/notifications
- Support for additional zones
- Price trend graphs and analytics