FastAPI-based scraper for commodity ticker prices (HO, CL, RB futures) and competitor oil pricing from NewEnglandOil. Includes cron-driven scraping, PostgreSQL storage, and REST endpoints for price retrieval. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# eamco_scraper

FastAPI microservice for scraping heating oil prices from New England Oil and storing historical pricing data.

## Overview

This service scrapes oil company pricing data from the New England Oil website (Zone 10 - Central Massachusetts) and stores it in a PostgreSQL database for historical tracking and trend analysis.

## Features

- **Web Scraping**: Automated scraping of oil prices using BeautifulSoup4
- **Historical Tracking**: Stores every price record as a new row (inserts only, no updates) for trend analysis
- **Cron-Friendly**: A single GET request triggers both the scrape and the storage
- **Health Checks**: Built-in health endpoint for monitoring
- **Docker Ready**: Production and development Docker configurations

## API Endpoints

### `GET /health`

Health check endpoint with database connectivity status.

**Response:**

```json
{
  "status": "healthy",
  "db_connected": true
}
```

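
A monitoring script can evaluate the health response with a few lines of Python. The following is a minimal sketch using only the standard library; the helper names `is_healthy` and `check_health`, the default base URL, and the timeout are illustrative assumptions, not part of the service.

```python
import json
from urllib.request import urlopen


def is_healthy(payload: dict) -> bool:
    """True only if the service reports healthy AND the database is connected."""
    return payload.get("status") == "healthy" and payload.get("db_connected") is True


def check_health(base_url: str = "http://localhost:9619", timeout: float = 5.0) -> bool:
    """Fetch GET /health and evaluate the JSON body (hypothetical helper)."""
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return is_healthy(json.load(resp))
```

Treating a missing or false `db_connected` as unhealthy is a design choice of this sketch: a service that responds but cannot reach PostgreSQL would not be able to store scraped prices.
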
### `GET /scraper/newenglandoil/latestprice`

Triggers a scrape of New England Oil Zone 10 prices, stores the results in the database, and returns them.

**Response:**

```json
{
  "status": "success",
  "message": "Successfully scraped and stored 30 prices",
  "prices_scraped": 30,
  "prices_stored": 30,
  "scrape_timestamp": "2026-02-07T22:00:00",
  "prices": [
    {
      "company_name": "AUBURN OIL",
      "town": "Auburn",
      "price_decimal": 2.599,
      "scrape_date": "2026-02-07",
      "zone": "zone10"
    }
  ]
}
```

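
A client consuming this response might summarize the `prices` array, for example to find the cheapest listing. The sketch below assumes the response has already been decoded into a Python dict with the field names shown above; `cheapest_price` and `average_price` are hypothetical helpers, and converting through `Decimal` is a choice made here to avoid float rounding in the average.

```python
from decimal import Decimal
from typing import Optional


def _as_decimal(entry: dict) -> Decimal:
    # Go through str() so 2.599 becomes Decimal("2.599"), not a binary-float expansion.
    return Decimal(str(entry["price_decimal"]))


def cheapest_price(response: dict) -> Optional[dict]:
    """Return the entry with the lowest price_decimal, or None if there are no prices."""
    prices = response.get("prices") or []
    if not prices:
        return None
    return min(prices, key=_as_decimal)


def average_price(response: dict) -> Optional[Decimal]:
    """Return the mean price rounded to three decimals, or None if there are no prices."""
    prices = response.get("prices") or []
    if not prices:
        return None
    total = sum(_as_decimal(p) for p in prices)
    return (total / len(prices)).quantize(Decimal("0.001"))
```

Three decimal places match the `price_decimal` precision used in the response and in the database column.
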
## Database Schema

### `company_prices` Table

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| company_name | VARCHAR(255) | Oil company name |
| town | VARCHAR(100) | Town/city |
| price_decimal | DECIMAL(6,3) | Price per gallon |
| scrape_date | DATE | Date price was listed |
| zone | VARCHAR(50) | Geographic zone (default: zone10) |
| created_at | TIMESTAMP | Record creation timestamp |

**Indexes:**

- `idx_company_prices_company` on `company_name`
- `idx_company_prices_scrape_date` on `scrape_date`
- `idx_company_prices_zone` on `zone`
- `idx_company_prices_company_date` on `(company_name, scrape_date)`
- `idx_company_prices_zone_date` on `(zone, scrape_date)`

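
The insert-only design and the `(company_name, scrape_date)` index can be illustrated with a small, self-contained sketch. It deliberately uses an in-memory SQLite database from the Python standard library as a stand-in for PostgreSQL, so it is not the service's actual storage code; `open_demo_db`, `record_price`, and `price_history` are hypothetical names, and only one of the five indexes is created.

```python
import sqlite3

# SQLite approximation of the table above (SERIAL becomes INTEGER PRIMARY KEY AUTOINCREMENT).
DDL = """
CREATE TABLE company_prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    company_name VARCHAR(255),
    town VARCHAR(100),
    price_decimal DECIMAL(6,3),
    scrape_date DATE,
    zone VARCHAR(50) DEFAULT 'zone10',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_company_prices_company_date
    ON company_prices (company_name, scrape_date);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the demo schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def record_price(conn, company_name, town, price, scrape_date, zone="zone10"):
    """Always INSERT a new row; existing rows are never updated."""
    conn.execute(
        "INSERT INTO company_prices (company_name, town, price_decimal, scrape_date, zone) "
        "VALUES (?, ?, ?, ?, ?)",
        (company_name, town, price, scrape_date, zone),
    )


def price_history(conn, company_name):
    """Return (scrape_date, price) rows for one company, oldest first."""
    return conn.execute(
        "SELECT scrape_date, price_decimal FROM company_prices "
        "WHERE company_name = ? ORDER BY scrape_date, id",
        (company_name,),
    ).fetchall()
```

Because rows are only ever appended, a per-company history query like `price_history` is a plain filtered, ordered read, which is the access pattern the composite index serves.
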
## Development

### Local Setup

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env.local

# Edit .env.local with your database credentials

# Run the application
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Docker Local

```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.local.yml up scraper_local
```

Access at: http://localhost:9619

## Production

### Docker Production

```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.prod.yml up -d scraper_prod
```

Access at: http://192.168.1.204:9519

## Cron Integration

Add the following entry to the Unraid cron configuration or the system crontab:

```bash
# Scrape prices daily at 6 AM
# (9619 is the local development port; the production container listens on 9519)
0 6 * * * curl -s http://192.168.1.204:9619/scraper/newenglandoil/latestprice > /dev/null 2>&1
```

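
Since the crontab entry above discards all output, a failed scrape goes unnoticed. One possible alternative is a small Python wrapper that inspects the JSON response and returns a non-zero exit code on failure. This is a sketch under assumptions: `scrape_ok` and `run_scrape` are hypothetical names, the 60-second timeout is arbitrary, and the success criteria (a `"success"` status with matching, non-zero `prices_scraped` and `prices_stored`) are inferred from the sample response rather than a documented contract.

```python
import json
import sys
from urllib.request import urlopen

# URL copied from the crontab entry above.
SCRAPE_URL = "http://192.168.1.204:9619/scraper/newenglandoil/latestprice"


def scrape_ok(response: dict) -> bool:
    """Assumed success criteria: status is 'success' and every scraped price was stored."""
    scraped = response.get("prices_scraped", 0)
    stored = response.get("prices_stored", 0)
    return response.get("status") == "success" and scraped > 0 and scraped == stored


def run_scrape(url: str = SCRAPE_URL, timeout: float = 60.0) -> int:
    """Trigger the scrape; return 0 on success and 1 on any failure (for use as an exit code)."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError) as exc:  # network/HTTP errors, or a non-JSON body
        print(f"scrape request failed: {exc}", file=sys.stderr)
        return 1
    if not scrape_ok(body):
        print(f"scrape reported a problem: {body.get('message')}", file=sys.stderr)
        return 1
    return 0
```

A cron job could call `sys.exit(run_scrape())` from a short script, so that cron's own mail or logging picks up the error message.
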
## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| MODE | Application mode (LOCAL/PRODUCTION) | LOCAL |
| POSTGRES_USERNAME | Database username | postgres |
| POSTGRES_PW | Database password | password |
| POSTGRES_SERVER | Database server | 192.168.1.204 |
| POSTGRES_PORT | Database port | 5432 |
| POSTGRES_DBNAME | Database name | eamco |
| LOG_LEVEL | Logging level | INFO |
| SCRAPER_DELAY | Delay between requests (seconds) | 2.0 |
| SCRAPER_TIMEOUT | Request timeout (seconds) | 10 |

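
As an illustration of how these variables and their defaults could be combined, the sketch below reads them into a settings object and assembles a PostgreSQL connection URL. It is not the service's actual configuration code: the `Settings` class, the `load_settings` function, and the plain `postgresql://` URL scheme are assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import quote_plus


@dataclass(frozen=True)
class Settings:
    mode: str
    database_url: str
    log_level: str
    scraper_delay: float  # seconds between requests
    scraper_timeout: int  # request timeout in seconds


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    user = quote_plus(env.get("POSTGRES_USERNAME", "postgres"))
    password = quote_plus(env.get("POSTGRES_PW", "password"))  # escape characters such as '@'
    server = env.get("POSTGRES_SERVER", "192.168.1.204")
    port = env.get("POSTGRES_PORT", "5432")
    dbname = env.get("POSTGRES_DBNAME", "eamco")
    return Settings(
        mode=env.get("MODE", "LOCAL"),
        database_url=f"postgresql://{user}:{password}@{server}:{port}/{dbname}",
        log_level=env.get("LOG_LEVEL", "INFO"),
        scraper_delay=float(env.get("SCRAPER_DELAY", "2.0")),
        scraper_timeout=int(env.get("SCRAPER_TIMEOUT", "10")),
    )
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test. The default password shown in the table is only suitable for local development.
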
## Architecture

- **Framework**: FastAPI 0.109+
- **Database**: PostgreSQL 15+ with SQLAlchemy 2.0
- **Scraping**: BeautifulSoup4 + lxml + requests
- **Server**: Uvicorn with 2 workers (production)

## Ports

- **Local Development**: 9619
- **Production**: 9519

## Future Enhancements

- Frontend display on Home page (table or cards)
- Price change alerts/notifications
- Support for additional zones
- Price trend graphs and analytics