feat: initial commit for oil price scraper service

FastAPI-based scraper for commodity ticker prices (HO, CL, RB futures) and competitor oil pricing from NewEnglandOil. Includes cron-driven scraping, PostgreSQL storage, and REST endpoints for price retrieval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

# eamco_scraper

FastAPI microservice for scraping heating oil prices from New England Oil and storing historical pricing data.

## Overview

This service scrapes oil company pricing data from the New England Oil website (Zone 10 - Central Massachusetts) and stores it in a PostgreSQL database for historical tracking and trend analysis.

## Features

- **Web Scraping**: Automated scraping of oil prices using BeautifulSoup4
- **Historical Tracking**: Stores all price records (no updates, only inserts) for trend analysis
- **Cron-Friendly**: Single GET request triggers scrape and storage
- **Health Checks**: Built-in health endpoint for monitoring
- **Docker Ready**: Production and development Docker configurations

## API Endpoints

### `GET /health`

Health check endpoint with database connectivity status.

**Response:**

```json
{
  "status": "healthy",
  "db_connected": true
}
```
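
A monitoring probe only needs the two fields shown above. The sketch below uses just the Python standard library; it assumes the service is reachable at the production address listed in this README, and the names `is_healthy`, `check_health`, and `main` are hypothetical helpers, not part of the service.

```python
import http.client
import json
import urllib.request

# Assumption: production address from this README; change for other deployments.
BASE_URL = "http://192.168.1.204:9519"


def is_healthy(payload) -> bool:
    """Healthy only if the service says so AND reports a working database connection."""
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("db_connected") is True
    )


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """GET /health; connection errors, non-2xx responses and invalid JSON all count as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and timeouts subclass OSError; JSONDecodeError subclasses ValueError.
        return False


def main() -> int:
    """Exit status for a probe: 0 = healthy, 1 = unhealthy."""
    return 0 if check_health() else 1
```

To run it as a standalone probe, save it as a script and end it with `raise SystemExit(main())`: exit status 0 means healthy and 1 means unhealthy, the convention Docker's `HEALTHCHECK` uses. Because `urlopen` raises `HTTPError` for non-2xx responses, a 5xx from the service fails the probe even if its body is valid JSON.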

### `GET /scraper/newenglandoil/latestprice`

Triggers a scrape of New England Oil Zone 10 prices, stores the results in the database, and returns them.

**Response** (the `prices` array is abridged to one entry):

```json
{
  "status": "success",
  "message": "Successfully scraped and stored 30 prices",
  "prices_scraped": 30,
  "prices_stored": 30,
  "scrape_timestamp": "2026-02-07T22:00:00",
  "prices": [
    {
      "company_name": "AUBURN OIL",
      "town": "Auburn",
      "price_decimal": 2.599,
      "scrape_date": "2026-02-07",
      "zone": "zone10"
    }
  ]
}
```
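
A client that consumes this endpoint can load the response into typed records. This is a minimal sketch with hypothetical names (`CompanyPrice`, `parse_latestprice`); it assumes a failed scrape returns a `status` other than `"success"` together with a `message`, which this README does not document.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class CompanyPrice:
    company_name: str
    town: str
    price: Decimal  # USD per gallon
    scrape_date: date
    zone: str


def parse_latestprice(body: str) -> list[CompanyPrice]:
    """Turn a /scraper/newenglandoil/latestprice JSON body into CompanyPrice records."""
    # parse_float=Decimal keeps 2.599 exact instead of a binary float approximation.
    data = json.loads(body, parse_float=Decimal)
    # Assumption: a failed scrape reports a non-"success" status plus a "message".
    if data.get("status") != "success":
        raise RuntimeError(f"scrape failed: {data.get('message', 'no message')}")
    return [
        CompanyPrice(
            company_name=item["company_name"],
            town=item["town"],
            price=Decimal(item["price_decimal"]),
            scrape_date=date.fromisoformat(item["scrape_date"]),
            zone=item["zone"],
        )
        for item in data["prices"]
    ]
```

Decoding with `json.loads(..., parse_float=Decimal)` keeps `2.599` exact, consistent with the `DECIMAL(6,3)` type of the `price_decimal` column, instead of rounding it through a binary float.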

## Database Schema

### `company_prices` Table

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| company_name | VARCHAR(255) | Oil company name |
| town | VARCHAR(100) | Town/city |
| price_decimal | DECIMAL(6,3) | Price per gallon (USD) |
| scrape_date | DATE | Date the price was listed |
| zone | VARCHAR(50) | Geographic zone (default: zone10) |
| created_at | TIMESTAMP | Record creation timestamp |

**Indexes:**

- `idx_company_prices_company` on `company_name`
- `idx_company_prices_scrape_date` on `scrape_date`
- `idx_company_prices_zone` on `zone`
- `idx_company_prices_company_date` on `(company_name, scrape_date)`
- `idx_company_prices_zone_date` on `(zone, scrape_date)`
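
Because the table is insert-only, a company's current price is simply its most recent row. The sketch below mirrors the table and its indexes in SQLite (Python's built-in `sqlite3`) so it runs without a PostgreSQL server. The column types are approximations, `record_prices` and `latest_prices` are hypothetical helpers, and the window-function query needs SQLite 3.25 or newer; apart from the `?` placeholder style, the same query shape also runs on PostgreSQL.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL schema. Types are approximations:
# INTEGER PRIMARY KEY AUTOINCREMENT ~ SERIAL; price_decimal is TEXT so "2.599"
# round-trips exactly (a NUMERIC column would coerce it to a float).
SCHEMA = """
CREATE TABLE company_prices (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    company_name  TEXT,
    town          TEXT,
    price_decimal TEXT,
    scrape_date   TEXT,
    zone          TEXT DEFAULT 'zone10',
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_company_prices_company      ON company_prices (company_name);
CREATE INDEX idx_company_prices_scrape_date  ON company_prices (scrape_date);
CREATE INDEX idx_company_prices_zone         ON company_prices (zone);
CREATE INDEX idx_company_prices_company_date ON company_prices (company_name, scrape_date);
CREATE INDEX idx_company_prices_zone_date    ON company_prices (zone, scrape_date);
"""

# Latest row per company in one zone; id DESC breaks ties when a company
# was scraped more than once on the same scrape_date.
LATEST_SQL = """
SELECT company_name, price_decimal, scrape_date
FROM (
    SELECT company_name, price_decimal, scrape_date,
           ROW_NUMBER() OVER (
               PARTITION BY company_name
               ORDER BY scrape_date DESC, id DESC
           ) AS rn
    FROM company_prices
    WHERE zone = ?
) AS ranked
WHERE rn = 1
ORDER BY company_name
"""


def record_prices(conn: sqlite3.Connection, rows) -> None:
    """Insert-only: every scrape appends new rows; existing rows are never updated."""
    conn.executemany(
        "INSERT INTO company_prices (company_name, town, price_decimal, scrape_date, zone) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def latest_prices(conn: sqlite3.Connection, zone: str = "zone10") -> list:
    """Return (company_name, price_decimal, scrape_date) for each company's newest row."""
    return conn.execute(LATEST_SQL, (zone,)).fetchall()
```

Storing `scrape_date` as ISO `YYYY-MM-DD` text keeps lexicographic order identical to date order, which the `ORDER BY scrape_date DESC` in the sketch relies on; in PostgreSQL the native `DATE` type makes that a non-issue.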

## Development

### Local Setup

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env.local

# Edit .env.local with your database credentials

# Run the application
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Docker Local

```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.local.yml up scraper_local
```

Access at: http://localhost:9619

## Production

### Docker Production

```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.prod.yml up -d scraper_prod
```

Access at: http://192.168.1.204:9519

## Cron Integration

Add the following to the Unraid cron or the system crontab:

```bash
# Scrape prices daily at 6 AM
0 6 * * * curl -s http://192.168.1.204:9519/scraper/newenglandoil/latestprice > /dev/null 2>&1
```
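
The crontab entry above discards all output, and `curl -s` without `-f` exits with status 0 even when the server returns an HTTP error, so a failed scrape goes unnoticed. A small Python wrapper can fail loudly instead. This is a sketch: the URL assumes the production port listed under Ports, `summarize` and `main` are hypothetical names, and treating zero scraped prices or `prices_stored != prices_scraped` as a failure is an assumption about what counts as a bad run.

```python
import http.client
import json
import sys
import urllib.request

# Assumption: production port from the Ports section; adjust host/port as needed.
SCRAPE_URL = "http://192.168.1.204:9519/scraper/newenglandoil/latestprice"


def summarize(payload: dict) -> tuple[bool, str]:
    """Return (ok, log_line) for one scrape response."""
    scraped = payload.get("prices_scraped") or 0
    # Assumption: a good run scraped something and stored every price it scraped.
    ok = (
        payload.get("status") == "success"
        and scraped > 0
        and payload.get("prices_stored") == scraped
    )
    line = (
        f"{payload.get('scrape_timestamp', '?')} status={payload.get('status')} "
        f"scraped={scraped} stored={payload.get('prices_stored')}"
    )
    return ok, line


def main(url: str = SCRAPE_URL, timeout: float = 60.0) -> int:
    """Trigger one scrape; print a log line and return a cron-friendly exit status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException) as exc:
        print(f"scrape request failed: {exc}", file=sys.stderr)
        return 1
    ok, line = summarize(payload)
    print(line, file=sys.stdout if ok else sys.stderr)
    return 0 if ok else 1
```

Saved as a script ending in `raise SystemExit(main())`, it can replace the `curl` call, for example `0 6 * * * python3 /path/to/run_scrape.py >> /path/to/scrape.log 2>&1` (both paths are placeholders), which leaves one line per run in the log instead of discarding it.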

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| MODE | Application mode (LOCAL/PRODUCTION) | LOCAL |
| POSTGRES_USERNAME | Database username | postgres |
| POSTGRES_PW | Database password | password |
| POSTGRES_SERVER | Database server | 192.168.1.204 |
| POSTGRES_PORT | Database port | 5432 |
| POSTGRES_DBNAME | Database name | eamco |
| LOG_LEVEL | Logging level | INFO |
| SCRAPER_DELAY | Delay between requests (seconds) | 2.0 |
| SCRAPER_TIMEOUT | Request timeout (seconds) | 10 |
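
The variables above can be read into a typed settings object. This sketch uses only the standard library and the defaults from the table; the `Settings` class, `load_settings`, and the `postgresql://` URL format (SQLAlchemy's default PostgreSQL dialect name) are assumptions about how the service might wire its configuration, not its actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    mode: str
    postgres_username: str
    postgres_pw: str
    postgres_server: str
    postgres_port: int
    postgres_dbname: str
    log_level: str
    scraper_delay: float    # seconds between requests
    scraper_timeout: float  # seconds per request

    @property
    def database_url(self) -> str:
        # Percent-encode credentials so '@', ':' or '/' in them cannot break the URL.
        user = quote(self.postgres_username, safe="")
        pw = quote(self.postgres_pw, safe="")
        return (
            f"postgresql://{user}:{pw}@{self.postgres_server}:"
            f"{self.postgres_port}/{self.postgres_dbname}"
        )


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        mode=env.get("MODE", "LOCAL"),
        postgres_username=env.get("POSTGRES_USERNAME", "postgres"),
        postgres_pw=env.get("POSTGRES_PW", "password"),
        postgres_server=env.get("POSTGRES_SERVER", "192.168.1.204"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_dbname=env.get("POSTGRES_DBNAME", "eamco"),
        log_level=env.get("LOG_LEVEL", "INFO"),
        scraper_delay=float(env.get("SCRAPER_DELAY", "2.0")),
        scraper_timeout=float(env.get("SCRAPER_TIMEOUT", "10")),
    )
```

Percent-encoding matters because a password containing `@` or `:` would otherwise be misread as part of the host. SQLAlchemy's `URL.create(...)` builds the same URL from separate fields and handles that escaping itself.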

## Architecture

- **Framework**: FastAPI 0.109+
- **Database**: PostgreSQL 15+ with SQLAlchemy 2.0
- **Scraping**: BeautifulSoup4 + lxml + requests
- **Server**: Uvicorn with 2 workers (production)
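
The scraping step boils down to turning table rows into price records. The service uses BeautifulSoup4 with lxml; the sketch below swaps in the standard library's `html.parser` so it runs without extra packages. The page markup is an assumption: it presumes each price sits in a `<tr>` whose first three `<td>` cells are company, town, and price, which may not match the real New England Oil page. `RowCollector` and `parse_prices` are hypothetical names.

```python
from decimal import Decimal, InvalidOperation
from html.parser import HTMLParser
from typing import List, Optional


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows made only of <th> cells
                self.rows.append(self._row)
            self._row = None


def parse_prices(html: str, zone: str = "zone10") -> List[dict]:
    """Assumed layout: cells[0]=company, cells[1]=town, cells[2]=price such as "$2.599"."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    records = []
    for cells in collector.rows:
        if len(cells) < 3:
            continue
        try:
            price = Decimal(cells[2].replace("$", "").strip())
        except InvalidOperation:
            continue  # non-numeric price cell, e.g. "Call"
        records.append(
            {"company_name": cells[0], "town": cells[1], "price_decimal": price, "zone": zone}
        )
    return records
```

Rows whose price cell is not a number are skipped rather than stored. With BeautifulSoup the same loop is roughly `for tr in soup.find_all("tr")` with `[td.get_text(strip=True) for td in tr.find_all("td")]` for the cells.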

## Ports

- **Local Development**: 9619
- **Production**: 9519

## Future Enhancements

- Frontend display on Home page (table or cards)
- Price change alerts/notifications
- Support for additional zones
- Price trend graphs and analytics