Edwin Eames fad7c1dc28 feat: add fallback to most recent prices when requested date has no data
- When no prices exist for the requested date, query for the most
  recent available date and return those prices instead
- Log informational message when falling back to alternate date
2026-02-27 18:41:12 -05:00

eamco_scraper

FastAPI microservice for scraping heating oil prices from New England Oil and storing historical pricing data.

Overview

This service scrapes oil company pricing data from the New England Oil website (Zone 10 - Central Massachusetts) and stores it in a PostgreSQL database for historical tracking and trend analysis.

Features

  • Web Scraping: Automated scraping of oil prices using BeautifulSoup4
  • Historical Tracking: Stores all price records (no updates, only inserts) for trend analysis
  • Cron-Friendly: A single GET request triggers a scrape and stores the results
  • Health Checks: Built-in health endpoint for monitoring
  • Docker Ready: Production and development Docker configurations

API Endpoints

GET /health

Health check endpoint with database connectivity status.

Response:

{
  "status": "healthy",
  "db_connected": true
}
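For monitoring, a check should treat the service as up only when both fields look right. Below is a minimal sketch. It assumes the body is exactly the JSON object above. `is_healthy` is a hypothetical helper, not part of the service, and any `status` value other than `"healthy"` is assumed to mean a failure:

```python
import json


def is_healthy(body: str) -> bool:
    """Return True only if the service is healthy AND the database is reachable."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body (for example a proxy error page) counts as unhealthy.
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: only the exact values shown in the README mean "up".
    return payload.get("status") == "healthy" and payload.get("db_connected") is True


if __name__ == "__main__":
    print(is_healthy('{"status": "healthy", "db_connected": true}'))
```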

GET /scraper/newenglandoil/latestprice

Triggers a scrape of New England Oil Zone 10 prices, stores them in the database, and returns the results.

Response:

{
  "status": "success",
  "message": "Successfully scraped and stored 30 prices",
  "prices_scraped": 30,
  "prices_stored": 30,
  "scrape_timestamp": "2026-02-07T22:00:00",
  "prices": [
    {
      "company_name": "AUBURN OIL",
      "town": "Auburn",
      "price_decimal": 2.599,
      "scrape_date": "2026-02-07",
      "zone": "zone10"
    }
  ]
}
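A consumer of this endpoint can decode prices as `Decimal` so that values like 2.599 stay exact, matching the `DECIMAL(6,3)` column. Below is a sketch. It assumes the field names in the sample response above. `PriceRecord` and `parse_latest_prices` are hypothetical names, and treating any `status` other than `"success"` as an error is an assumption:

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PriceRecord:
    company_name: str
    town: str
    price_decimal: Decimal
    scrape_date: date
    zone: str


def parse_latest_prices(body: str) -> list[PriceRecord]:
    """Decode a /scraper/newenglandoil/latestprice response into PriceRecords."""
    # parse_float=Decimal keeps 2.599 exact instead of a binary float.
    payload = json.loads(body, parse_float=Decimal)
    # Assumption: any status other than "success" means the scrape failed.
    if payload.get("status") != "success":
        raise ValueError(f"scrape failed: {payload.get('message')}")
    return [
        PriceRecord(
            company_name=p["company_name"],
            town=p["town"],
            # Decimal(...) also normalizes whole-number prices that JSON decodes as int.
            price_decimal=Decimal(p["price_decimal"]),
            scrape_date=date.fromisoformat(p["scrape_date"]),
            zone=p["zone"],
        )
        for p in payload["prices"]
    ]
```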

Database Schema

company_prices Table

Column          Type           Description
id              SERIAL         Primary key
company_name    VARCHAR(255)   Oil company name
town            VARCHAR(100)   Town/city
price_decimal   DECIMAL(6,3)   Price per gallon
scrape_date     DATE           Date the price was listed
zone            VARCHAR(50)    Geographic zone (default: zone10)
created_at      TIMESTAMP      Record creation timestamp

Indexes:

  • idx_company_prices_company on company_name
  • idx_company_prices_scrape_date on scrape_date
  • idx_company_prices_zone on zone
  • idx_company_prices_company_date on (company_name, scrape_date)
  • idx_company_prices_zone_date on (zone, scrape_date)
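The commit note for this revision describes a fallback. When no prices exist for the requested date, the service returns the prices from the most recent available date and logs an informational message.

The sketch below shows that pattern against the schema above. It uses an in-memory SQLite database as a stand-in for PostgreSQL, so `SERIAL` becomes `INTEGER PRIMARY KEY AUTOINCREMENT`. The following are assumptions, not the service's actual code: `prices_for_date`, filtering by zone, the log wording, and the `created_at` default.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("eamco_scraper.sketch")  # hypothetical logger name

# SQLite stand-in for the PostgreSQL DDL (SERIAL -> INTEGER PRIMARY KEY AUTOINCREMENT).
# The created_at default is an assumption; the README does not state one.
SCHEMA = """
CREATE TABLE company_prices (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    company_name  VARCHAR(255),
    town          VARCHAR(100),
    price_decimal DECIMAL(6,3),
    scrape_date   DATE,
    zone          VARCHAR(50) DEFAULT 'zone10',
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_company_prices_company      ON company_prices (company_name);
CREATE INDEX idx_company_prices_scrape_date  ON company_prices (scrape_date);
CREATE INDEX idx_company_prices_zone         ON company_prices (zone);
CREATE INDEX idx_company_prices_company_date ON company_prices (company_name, scrape_date);
CREATE INDEX idx_company_prices_zone_date    ON company_prices (zone, scrape_date);
"""

PRICES_ON_DATE = (
    "SELECT company_name, town, price_decimal FROM company_prices "
    "WHERE zone = ? AND scrape_date = ? ORDER BY price_decimal, company_name"
)


def prices_for_date(conn, requested, zone="zone10"):
    """Return (date_used, rows), falling back to the newest scrape_date if needed."""
    rows = conn.execute(PRICES_ON_DATE, (zone, requested)).fetchall()
    if rows:
        return requested, rows
    # ISO 'YYYY-MM-DD' strings sort chronologically, so MAX finds the newest date.
    (latest,) = conn.execute(
        "SELECT MAX(scrape_date) FROM company_prices WHERE zone = ?", (zone,)
    ).fetchone()
    if latest is None:
        return None, []  # no data at all for this zone
    log.info("No prices for %s; using most recent date %s", requested, latest)
    return latest, conn.execute(PRICES_ON_DATE, (zone, latest)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Insert-only history: each scrape adds rows rather than updating old ones.
    conn.executemany(
        "INSERT INTO company_prices (company_name, town, price_decimal, scrape_date) "
        "VALUES (?, ?, ?, ?)",
        [("AUBURN OIL", "Auburn", 2.599, "2026-02-07")],
    )
    print(prices_for_date(conn, "2026-02-08"))
```

In PostgreSQL, the `idx_company_prices_zone_date` index on `(zone, scrape_date)` can serve both the exact-date lookup and the `MAX(scrape_date)` fallback.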

Development

Local Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env.local

# Edit .env.local with your database credentials

# Run the application
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Docker Local

cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.local.yml up scraper_local

Access at: http://localhost:9619

Production

Docker Production

cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.prod.yml up -d scraper_prod

Access at: http://192.168.1.204:9519

Cron Integration

Add the following entry to the Unraid cron or the system crontab:

# Scrape prices daily at 6 AM
0 6 * * * curl -s http://192.168.1.204:9519/scraper/newenglandoil/latestprice > /dev/null 2>&1
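The curl entry sends all output to /dev/null. In addition, `curl -s` without `-f` exits 0 even on an HTTP error response, so a failed scrape goes unnoticed. A standard-library Python alternative is sketched below: it stays silent on success, and on failure it prints to stderr and exits non-zero. The following are assumptions: the script itself, its 120-second timeout, and the default URL (the production address from the Docker Production section).

```python
import json
import sys
import urllib.request
from collections.abc import Callable

# Assumed default: the production host and port from the Docker Production section.
DEFAULT_URL = "http://192.168.1.204:9519/scraper/newenglandoil/latestprice"


def trigger_scrape(url: str, timeout: float = 120.0) -> dict:
    """GET the scrape endpoint and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError (an OSError subclass) on 4xx/5xx.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def main(argv: list[str], fetch: Callable[[str], dict] = trigger_scrape) -> int:
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    try:
        result = fetch(url)
    except (OSError, ValueError) as exc:
        # OSError covers connection errors, HTTP errors and timeouts;
        # ValueError covers malformed JSON or invalid UTF-8.
        print(f"scrape request failed: {exc}", file=sys.stderr)
        return 1
    # Assumption: "success" is the only status that means the scrape worked.
    if not isinstance(result, dict) or result.get("status") != "success":
        print(f"scrape reported failure: {result!r}", file=sys.stderr)
        return 1
    return 0  # silent on success so cron does not mail every day


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A crontab entry could then run it as `0 6 * * * python3 /path/to/trigger_scrape.py`, where the file name and path are hypothetical.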

Environment Variables

Variable            Description                            Default
MODE                Application mode (LOCAL/PRODUCTION)    LOCAL
POSTGRES_USERNAME   Database username                      postgres
POSTGRES_PW         Database password                      password
POSTGRES_SERVER     Database server                        192.168.1.204
POSTGRES_PORT       Database port                          5432
POSTGRES_DBNAME     Database name                          eamco
LOG_LEVEL           Logging level                          INFO
SCRAPER_DELAY       Delay between requests (seconds)       2.0
SCRAPER_TIMEOUT     Request timeout (seconds)              10
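These variables map naturally onto a typed settings object. Below is a sketch. It uses the defaults from the table above and builds a plain `postgresql://` SQLAlchemy URL, which selects SQLAlchemy's default PostgreSQL driver. `Settings` and `load_settings` are hypothetical names, not the service's actual config module, and the MODE validation is an assumption:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Settings:
    mode: str
    postgres_username: str
    postgres_pw: str
    postgres_server: str
    postgres_port: int
    postgres_dbname: str
    log_level: str
    scraper_delay: float
    scraper_timeout: float

    @property
    def database_url(self) -> str:
        # Percent-encode credentials so characters like "@" or "/" cannot break the URL.
        user = quote(self.postgres_username, safe="")
        pw = quote(self.postgres_pw, safe="")
        return (f"postgresql://{user}:{pw}@{self.postgres_server}:"
                f"{self.postgres_port}/{self.postgres_dbname}")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the README defaults."""
    mode = env.get("MODE", "LOCAL").upper()
    # Assumption: only the two documented modes are valid.
    if mode not in {"LOCAL", "PRODUCTION"}:
        raise ValueError(f"MODE must be LOCAL or PRODUCTION, got {mode!r}")
    return Settings(
        mode=mode,
        postgres_username=env.get("POSTGRES_USERNAME", "postgres"),
        postgres_pw=env.get("POSTGRES_PW", "password"),
        postgres_server=env.get("POSTGRES_SERVER", "192.168.1.204"),
        postgres_port=int(env.get("POSTGRES_PORT", "5432")),
        postgres_dbname=env.get("POSTGRES_DBNAME", "eamco"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        scraper_delay=float(env.get("SCRAPER_DELAY", "2.0")),
        scraper_timeout=float(env.get("SCRAPER_TIMEOUT", "10")),
    )
```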

Architecture

  • Framework: FastAPI 0.109+
  • Database: PostgreSQL 15+ with SQLAlchemy 2.0
  • Scraping: BeautifulSoup4 + lxml + requests
  • Server: Uvicorn with 2 workers (production)
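Scraped prices arrive as text and must fit the `price_decimal DECIMAL(6,3)` column, which holds at most 999.999. Below is a sketch of that normalization step. It assumes price strings such as `"$2.599"`. The following are all assumptions rather than the scraper's actual logic: `parse_price`, the accepted formats, half-up rounding to three places, and rejecting zero.

```python
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation

# DECIMAL(6,3): 6 digits in total, 3 after the point, so the largest value is 999.999.
SCALE = Decimal("0.001")
MAX_PRICE = Decimal("999.999")


def parse_price(text: str) -> Decimal:
    """Turn scraped text such as "$2.599" into a Decimal that fits DECIMAL(6,3)."""
    cleaned = text.strip().lstrip("$").replace(",", "").strip()
    try:
        value = Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"not a price: {text!r}") from exc
    if not value.is_finite():
        # Decimal accepts "NaN" and "Infinity"; neither is a real price.
        raise ValueError(f"not a price: {text!r}")
    # Assumption: round half-up to the column's 3 decimal places.
    value = value.quantize(SCALE, rounding=ROUND_HALF_UP)
    # Assumption: zero or negative values are scrape errors, not prices.
    if not Decimal(0) < value <= MAX_PRICE:
        raise ValueError(f"price out of range for DECIMAL(6,3): {value}")
    return value
```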

Ports

  • Local Development: 9619
  • Production: 9519

Future Enhancements

  • Frontend display on Home page (table or cards)
  • Price change alerts/notifications
  • Support for additional zones
  • Price trend graphs and analytics