# eamco_scraper

FastAPI microservice for scraping heating oil prices from New England Oil and storing historical pricing data.

## Overview

This service scrapes oil company pricing data from the New England Oil website (Zone 10 - Central Massachusetts) and stores it in a PostgreSQL database for historical tracking and trend analysis.

## Features

- **Web Scraping**: Automated scraping of oil prices using BeautifulSoup4
- **Historical Tracking**: Stores all price records (no updates, only inserts) for trend analysis
- **Cron-Friendly**: Single GET request triggers scrape and storage
- **Health Checks**: Built-in health endpoint for monitoring
- **Docker Ready**: Production and development Docker configurations

## API Endpoints

### `GET /health`

Health check endpoint with database connectivity status.

**Response:**

```json
{
  "status": "healthy",
  "db_connected": true
}
```

### `GET /scraper/newenglandoil/latestprice`

Trigger scrape of New England Oil Zone 10 prices, store in database, and return results.

**Response:**

```json
{
  "status": "success",
  "message": "Successfully scraped and stored 30 prices",
  "prices_scraped": 30,
  "prices_stored": 30,
  "scrape_timestamp": "2026-02-07T22:00:00",
  "prices": [
    {
      "company_name": "AUBURN OIL",
      "town": "Auburn",
      "price_decimal": 2.599,
      "scrape_date": "2026-02-07",
      "zone": "zone10"
    }
  ]
}
```

## Database Schema

### `company_prices` Table

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| company_name | VARCHAR(255) | Oil company name |
| town | VARCHAR(100) | Town/city |
| price_decimal | DECIMAL(6,3) | Price per gallon |
| scrape_date | DATE | Date price was listed |
| zone | VARCHAR(50) | Geographic zone (default: zone10) |
| created_at | TIMESTAMP | Record creation timestamp |

**Indexes:**

- `idx_company_prices_company` on `company_name`
- `idx_company_prices_scrape_date` on `scrape_date`
- `idx_company_prices_zone` on `zone`
- `idx_company_prices_company_date` on `(company_name, scrape_date)`
- `idx_company_prices_zone_date` on `(zone, scrape_date)`

## Development

### Local Setup

```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env.local
# Edit .env.local with your database credentials

# Run the application
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

### Docker Local

```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.local.yml up scraper_local
```

Access at: http://localhost:9619

## Production

### Docker Production

```bash
cd /mnt/code/oil/eamco/eamco_deploy
docker-compose -f docker-compose.prod.yml up -d scraper_prod
```

Access at: http://192.168.1.204:9519

## Cron Integration

Add to Unraid cron or system crontab:

```bash
# Scrape prices daily at 6 AM
0 6 * * * curl -s http://192.168.1.204:9619/scraper/newenglandoil/latestprice > /dev/null 2>&1
```

This example uses port 9619, which the Ports section lists as the local development port. Substitute 9519 to target the production container.

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| MODE | Application mode (LOCAL/PRODUCTION) | LOCAL |
| POSTGRES_USERNAME | Database username | postgres |
| POSTGRES_PW | Database password | password |
| POSTGRES_SERVER | Database server | 192.168.1.204 |
| POSTGRES_PORT | Database port | 5432 |
| POSTGRES_DBNAME | Database name | eamco |
| LOG_LEVEL | Logging level | INFO |
| SCRAPER_DELAY | Delay between requests (seconds) | 2.0 |
| SCRAPER_TIMEOUT | Request timeout (seconds) | 10 |

## Architecture

- **Framework**: FastAPI 0.109+
- **Database**: PostgreSQL 15+ with SQLAlchemy 2.0
- **Scraping**: BeautifulSoup4 + lxml + requests
- **Server**: Uvicorn with 2 workers (production)

## Ports

- **Local Development**: 9619
- **Production**: 9519

## Future Enhancements

- Frontend display on Home page (table or cards)
- Price change alerts/notifications
- Support for additional zones
- Price trend graphs and analytics
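
## Example: Building the Database URL

The `POSTGRES_*` variables in the Environment Variables table are presumably combined into a single connection URL for SQLAlchemy. The following is a minimal sketch of how that might look, not the service's actual code: the function name `build_database_url`, the `DEFAULTS` mapping, and the plain `postgresql://` URL form are all assumptions made for illustration. Only the variable names and default values come from the table above.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus

# Defaults copied from the Environment Variables table.
DEFAULTS = {
    "POSTGRES_USERNAME": "postgres",
    "POSTGRES_PW": "password",
    "POSTGRES_SERVER": "192.168.1.204",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DBNAME": "eamco",
}


def build_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: assemble a PostgreSQL URL from POSTGRES_* settings.

    Reads from os.environ unless an explicit mapping is passed, and falls
    back to the documented default for any variable that is not set.
    """
    source = os.environ if env is None else env
    cfg = {key: source.get(key, default) for key, default in DEFAULTS.items()}

    # Percent-encode credentials so characters such as '@' or '/' in a
    # password do not break URL parsing.
    user = quote_plus(cfg["POSTGRES_USERNAME"])
    password = quote_plus(cfg["POSTGRES_PW"])

    return (
        f"postgresql://{user}:{password}"
        f"@{cfg['POSTGRES_SERVER']}:{cfg['POSTGRES_PORT']}/{cfg['POSTGRES_DBNAME']}"
    )


if __name__ == "__main__":
    print(build_database_url())
```

The resulting string could then be passed to SQLAlchemy's `create_engine`. Accepting an optional mapping instead of always reading `os.environ` keeps the helper easy to test without touching the real environment.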