# API Reference

## Dataset

Enum of datasets the platform can fetch and store. Exported from the package root.

```python
from marketgoblin import Dataset

Dataset.OHLCV      # "ohlcv"
Dataset.SHARES     # "shares"
Dataset.DIVIDENDS  # "dividends"
```

`Dataset` is a `StrEnum`, so members serialize directly to path segments and JSON.
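Because members are plain strings, they drop straight into paths and JSON with no custom encoder. A minimal stand-in illustrates the behavior (the real enum ships with the package; a `str`-mixin `Enum` reproduces the documented serialization on older Pythons):

```python
import json
from enum import Enum

# Stand-in mirroring the documented members; the shipped class is a StrEnum.
class Dataset(str, Enum):
    OHLCV = "ohlcv"
    SHARES = "shares"
    DIVIDENDS = "dividends"

# Members behave as ordinary strings: usable as path segments...
path = "/".join(["data", Dataset.OHLCV, "AAPL", "2024-01.pq"])
print(path)  # data/ohlcv/AAPL/2024-01.pq

# ...and JSON-serializable out of the box.
print(json.dumps({"dataset": Dataset.SHARES}))  # {"dataset": "shares"}
```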
| Member | Columns | Providers |
|---|---|---|
| `Dataset.OHLCV` | `date` (int32 YYYYMMDD), `open` / `high` / `low` / `close` (float32), `volume` (int64), `is_adjusted` (bool), `symbol` | yahoo, csv |
| `Dataset.SHARES` | `date` (int32 YYYYMMDD), `shares` (int64), `symbol` | yahoo |
| `Dataset.DIVIDENDS` | `date` (int32 YYYYMMDD), `dividend` (float32), `symbol` | yahoo |
OHLCV is returned as a tidy stacked frame: each trading day appears twice — one row with `is_adjusted=True` (split/dividend-adjusted prices) and one with `is_adjusted=False` (raw prices). Filter downstream to pick a variant:

```python
adjusted_only = goblin.fetch("AAPL", start, end).filter(pl.col("is_adjusted"))
raw_only = goblin.fetch("AAPL", start, end).filter(~pl.col("is_adjusted"))
```
Requesting a dataset a provider does not support (e.g. Dataset.SHARES from CSVSource) raises ValueError at the dispatch layer — no silent fallbacks.
## MarketGoblin

The main entry point for all data operations.

```python
from marketgoblin import Dataset, MarketGoblin

goblin = MarketGoblin(provider="yahoo", save_path="./data")
```
### Constructor

```python
MarketGoblin(
    provider: str,
    api_key: str | None = None,
    save_path: str | Path | None = None,
    **source_kwargs,
)
```
| Parameter | Description |
|---|---|
| `provider` | Data source name: `"yahoo"` or `"csv"` |
| `api_key` | API key for providers that require one (not needed for Yahoo) |
| `save_path` | Root directory for disk persistence. Required for `load()`. |
| `**source_kwargs` | Extra keyword arguments forwarded to the source constructor (e.g. `data_dir` for `CSVSource`) |
### Properties

#### supported_datasets

Datasets that the configured provider can fetch. For `"yahoo"`: `{Dataset.OHLCV, Dataset.SHARES, Dataset.DIVIDENDS}`. For `"csv"`: `{Dataset.OHLCV}`.
### Methods

#### fetch()

```python
fetch(
    symbol: str,
    start: str,
    end: str,
    dataset: Dataset = Dataset.OHLCV,
    parse_dates: bool = False,
) -> pl.LazyFrame
```
Downloads data for the requested dataset. If save_path is set, persists monthly Parquet slices to disk and returns data loaded back from disk.
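Splitting a fetch window into monthly slices can be sketched with stdlib dates alone. `month_slices` below is a hypothetical helper for illustration, not marketgoblin API, and the shipped slicing logic may differ:

```python
from datetime import date, timedelta

def month_slices(start: str, end: str):
    """Yield (first_day, last_day) pairs, one per calendar month overlapping
    [start, end] -- a sketch of how a fetch window might map to monthly
    Parquet slices. Hypothetical helper."""
    cur = date.fromisoformat(start)
    stop = date.fromisoformat(end)
    while cur <= stop:
        # First day of the following month, then step back one day.
        nxt = date(cur.year + cur.month // 12, cur.month % 12 + 1, 1)
        yield cur, min(nxt - timedelta(days=1), stop)
        cur = nxt

slices = list(month_slices("2024-01-15", "2024-03-10"))
# Three slices: partial January, full February, partial March.
```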
#### load()

```python
load(
    symbol: str,
    start: str,
    end: str,
    dataset: Dataset = Dataset.OHLCV,
    parse_dates: bool = False,
) -> pl.LazyFrame
```

Loads previously saved data from disk. Raises `RuntimeError` if `save_path` was not set.
#### fetch_many()

```python
fetch_many(
    symbols: list[str],
    start: str,
    end: str,
    dataset: Dataset = Dataset.OHLCV,
    parse_dates: bool = False,
    max_workers: int = 8,
    requests_per_second: float = 2.0,
) -> dict[str, pl.LazyFrame]
```

Batch fetch using a `ThreadPoolExecutor`. Failed symbols are logged and excluded — they never crash the batch. Requests are rate-limited to `requests_per_second`.
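The batching behavior can be approximated with the stdlib alone. This sketch uses hypothetical names and a simple lock-based limiter; the shipped implementation may differ in detail, but the shape is the same: per-symbol failures are swallowed, not re-raised:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_many_sketch(symbols, fetch_one, max_workers=8, requests_per_second=2.0):
    """Sketch of a rate-limited batch fetch: failed symbols are dropped
    from the result dict instead of crashing the batch."""
    interval = 1.0 / requests_per_second
    lock = threading.Lock()
    last_call = [0.0]

    def throttled(symbol):
        with lock:  # space out request start times globally across workers
            wait = last_call[0] + interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
        return fetch_one(symbol)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(throttled, s): s for s in symbols}
        for fut, symbol in futures.items():
            try:
                results[symbol] = fut.result()
            except Exception:
                pass  # real code would log the failure here

    return results

def fake_fetch(symbol):
    if symbol == "BOOM":
        raise ValueError("no data")
    return symbol.lower()

data = fetch_many_sketch(["AAPL", "BOOM", "MSFT"], fake_fetch, requests_per_second=100.0)
print(sorted(data))  # ['AAPL', 'MSFT']
```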
## BaseSource

Abstract base class for data sources. Concrete sources declare which datasets they support by returning a dispatch table from `_build_dispatch()`; the base class handles lookup and error reporting.

```python
import polars as pl

from marketgoblin import Dataset
from marketgoblin.sources.base import BaseSource, Fetcher

class MySource(BaseSource):
    name = "mysource"

    def _build_dispatch(self) -> dict[Dataset, Fetcher]:
        return {Dataset.OHLCV: self._fetch_ohlcv}

    def _fetch_ohlcv(self, symbol: str, start: str, end: str) -> pl.LazyFrame:
        ...
```

The `Fetcher` signature is `(symbol, start, end) -> pl.LazyFrame`. OHLCV fetchers are expected to return a tidy stacked frame with both `is_adjusted=True` and `is_adjusted=False` rows.
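The dispatch-and-error behavior the base class provides can be sketched in a few stdlib-only lines. This is a stand-in for illustration, not the shipped `BaseSource`:

```python
# Stand-in for the dispatch pattern described above; the real base class
# lives in marketgoblin.sources.base and dispatches on Dataset members.
class BaseSourceSketch:
    name = "base"

    def _build_dispatch(self):
        return {}

    def fetch(self, dataset: str, symbol: str, start: str, end: str):
        dispatch = self._build_dispatch()
        if dataset not in dispatch:
            # No silent fallbacks: unsupported datasets fail loudly.
            raise ValueError(f"{self.name!r} does not support dataset {dataset!r}")
        return dispatch[dataset](symbol, start, end)

class CSVLike(BaseSourceSketch):
    name = "csv"

    def _build_dispatch(self):
        return {"ohlcv": self._fetch_ohlcv}

    def _fetch_ohlcv(self, symbol, start, end):
        return f"ohlcv rows for {symbol}"

src = CSVLike()
print(src.fetch("ohlcv", "AAPL", "2024-01-01", "2024-03-31"))  # ohlcv rows for AAPL
```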
Register the new source in goblin.py so it can be selected via the `provider` argument.
## YahooSource

Backed by `yfinance`. Supports `Dataset.OHLCV`, `Dataset.SHARES`, and `Dataset.DIVIDENDS`.

- **OHLCV**: one `yf.Ticker(symbol).history(auto_adjust=False, actions=False)` call returns raw OHLC + `Adj Close` + Volume. Adjusted Open/High/Low are derived locally via `ratio = Adj Close / Close`; adjusted Close == `Adj Close`; Volume is identical across variants. Output is a tidy stacked frame with `is_adjusted=True/False` rows, sorted by `(date, is_adjusted)`. This matches yfinance's own `auto_adjust=True` output exactly while halving network load.
- **SHARES**: `yf.Ticker(symbol).get_shares_full(start, end)` — sparse, corporate-action-driven; deduplicated to one row per day (last value wins).
- **DIVIDENDS**: `yf.Ticker(symbol).dividends` — full history, then filtered to the requested `[start, end]` range.
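The local adjustment described for OHLCV is simple arithmetic; a sketch with plain floats (illustrative numbers, not real market data):

```python
# Derive adjusted open/high/low from a raw bar plus Adj Close, as described
# above: scale each raw price by ratio = adj_close / close.
raw = {"open": 100.0, "high": 104.0, "low": 98.0, "close": 102.0}
adj_close = 51.0  # e.g. after a 2:1 split

ratio = adj_close / raw["close"]  # 0.5
adjusted = {k: round(v * ratio, 6) for k, v in raw.items()}
adjusted["close"] = adj_close  # adjusted close is Adj Close exactly

print(adjusted)  # {'open': 50.0, 'high': 52.0, 'low': 49.0, 'close': 51.0}
```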
Transient failures are retried with exponential backoff (3 attempts, 1 s / 2 s delays). Empty-data ValueErrors propagate immediately.
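That retry shape can be sketched with the stdlib; `with_retries` is a hypothetical helper and the shipped code may differ in detail:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Retry transient failures with exponential backoff (1 s, 2 s, ...).
    ValueError (the empty-data signal described above) propagates at once."""
    for attempt in range(attempts):
        try:
            return fn()
        except ValueError:
            raise  # empty data: not transient, fail immediately
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, base_delay=0.0))  # ok
```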
## CSVSource

Reads OHLCV data from local CSV files. Useful for backtesting or offline use. `Dataset.SHARES` and `Dataset.DIVIDENDS` are not supported and raise `ValueError` at the dispatch layer.

Expected CSV columns: `date` (YYYY-MM-DD), `open`, `high`, `low`, `close`, `volume`, `symbol`. CSVs are assumed to hold a single price variant — pass `is_adjusted=True/False` to stamp the flag on every row.

```python
goblin = MarketGoblin(
    provider="csv",
    save_path="./data",
    data_dir="./csv_files",
    is_adjusted=True,
)
lf = goblin.fetch("AAPL", "2024-01-01", "2024-03-31")
```

Looks for `{data_dir}/AAPL.csv`.
## DiskStorage

Handles Parquet persistence. Used internally by `MarketGoblin` when `save_path` is set. The path scheme is uniform across datasets.

OHLCV no longer has an adjusted/raw variant segment — both variants live in the same Parquet file, distinguished by the `is_adjusted` column. Each Parquet file has a JSON sidecar at the same path with a `.json` extension. Writes are atomic via a `.tmp` rename for both `.pq` and `.json`.
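The atomic-write pattern described can be sketched with the stdlib; `os.replace` makes the final rename atomic on both POSIX and Windows. The helper name and sidecar layout here are illustrative:

```python
import json
import os
import tempfile

def atomic_write_bytes(path: str, payload: bytes) -> None:
    """Write to a sibling .tmp file, then rename into place, so readers
    never observe a half-written file (sketch of the described scheme)."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic within the same filesystem

with tempfile.TemporaryDirectory() as d:
    pq = os.path.join(d, "2024-01.pq")
    atomic_write_bytes(pq, b"parquet bytes here")
    # Sidecar metadata at the same path with a .json extension (illustrative).
    sidecar = {"row_count": 42, "currency": "USD"}
    atomic_write_bytes(pq + ".json", json.dumps(sidecar).encode())
    print(sorted(os.listdir(d)))  # ['2024-01.pq', '2024-01.pq.json']
```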
### OHLCV sidecar

| Key | Description |
|---|---|
| `row_count` | Number of rows in this slice (stacked OHLCV has 2 rows per date) |
| `unique_days` | Distinct trading days in the slice |
| `start_date` / `end_date` | First and last date as YYYYMMDD |
| `expected_trading_days` | Weekday count for the month |
| `missing_days` | Weekdays not present in the data (likely holidays) |
| `close_min` / `close_max` | Close price range |
| `volume_min` / `volume_max` | Volume range |
| `has_adjusted` / `has_raw` | Whether each variant is present in the slice |
| `currency` | Price currency (default `"USD"`) |
| `downloaded_at` | ISO 8601 timestamp |
| `file_size_bytes` | Size of the `.pq` file |
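The weekday-based fields (`expected_trading_days`, `missing_days`) need only stdlib dates. `weekday_analysis` is a hypothetical helper sketching the computation, working on `date` objects rather than YYYYMMDD ints for brevity:

```python
from datetime import date, timedelta

def weekday_analysis(year: int, month: int, present: set):
    """Sketch of the expected_trading_days / missing_days computation:
    count weekdays in the month and list those absent from the data."""
    d = date(year, month, 1)
    weekdays = []
    while d.month == month:
        if d.weekday() < 5:  # Mon..Fri
            weekdays.append(d)
        d += timedelta(days=1)
    missing = [w for w in weekdays if w not in present]
    return len(weekdays), missing

# All January 2024 weekdays except Jan 15 (a Monday holiday).
present = {date(2024, 1, day) for day in range(1, 32)
           if date(2024, 1, day).weekday() < 5} - {date(2024, 1, 15)}
expected, missing = weekday_analysis(2024, 1, present)
print(expected, missing)  # 23 [datetime.date(2024, 1, 15)]
```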
### SHARES sidecar

| Key | Description |
|---|---|
| `row_count` | Number of rows in this slice |
| `start_date` / `end_date` | First and last date as YYYYMMDD |
| `shares_min` / `shares_max` | Share-count range |
| `downloaded_at` | ISO 8601 timestamp |
| `file_size_bytes` | Size of the `.pq` file |
No missing-days analysis for SHARES — cadence is corporate-action-driven and irregular, so absence of a date is not a signal.
### DIVIDENDS sidecar

| Key | Description |
|---|---|
| `row_count` | Number of dividend events in this slice |
| `start_date` / `end_date` | First and last event date as YYYYMMDD |
| `dividend_min` / `dividend_max` / `dividend_total` | Per-share payout range and slice total |
| `currency` | Dividend currency (default `"USD"`) |
| `downloaded_at` | ISO 8601 timestamp |
| `file_size_bytes` | Size of the `.pq` file |
No missing-days analysis for DIVIDENDS — events are discrete and irregular.