# LLM Providers

The provider abstraction layer and supported LLM backends.
AutoFlow uses a provider abstraction so all agents can work with any supported LLM backend. Providers are registered in src/autoflow/llm/registry.py and implement the BaseLLMClient interface.
## Provider Interface
All providers implement a single method:

```python
class BaseLLMClient:
    def complete(
        self,
        system_prompt: str,
        user_prompt: str,
        temperature: float = 0.7,
        max_tokens: int = 2000,
        model: str | None = None,
    ) -> dict:
        """Returns {"content": str, "model": str, "usage": dict}"""
```

## Supported Providers
| Provider | Module | Requires |
|---|---|---|
| mock | mock_client.py | Nothing (default) |
| openai | openai_client.py | OPENAI_API_KEY |
| anthropic | anthropic_client.py | ANTHROPIC_API_KEY |
| gemini | gemini_client.py | GOOGLE_API_KEY |
| ollama | ollama_client.py | Local Ollama server |
## Provider Selection
```python
from autoflow.llm.registry import get_client

client = get_client("openai")  # Returns OpenAIClient instance
client = get_client("mock")    # Returns MockClient instance
```

Or via CLI:

```shell
python -m autoflow "Your request" --provider anthropic
```

## Ollama Provider
The Ollama provider connects to a local Ollama server and supports automatic model discovery.
### Model Selection
When using `--provider ollama`, AutoFlow will:

- Query your local Ollama server for installed models
- Validate the requested model (via `--model`) or the default (`llama3`)
- Prompt you to choose from available models if the default isn't installed
- Suggest install commands if no models are found
```shell
# Use a specific installed model
python -m autoflow "Classify emails" --provider ollama --model llama3.2:latest

# Base name works too (resolves to llama3.2:latest)
python -m autoflow "Classify emails" --provider ollama --model llama3.2

# If the default model (llama3) is installed, it's used automatically
python -m autoflow "Classify emails" --provider ollama
# Using Ollama model: llama3:latest

# If the default isn't installed, you'll be prompted:
python -m autoflow "Classify emails" --provider ollama
# Default Ollama model 'llama3' is not installed locally.
#
# Available Ollama models:
#   1. llama3.2:latest
#   2. mistral-small:24b-instruct-2501-q8_0
#   3. bge-m3:latest
#
# Select a model [1-3] (or press Enter for 1):
```

If Ollama is not running or no models are installed, you'll see a helpful message with install instructions:
```
WARNING: Could not connect to Ollama or no models installed.
Make sure Ollama is running (ollama serve) and pull a model:
  ollama pull llama3.2
```

### Programmatic Usage
```python
from autoflow.llm.ollama_client import OllamaClient

client = OllamaClient(model="llama3.2:latest")

# List locally available models
models = client.list_models()
# ["llama3.2:latest", "mistral-small:24b-instruct-2501-q8_0", ...]

# Validate a model name
client.validate_model("llama3.2")  # True
client.validate_model("gpt-4")     # False

# Resolve to a full model name
client.resolve_model("llama3.2")   # "llama3.2:latest"
```

### Environment Variables
| Variable | Default | Description |
|---|---|---|
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server address |
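A client would typically resolve the server address from this variable, falling back to the documented default. A minimal sketch (the helper name `ollama_base_url` is assumed, not part of AutoFlow's API):

```python
import os


def ollama_base_url() -> str:
    """Resolve the Ollama server address from the environment.

    Falls back to the documented default when OLLAMA_BASE_URL is
    unset. Trailing slashes are stripped so endpoint paths can be
    appended safely.
    """
    url = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
    return url.rstrip("/")
```

For a remote server, set the variable before running AutoFlow, e.g. `export OLLAMA_BASE_URL=http://gpu-box:11434`.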
## Mock Provider
The MockClient is the default provider and is used for all testing. It returns deterministic canned responses by detecting the current pipeline stage from the system prompt (persona text).
Key behaviors:
- Stage detection — Matches keywords in the persona text to determine which stage is running
- Iteration tracking — The evaluator returns FAIL on first call and PASS on subsequent calls, simulating the critique loop
- Generate mode — Returns a 6-node workflow with decision_tree and variable_transform nodes
- Deterministic — Same inputs always produce the same outputs
This makes it possible to run the full test suite (287 tests) without any API keys or external services.
## Model Recommendations
The GeneratorAgent enriches prompts with model recommendations based on the detected workflow pattern. These are defined in src/autoflow/models.py:
| Task Type | Recommended Model | Temperature | Max Tokens |
|---|---|---|---|
| Classification | gpt-4o-mini | 0.0 | 500 |
| Extraction | gpt-4o-mini | 0.1 | 500 |
| Analysis | gpt-4o | 0.3 | 2000 |
| Generation | gpt-4o | 0.5 | 4000 |
| Evaluation | gpt-4o | 0.1 | 1000 |
| Formatting | gpt-4o-mini | 0.1 | 500 |
| Creative | gpt-4o | 0.7 | 2000 |
| Summarization | gpt-4o-mini | 0.2 | 500 |
These recommendations are hints included in the generated workflow's LLM node configurations, not the model used to run the pipeline itself.
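As a sketch of how such hints might be looked up when building an LLM node config (the dict and function names are illustrative; the real definitions are in src/autoflow/models.py):

```python
# Illustrative lookup table mirroring the documented recommendations.
RECOMMENDATIONS = {
    "classification": {"model": "gpt-4o-mini", "temperature": 0.0, "max_tokens": 500},
    "extraction":     {"model": "gpt-4o-mini", "temperature": 0.1, "max_tokens": 500},
    "analysis":       {"model": "gpt-4o",      "temperature": 0.3, "max_tokens": 2000},
    "generation":     {"model": "gpt-4o",      "temperature": 0.5, "max_tokens": 4000},
    "evaluation":     {"model": "gpt-4o",      "temperature": 0.1, "max_tokens": 1000},
    "formatting":     {"model": "gpt-4o-mini", "temperature": 0.1, "max_tokens": 500},
    "creative":       {"model": "gpt-4o",      "temperature": 0.7, "max_tokens": 2000},
    "summarization":  {"model": "gpt-4o-mini", "temperature": 0.2, "max_tokens": 500},
}


def recommend(task_type: str) -> dict:
    """Return the recommended LLM node config for a task type.

    Unknown task types fall back to the analysis defaults here;
    the fallback choice is an assumption for this sketch.
    """
    return RECOMMENDATIONS.get(task_type.lower(), RECOMMENDATIONS["analysis"])
```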