API Service System¶
The Service subsystem in LionAGI provides classes and utilities for API endpoint management (e.g., chat completions), rate-limiting, and orchestrating requests to external language model services. This system helps unify how LionAGI communicates with various model providers (OpenAI, Anthropic, Groq, etc.) while handling token usage, endpoint configuration, caching, and concurrency constraints.
1. EndpointConfig & EndPoint¶
- class lionagi.service.endpoints.base.EndpointConfig¶
Inherits from:
pydantic.BaseModel
Describes the essential attributes of an API endpoint:
- provider (str|None): The backend provider name (e.g., "openai").
- base_url (str|None): Base URL for requests.
- endpoint (str): Endpoint path (like "/v1/chat/completions").
- method (Literal["get","post","put","delete"]): HTTP method.
- openai_compatible (bool): If True, uses an OpenAI-like calling convention.
- is_invokeable (bool): If True, supports direct invocation.
- is_streamable (bool): If True, can stream partial results.
- requires_tokens (bool): If True, the system calculates tokens prior to the call.
- Other fields: required_kwargs, optional_kwargs, deprecated_kwargs, api_version, allowed_roles, etc.
Key Methods:
- create_payload(...) -> dict: Accepts request parameters and merges them with the endpoint's config (like required_kwargs) to create a final payload + headers.
- invoke(payload, headers, is_cached=False, **kwargs): Handles the actual or cached request.
- _invoke(...) (abstract): The core HTTP request logic (subclasses must implement).
- _stream(...) (abstract): Streaming request logic if the endpoint is streamable.
- calculate_tokens(payload): If requires_tokens=True, uses TokenCalculator to estimate usage.
Concrete Implementations:
- ChatCompletionEndPoint in lionagi.service.endpoints.chat_completion
- Additional provider-specific classes (OpenAI, Anthropic, etc.).
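Since EndpointConfig is a pydantic model, a config can be declared directly. The following is a minimal sketch for an OpenAI-style chat endpoint using only fields listed above; the values are illustrative, and fields omitted here either fall back to defaults or must be supplied, depending on the model's validators.
from lionagi.service.endpoints.base import EndpointConfig
# Illustrative values only -- not a complete provider configuration.
chat_config = EndpointConfig(
    provider="openai",
    base_url="https://api.openai.com/v1",
    endpoint="chat/completions",
    method="post",
    openai_compatible=True,
    is_invokeable=True,
    is_streamable=True,
    requires_tokens=True,
)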
2. TokenCalculator¶
- class lionagi.service.token_calculator.TokenCalculator¶
Methods to estimate token usage for text or images:
- calculate_message_tokens(messages, **kwargs) -> int: Sums up tokens for each message in a chat scenario.
- calcualte_embed_token(inputs, **kwargs) -> int: Summation for embedding calls.
- tokenize(...) -> int|list[int]: Generic method to tokenize a string using tiktoken, returning either the token count or the token IDs themselves.
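A quick pre-flight estimate, as a minimal sketch: it assumes the calculators can be called directly on the class and that a model keyword selects the tiktoken encoding; adjust to the actual signature.
from lionagi.service.token_calculator import TokenCalculator
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet in two sentences."},
]
# Rough token estimate before invoking the endpoint; the `model` keyword is an
# assumption here -- check the method signature for the supported kwargs.
n_tokens = TokenCalculator.calculate_message_tokens(messages, model="gpt-4o-mini")
print(n_tokens)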
3. APICalling (Event)¶
- class lionagi.service.endpoints.base.APICalling¶
Inherits from:
Event
An event class representing a single API call. Stores:
- payload (dict): Data to send in the request.
- headers (dict): Additional HTTP headers.
- endpoint (EndPoint): The endpoint to call.
- is_cached (bool): Whether this call uses caching.
- should_invoke_endpoint (bool): If False, no actual invocation occurs.
Implements:
- invoke() (async): The main method that performs the request, with retries if needed.
- stream(...) (async generator): If the endpoint supports streaming, yields partial results.
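In practice an APICalling is rarely constructed by hand; it is created and queued by an iModel (section 6) and returned from invoke(). A minimal sketch of inspecting the event afterwards, assuming my_model is an iModel configured as in section 6:
# invoke() returns the completed APICalling event, whose attributes are listed above
call = await my_model.invoke(messages=[{"role": "user", "content": "Hello!"}])
print(call.payload)             # the dict that was sent to the provider
print(call.is_cached)           # whether the cached path was used
print(call.execution.response)  # the provider's response recorded on the event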
4. ChatCompletionEndPoint & Subclasses¶
- class lionagi.service.endpoints.chat_completion.ChatCompletionEndPoint(EndPoint)¶
A base class for chat-style endpoints that expect role-based messages ("system", "user", "assistant", etc.). Subclasses override _invoke() and _stream() for each provider's specifics.
Examples (subclasses):
OpenAIChatCompletionEndPoint
AnthropicChatCompletionEndPoint
GroqChatCompletionEndPoint
OpenRouterChatCompletionEndPoint
PerplexityChatCompletionEndPoint
Each provider sets up its own config with required/optional fields, and possibly different base URLs or roles.
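A provider subclass mainly supplies its own configuration and overrides the two hooks named above. A minimal sketch follows; the hook parameters shown are assumptions based on the invoke() signature in section 1, so check the base class for the exact interface.
from lionagi.service.endpoints.chat_completion import ChatCompletionEndPoint
class MyProviderChatCompletionEndPoint(ChatCompletionEndPoint):
    # Hypothetical provider-specific endpoint; parameter names below are
    # assumptions -- verify against the base class before relying on them.
    async def _invoke(self, payload, headers, **kwargs):
        # Perform the actual HTTP call here (e.g. an aiohttp POST) and
        # return the parsed JSON response.
        ...
    async def _stream(self, payload, headers, **kwargs):
        # Yield partial chunks as the provider streams them back.
        ...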
5. Rate-Limited Execution¶
- class lionagi.service.endpoints.rate_limited_processor.RateLimitedAPIProcessor¶
Inherits from:
Processor
A concurrency-limiting, rate-limiting processor dedicated to handling APICalling events. Supports:
- limit_requests: number of requests allowed per interval.
- limit_tokens: number of tokens allowed per interval.
- interval: window (in seconds) for refreshing or replenishing capacity.
- class lionagi.service.endpoints.rate_limited_processor.RateLimitedAPIExecutor¶
Inherits from:
Executor
Builds on the above processor. For an iModel, it manages queued or concurrent calls. For example, one can define a queue with max capacity 100, refreshing every 60 seconds, limiting requests or tokens as needed (see the sketch below).
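A sketch of that example; the keyword names here (queue_capacity, capacity_refresh_time, etc.) are assumptions based on the description above, not confirmed parameter names, so check the class signature before use.
from lionagi.service.endpoints.rate_limited_processor import RateLimitedAPIExecutor
# Queue of capacity 100, refreshed every 60 seconds, with per-interval caps
# on requests and tokens; parameter names are assumptions, values illustrative.
executor = RateLimitedAPIExecutor(
    queue_capacity=100,
    capacity_refresh_time=60,
    interval=60,
    limit_requests=60,
    limit_tokens=90_000,
)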
6. iModel¶
- class lionagi.service.imodel.iModel¶
Represents a single “model interface” to a provider’s chat or completion endpoint. Holds:
- endpoint (EndPoint): Typically a ChatCompletionEndPoint or a custom endpoint.
- executor (RateLimitedAPIExecutor): A concurrency-limiting queue for calls.
- kwargs: Additional default parameters (like "model" name, "api_key", etc.).
Methods:
- invoke(**kwargs) -> APICalling|None: Creates an APICalling from the combined endpoint config + local kwargs, queues it in the executor, and awaits completion.
- stream(**kwargs) -> APICalling|None: Streams partial results if the endpoint is streamable.
- create_api_calling(**kwargs) -> APICalling: A utility to unify parameters into a final APICalling object.
Usage:
from lionagi.service.imodel import iModel
# Provide minimal config
my_model = iModel(provider="openai", base_url="https://api.openai.com/v1", model="gpt-3.5-turbo")
# -> Creates an endpoint automatically
# -> Also sets up a RateLimitedAPIExecutor
# Now we can call
result = await my_model.invoke(messages=[{"role":"user","content":"Hello!"}])
print(result.execution.response)
7. iModelManager¶
- class lionagi.service.imodel.iModelManager(Manager)¶
Maintains a dictionary of named iModel objects:
- chat: The model registered under the name "chat".
- parse: The "parse" model for secondary tasks like extracting structured data.
- register_imodel(name, model): Inserts or updates the registry under the given name.
Example:
from lionagi.service.imodel import iModel, iModelManager
chat_mod = iModel(provider="openai", model="gpt-3.5-turbo")
parse_mod = iModel(provider="openai", model="text-davinci-003")
manager = iModelManager(chat=chat_mod, parse=parse_mod) # -> manager.chat = chat_mod, manager.parse = parse_mod
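Models can also be added or replaced after construction via register_imodel; the provider and model string below are placeholders, not recommendations.
# Swap in a different model under the "chat" name (placeholder values)
manager.register_imodel("chat", iModel(provider="anthropic", model="claude-3-5-sonnet"))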
Summary¶
The LionAGI Service system integrates everything needed to call external LLM services:
Endpoints for each provider (OpenAI, Anthropic, etc.).
APICalling event for tracking usage or partial streaming.
Rate-limiting structures (Processor, Executor) to handle concurrency and per-interval usage caps.
iModel as a top-level convenience object: one instance = one distinct provider + concurrency constraints.
iModelManager for multi-model usage in the same environment.
By configuring endpoints properly and using the RateLimitedAPIExecutor, LionAGI can handle robust multi-provider or multi-model usage while avoiding throttling or over-limit errors.