In this section, we’ll cover the key concepts and components of the Julep Responses API. The Julep Responses API is designed to be compatible with OpenAI’s interface, making it easy to migrate existing applications that use OpenAI’s API to Julep.
The Open Responses API requires self-hosting. See the installation guide below.
The API is in alpha and subject to change; check back frequently for updates.
While Sessions provide a persistent, stateful way to interact with agents over multiple turns, the Responses API offers a lightweight, stateless alternative for quick, one-off interactions with language models. Here’s how they compare:
| Feature | Sessions | Responses |
|---|---|---|
| State Management | Maintains conversation history | Stateless (with optional context from previous responses) |
| Persistence | Long-lived, for ongoing conversations | Short-lived, for one-off interactions |
| Agent Integration | Requires an agent | No agent needed |
| Setup Complexity | Requires agent and session creation | Minimal setup (just model and input) |
| Use Case | Multi-turn conversations, complex interactions | Quick content generation, processing, or reasoning |
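The "minimal setup (just model and input)" point can be made concrete. Below is a sketch of the smallest possible request body for a Response, written as a plain Python dict; the model name is an illustrative assumption, so substitute whatever model your self-hosted server is configured to serve:

```python
import json

# Minimal Responses API request: only "model" and "input" are required.
# The model name below is a placeholder, not a Julep default.
minimal_request = {
    "model": "gpt-4o-mini",
    "input": "Write a haiku about the ocean.",
}

# Serialized body, ready to POST to the /responses endpoint.
print(json.dumps(minimal_request))
```

Every other field in the request (temperature, tools, instructions, and so on) is optional, which is what keeps one-off interactions lightweight compared to creating an agent and a session first.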
If you need to maintain context across multiple interactions but prefer the simplicity of the Responses API, you can use the previous_response_id parameter to link responses together.
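A sketch of that chaining: the follow-up request carries the first response's id in `previous_response_id`, so the server can supply the earlier turn as context without you resending the whole history. The id value here is a made-up placeholder:

```python
first_request = {
    "model": "gpt-4o-mini",
    "input": "Name three prime numbers.",
}

# Suppose the server answered with a Response whose id was "resp_abc123"
# (an invented placeholder). The follow-up links back to it:
follow_up_request = {
    "model": "gpt-4o-mini",
    "input": "Now add 1 to each of them.",
    "previous_response_id": "resp_abc123",
}
```

This gives you multi-turn context while keeping each call stateless on the client side; no agent or session object is ever created.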
The Response object is the core data structure returned by the Julep Responses API. It contains all the information about a generated response and follows the OpenAI Responses API schema:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier for the response |
| `object` | string | Always `"response"` |
| `created_at` | integer | Unix timestamp when the response was created |
| `status` | string | Current status: `"completed"`, `"failed"`, `"in_progress"`, or `"incomplete"` |
| `error` | object or null | Error information if the response failed |
| `incomplete_details` | object or null | Details about why a response is incomplete |
| `instructions` | string or null | Optional instructions provided to the model |
| `max_output_tokens` | integer or null | Maximum number of tokens to generate |
| `model` | string | The model used to generate the response |
| `output` | array | List of output items (messages, tool calls, reasoning) |
| `parallel_tool_calls` | boolean | Whether tools can be called in parallel |
| `previous_response_id` | string or null | ID of a previous response for context |
| `reasoning` | object or null | Reasoning steps if reasoning was requested |
| `store` | boolean | Whether the response is stored for later retrieval |
| `temperature` | number | Sampling temperature used (0-1) |
| `text` | object or null | Text formatting options |
| `tool_choice` | string or object | How tools are selected (`"auto"`, `"none"`, `"required"`) |
| `tools` | array | List of tools available to the model |
| `top_p` | number | Top-p sampling parameter (0-1) |
| `truncation` | string | Truncation strategy (`"disabled"` or `"auto"`) |
| `usage` | object | Token usage statistics |
| `user` | string or null | Optional user identifier |
| `metadata` | object | Custom metadata associated with the response |
The output array contains the actual content generated by the model, which can include text messages, tool calls (function, web search, file search, computer), and reasoning items.
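As a sketch of what that looks like in practice, here is a pared-down Response payload and a loop that pulls the assistant's text out of its `output` array. The field names follow the schema above; the concrete values (id, model, text) are invented for illustration:

```python
# A pared-down Response payload; values are invented for illustration.
response = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "model": "gpt-4o-mini",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Hello! How can I help?"},
            ],
        },
    ],
}

# Collect the text parts from every message item in the output array,
# skipping non-message items such as tool calls or reasoning.
texts = [
    part["text"]
    for item in response["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
]
print("".join(texts))  # → "Hello! How can I help?"
```

Filtering on `item["type"]` matters because the same array can interleave tool-call and reasoning items with ordinary messages.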