In this section, we’ll cover the key concepts and components of the Julep Responses API. The Julep Responses API is designed to be compatible with OpenAI’s interface, making it easy to migrate existing applications that use OpenAI’s API to Julep.
The Open Responses API requires self-hosting. See the installation guide below.
The API is in alpha and subject to change; check back frequently for updates.
While Sessions provide a persistent, stateful way to interact with agents over multiple turns, the Responses API offers a lightweight, stateless alternative for quick, one-off interactions with language models. Here’s how they compare:
| Feature | Sessions | Responses |
|---|---|---|
| State Management | Maintains conversation history | Stateless (with optional context from previous responses) |
| Persistence | Long-lived, for ongoing conversations | Short-lived, for one-off interactions |
| Agent Integration | Requires an agent | No agent needed |
| Setup Complexity | Requires agent and session creation | Minimal setup (just model and input) |
| Use Case | Multi-turn conversations, complex interactions | Quick content generation, processing, or reasoning |
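The "minimal setup (just model and input)" point can be made concrete. Below is a sketch of the smallest possible request body for a Response, written as a plain Python dict; the model name is an illustrative assumption, so substitute whatever model your self-hosted server is configured to serve:

```python
import json

# Minimal Responses API request: only "model" and "input" are required.
# The model name below is a placeholder, not a Julep default.
minimal_request = {
    "model": "gpt-4o-mini",
    "input": "Write a haiku about the ocean.",
}

# Serialized body, ready to POST to the /responses endpoint.
print(json.dumps(minimal_request))
```

Every other field in the request (temperature, tools, instructions, and so on) is optional, which is what keeps one-off interactions lightweight compared to creating an agent and a session first.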
If you need to maintain context across multiple interactions but prefer the simplicity of the Responses API, you can use the previous_response_id parameter to link responses together.
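A sketch of that chaining: the follow-up request carries the first response's id in `previous_response_id`, so the server can supply the earlier turn as context without you resending the whole history. The id value here is a made-up placeholder:

```python
first_request = {
    "model": "gpt-4o-mini",
    "input": "Name three prime numbers.",
}

# Suppose the server answered with a Response whose id was "resp_abc123"
# (an invented placeholder). The follow-up links back to it:
follow_up_request = {
    "model": "gpt-4o-mini",
    "input": "Now add 1 to each of them.",
    "previous_response_id": "resp_abc123",
}
```

This gives you multi-turn context while keeping each call stateless on the client side; no agent or session object is ever created.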
The Response object is the core data structure returned by the Julep Responses API. It contains all the information about a generated response and follows the OpenAI Responses API schema:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier for the response |
| `object` | string | Always `"response"` |
| `created_at` | integer | Unix timestamp when the response was created |
| `status` | string | Current status: `"completed"`, `"failed"`, `"in_progress"`, or `"incomplete"` |
| `error` | object or null | Error information if the response failed |
| `incomplete_details` | object or null | Details about why a response is incomplete |
| `instructions` | string or null | Optional instructions provided to the model |
| `max_output_tokens` | integer or null | Maximum number of tokens to generate |
| `model` | string | The model used to generate the response |
| `output` | array | List of output items (messages, tool calls, reasoning) |
| `parallel_tool_calls` | boolean | Whether tools can be called in parallel |
| `previous_response_id` | string or null | ID of a previous response for context |
| `reasoning` | object or null | Reasoning steps if reasoning was requested |
| `store` | boolean | Whether the response is stored for later retrieval |
| `temperature` | number | Sampling temperature used (0-1) |
| `text` | object or null | Text formatting options |
| `tool_choice` | string or object | How tools are selected (`"auto"`, `"none"`, `"required"`) |
| `tools` | array | List of tools available to the model |
| `top_p` | number | Top-p sampling parameter (0-1) |
| `truncation` | string | Truncation strategy (`"disabled"` or `"auto"`) |
| `usage` | object | Token usage statistics |
| `user` | string or null | Optional user identifier |
| `metadata` | object | Custom metadata associated with the response |
The output array contains the actual content generated by the model, which can include text messages, tool calls (function, web search, file search, computer), and reasoning items.
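As a sketch of what that looks like in practice, here is a pared-down Response payload and a loop that pulls the assistant's text out of its `output` array. The field names follow the schema above; the concrete values (id, model, text) are invented for illustration:

```python
# A pared-down Response payload; values are invented for illustration.
response = {
    "id": "resp_abc123",
    "object": "response",
    "status": "completed",
    "model": "gpt-4o-mini",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "output_text", "text": "Hello! How can I help?"},
            ],
        },
    ],
}

# Collect the text parts from every message item in the output array,
# skipping non-message items such as tool calls or reasoning.
texts = [
    part["text"]
    for item in response["output"]
    if item["type"] == "message"
    for part in item["content"]
    if part["type"] == "output_text"
]
print("".join(texts))  # → "Hello! How can I help?"
```

Filtering on `item["type"]` matters because the same array can interleave tool-call and reasoning items with ordinary messages.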