
Overview

Julep uses LiteLLM to connect you to a wide array of Language Models (LLMs). This integration lets you tap into models from many providers through a single, unified interface.
With the unified API, switching between providers usually comes down to changing the model name, while the rest of your agent and task code stays the same.
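For example, selecting a model is just a matter of passing its name when creating an agent. The snippet below is a minimal sketch assuming the Julep Python SDK (`pip install julep`) and an API key in the `JULEP_API_KEY` environment variable; the agent name and description are placeholders.

```python
import os
from julep import Julep

# Assumption: the API key is read from the environment.
client = Julep(api_key=os.environ["JULEP_API_KEY"])

# Pick any model from the tables below; switching providers
# only requires changing this string.
agent = client.agents.create(
    name="Support Assistant",                 # hypothetical agent
    model="claude-3.5-sonnet",                # e.g. swap for "gpt-4o" or "gemini-2.5-pro"
    about="Answers customer questions about our product.",
)

print(agent.id)
```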

Available Models

While we provide API keys for quick testing and development, you’ll need to use your own API keys when deploying to production. This ensures you have full control over your usage and billing.
Looking for top-notch quality? Our curated selection of models delivers excellent outputs for all your use cases.

Anthropic

Here are the Anthropic models supported by Julep:
| Model Name | Context Window | Max Output | Cost Tier |
| --- | --- | --- | --- |
| claude-3-haiku | 200K tokens | 4K tokens | Budget |
| claude-3-sonnet | 200K tokens | 4K tokens | Premium |
| claude-3.5-haiku | 200K tokens | 8K tokens | Standard |
| claude-3.5-sonnet | 200K tokens | 8K tokens | Premium |
| claude-3.5-sonnet-20240620 | 200K tokens | 4K tokens | Premium |
| claude-3.5-sonnet-20241022 | 200K tokens | 8K tokens | Premium |
| claude-3.7-sonnet | 200K tokens | 8K tokens | Premium |
| claude-opus-4 | 200K tokens | 32K tokens | Enterprise |
| claude-opus-4-1 | 200K tokens | 32K tokens | Enterprise |
| claude-sonnet-4 | 1M tokens | 64K tokens | Premium |
| claude-sonnet-4-5 | 200K tokens | 64K tokens | Premium |

Google

Here are the Google models supported by Julep:
| Model Name | Context Window | Max Output | Cost Tier |
| --- | --- | --- | --- |
| gemini-1.5-pro | 2M tokens | 8K tokens | Standard |
| gemini-1.5-pro-latest | 1M tokens | 8K tokens | Premium |
| gemini-2.0-flash | 1M tokens | 8K tokens | Budget |
| gemini-2.5-flash | 1M tokens | 65K tokens | Budget |
| gemini-2.5-pro | 1M tokens | 65K tokens | Standard |
| gemini-2.5-pro-preview-03-25 | 1M tokens | 65K tokens | Standard |
| gemini-2.5-pro-preview-06-05 | 1M tokens | 65K tokens | Standard |

OpenAI

Here are the OpenAI models supported by Julep:
| Model Name | Context Window | Max Output | Cost Tier |
| --- | --- | --- | --- |
| gpt-4-turbo | 128K tokens | 4K tokens | Enterprise |
| gpt-4.1 | 1M tokens | 32K tokens | Premium |
| gpt-4.1-mini | 1M tokens | 32K tokens | Budget |
| gpt-4.1-nano | 1M tokens | 32K tokens | Budget |
| gpt-4o | 128K tokens | 16K tokens | Premium |
| gpt-4o-mini | 128K tokens | 16K tokens | Budget |
| gpt-5 | 272K tokens | 128K tokens | Standard |
| gpt-5-2025-08-07 | 272K tokens | 128K tokens | Standard |
| gpt-5-chat | 272K tokens | 128K tokens | Standard |
| gpt-5-chat-latest | 128K tokens | 16K tokens | Standard |
| gpt-5-mini | 272K tokens | 128K tokens | Budget |
| gpt-5-mini-2025-08-07 | 272K tokens | 128K tokens | Budget |
| gpt-5-nano | 272K tokens | 128K tokens | Budget |
| gpt-5-nano-2025-08-07 | 272K tokens | 128K tokens | Budget |
| o1 | 200K tokens | 100K tokens | Enterprise |
| o1-mini | 128K tokens | 65K tokens | Standard |
| o1-preview | 128K tokens | 32K tokens | Enterprise |
| o3-mini | 200K tokens | 100K tokens | Standard |
| o4-mini | 200K tokens | 100K tokens | Standard |

Groq

Here are the Groq models supported by Julep:
| Model Name | Context Window | Max Output | Cost Tier |
| --- | --- | --- | --- |
| deepseek-r1-distill-llama-70b | 128K tokens | 128K tokens | Standard |
| gemma2-9b-it | 8K tokens | 8K tokens | Budget |
| llama-3.1-8b | 128K tokens | 8K tokens | Budget |
| llama-3.1-8b-instant | 128K tokens | 8K tokens | Budget |
| llama-3.3-70b-versatile | 128K tokens | 32K tokens | Standard |
| meta-llama/Llama-Guard-4-12B | 163K tokens | 163K tokens | Budget |
| meta-llama/llama-4-maverick-17b-128e-instruct | 131K tokens | 8K tokens | Budget |
| meta-llama/llama-4-scout-17b-16e-instruct | 131K tokens | 8K tokens | Budget |
| qwen/qwen3-32b | 131K tokens | 131K tokens | Budget |

OpenRouter

Here are the OpenRouter models supported by Julep:
| Model Name | Context Window | Max Output | Cost Tier |
| --- | --- | --- | --- |
| deepseek-chat | 131K tokens | 8K tokens | Standard |
| deepseek/deepseek-r1-distill-llama-70b | 65K tokens | 8K tokens | Standard |
| deepseek/deepseek-r1-distill-qwen-32b | 65K tokens | 8K tokens | Standard |
| eva-llama-3.33-70b | Unknown | Unknown | Unknown |
| eva-qwen-2.5-72b | Unknown | Unknown | Unknown |
| hermes-3-llama-3.1-70b | Unknown | Unknown | Unknown |
| l3.1-euryale-70b | 200K tokens | 100K tokens | Enterprise |
| l3.3-euryale-70b | 200K tokens | 100K tokens | Enterprise |
| magnum-v4-72b | Unknown | Unknown | Unknown |
| meta-llama/llama-3.1-8b-instruct | Unknown | Unknown | Unknown |
| meta-llama/llama-3.3-70b-instruct | Unknown | Unknown | Unknown |
| meta-llama/llama-4-scout | 131K tokens | 8K tokens | Budget |
| mistral-large-2411 | 128K tokens | 128K tokens | Premium |
| openrouter/meta-llama/llama-4-maverick | 131K tokens | 8K tokens | Budget |
| openrouter/meta-llama/llama-4-maverick:free | Unknown | Unknown | Unknown |
| openrouter/meta-llama/llama-4-scout | 131K tokens | 8K tokens | Budget |
| openrouter/meta-llama/llama-4-scout:free | Unknown | Unknown | Unknown |
| perplexity/sonar | 128K tokens | Unknown | Standard |
| perplexity/sonar-deep-research | 128K tokens | Unknown | Premium |
| perplexity/sonar-pro | 200K tokens | 8K tokens | Premium |
| perplexity/sonar-reasoning | 128K tokens | Unknown | Standard |
| perplexity/sonar-reasoning-pro | 128K tokens | Unknown | Premium |
| qwen-2.5-72b-instruct | Unknown | Unknown | Unknown |

Amazon Nova

Here are the Amazon Nova models supported by Julep:
| Model Name | Context Window | Max Output | Cost Tier |
| --- | --- | --- | --- |
| amazon/nova-lite-v1 | Unknown | Unknown | Unknown |
| amazon/nova-micro-v1 | Unknown | Unknown | Unknown |
| amazon/nova-pro-v1 | Unknown | Unknown | Unknown |

Embedding

Here are the embedding models supported by Julep:
| Model Name | Embedding Dimensions |
| --- | --- |
| Alibaba-NLP/gte-large-en-v1.5 | 1024 |
| BAAI/bge-m3 | 1024 |
| text-embedding-3-large | 1024 |
| vertex_ai/text-embedding-004 | 1024 |
| voyage-3 | 1024 |
| voyage-multilingual-2 | 1024 |
Although the models above support different native embedding dimensions, Julep currently uses a fixed 1024 dimensions for all embedding models. Support for other dimensions is planned.
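You don't call the embedding models directly; Julep embeds content automatically when you attach documents to an agent or user. The sketch below is illustrative and assumes the Julep Python SDK's `client.agents.docs.create` method; the agent id, title, and content are placeholders.

```python
from julep import Julep

client = Julep(api_key="YOUR_API_KEY")

agent_id = "YOUR_AGENT_ID"  # hypothetical: use the id returned by agents.create

# When a document is attached, Julep embeds it with the configured
# embedding model; all vectors are stored at 1024 dimensions.
doc = client.agents.docs.create(
    agent_id=agent_id,
    title="Returns policy",
    content="Items can be returned within 30 days of delivery.",
)

print(doc.id)
```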

Supported Parameters

The following parameters can be used to control the behavior of the models.

Core Parameters

| Parameter | Range | Description |
| --- | --- | --- |
| temperature | 0.0 - 5.0 | Controls randomness in outputs. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused and deterministic |
| top_p | 0.0 - 1.0 | Alternative to temperature for nucleus sampling. Only tokens with cumulative probability < top_p are considered. We recommend adjusting either this or temperature, not both |
| max_tokens | ≥ 1 | Maximum number of tokens to generate in the response |
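As a quick illustration, these settings can be given sensible defaults when the agent is created. This is a minimal sketch assuming the Julep Python SDK and that `agents.create` accepts a `default_settings` mapping; the agent details are placeholders.

```python
from julep import Julep

client = Julep(api_key="YOUR_API_KEY")

# Assumption: generation settings passed as default_settings apply to
# every chat turn for this agent unless overridden per request.
agent = client.agents.create(
    name="Summarizer",
    model="gpt-4o-mini",
    about="Produces short, factual summaries.",
    default_settings={
        "temperature": 0.2,  # low randomness for consistent summaries
        "max_tokens": 512,   # cap the response length
        # Prefer adjusting either temperature or top_p, not both.
    },
)
```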

Penalty Parameters

| Parameter | Range | Description |
| --- | --- | --- |
| frequency_penalty | -2.0 - 2.0 | Penalizes tokens based on their frequency in the text. Positive values decrease repetition |
| presence_penalty | -2.0 - 2.0 | Penalizes tokens based on their presence in the text. Positive values decrease likelihood of repeating content |
| repetition_penalty | 0.0 - 2.0 | Penalizes repetition (1.0 is neutral). Values > 1.0 reduce likelihood of repeating content |
| length_penalty | 0.0 - 2.0 | Penalizes based on generation length (1.0 is neutral). Values > 1.0 penalize longer generations |

Advanced Controls

| Parameter | Range | Description |
| --- | --- | --- |
| min_p | 0.0 - 1.0 | Minimum probability threshold compared to the highest token probability |
| seed | integer | For deterministic generation. Set a specific seed for reproducible results |
| stop | list[str] | Up to 4 sequences where generation should stop |
| response_format | object | Control output format: {"type": "json_object"} or {"type": "json_schema", "json_schema": {...}} |
Not all parameters are supported by every model. Please refer to the LiteLLM documentation for more details.
Response Format Support: The response_format parameter is supported by OpenAI, Azure OpenAI, Google AI Studio (Gemini), Vertex AI, Bedrock, Anthropic, Groq, xAI (Grok-2+), Databricks, and Ollama. For the most up-to-date list, check the LiteLLM JSON Mode documentation.
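For example, a chat turn can request JSON output from a model that supports JSON mode. This is a hedged sketch assuming the Julep Python SDK, an existing session id, and that `sessions.chat` forwards a `response_format` argument to the underlying model.

```python
from julep import Julep

client = Julep(api_key="YOUR_API_KEY")

session_id = "YOUR_SESSION_ID"  # hypothetical: a session created beforehand

# Assumption: response_format is forwarded to the model, which must be
# from one of the providers with JSON-mode support listed above.
response = client.sessions.chat(
    session_id=session_id,
    messages=[
        {
            "role": "user",
            "content": "List three supported model names as a JSON array under the key 'models'.",
        }
    ],
    response_format={"type": "json_object"},
)

print(response.choices[0].message.content)
```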
Best Practices:
  • Start with default values and adjust based on your needs
  • Use temperature (0.0 - 1.0) for most cases
  • Avoid setting multiple penalty parameters simultaneously
  • Test different combinations for optimal results
Setting extreme values for multiple parameters may lead to unexpected behavior or poor quality outputs.

Usage Guidelines

Consider Model Selection Criteria

  1. Your budget and cost constraints
  2. How fast you need responses
  3. The quality you're aiming for
  4. The context window size you require

Follow Best Practices

  1. Start with smaller models for development and testing
  2. Use larger context windows only when necessary
  3. Keep an eye on token usage to manage costs
For more information, please refer to the LiteLLM documentation.