Supported Models
Comprehensive guide to AI models and parameters supported by Julep
Overview
Julep leverages LiteLLM to seamlessly connect you to a wide array of Language Models (LLMs). This integration offers incredible flexibility, allowing you to tap into models from various providers with a straightforward, unified interface.
With our unified API, switching between different providers is a breeze, ensuring you maintain consistent functionality across the board.
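As an illustrative sketch (the dictionary shape below is a simplification, not the exact Julep SDK schema), switching providers amounts to changing the model name while the rest of your configuration stays the same:

```python
# Illustrative only: an agent configuration as a plain dict. The exact Julep
# SDK schema may differ; the point is that only the model name changes when
# you switch providers.
base_agent = {
    "name": "research-assistant",
    "about": "Summarizes long documents",
    "model": "claude-3.5-sonnet",  # Anthropic model from the tables below
}

# Switching to OpenAI or Google is a one-field change:
openai_agent = {**base_agent, "model": "gpt-4o"}
gemini_agent = {**base_agent, "model": "gemini-1.5-pro"}
```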
Available Models
While we provide API keys for quick testing and development, you’ll need to use your own API keys when deploying to production. This ensures you have full control over your usage and billing.
Looking for top-notch quality? Our curated selection of models delivers excellent outputs for all your use cases.
Anthropic
Here are the Anthropic models supported by Julep:
| Model Name | Context Window | Best For |
|---|---|---|
| claude-3-opus | 200K tokens | Complex reasoning, analysis |
| claude-3-sonnet | 200K tokens | General purpose tasks |
| claude-3-haiku | 200K tokens | Quick responses |
| claude-3.5-sonnet | 200K tokens | Improved reasoning |
| claude-3.5-sonnet-20240620 | 200K tokens | Enhanced reasoning capabilities |
| claude-3.5-sonnet-20241022 | 200K tokens | Latest improvements |
Google
Here are the Google models supported by Julep:
| Model Name | Context Window | Best For |
|---|---|---|
| gemini-1.5-pro | 1M tokens | Complex tasks |
| gemini-1.5-pro-latest | 1M tokens | Cutting-edge performance |
OpenAI
Here are the OpenAI models supported by Julep:
| Model Name | Context Window | Best For |
|---|---|---|
| gpt-4-turbo | 128K tokens | Advanced reasoning |
| gpt-4o | 128K tokens | Balanced performance |
| o1-mini¹ | 128K tokens | Quick tasks |
| o1-preview¹ | 128K tokens | Testing features |
| o1¹ | 128K tokens | General tasks |
¹ Heads up: The O1 series models are temporarily unavailable due to some OpenAI service issues. We’re working to get them back online soon!
Groq
Here are the Groq models supported by Julep:
| Model Name | Context Window | Best For |
|---|---|---|
| llama-3.1-70b | 8K tokens | Long-form content |
| llama-3.1-8b | 8K tokens | Quick processing |
OpenRouter
Here are the OpenRouter models supported by Julep:
| Model Name | Context Window | Best For |
|---|---|---|
| mistral-large-2411 | 128K tokens | High performance |
| qwen-2.5-72b-instruct | 131K tokens | Complex instructions |
| eva-llama-3.33-70b | 128K tokens | Creative tasks |
| l3.1-euryale-70b | 128K tokens | Creative tasks |
| l3.3-euryale-70b | 8K tokens | Creative tasks |
| magnum-v4-72b | 8K tokens | Creative tasks |
| eva-qwen-2.5-72b | 8K tokens | Creative tasks |
| hermes-3-llama-3.1-70b | 8K tokens | Creative tasks |
| deepseek-chat | 32K tokens | Conversational AI |
Embedding
Here are the embedding models supported by Julep:
| Model Name | Embedding Dimensions | Best For |
|---|---|---|
| text-embedding-3-large | 1024 | High-quality vectors |
| voyage-multilingual-2 | 1024 | Cross-language tasks |
| voyage-3 | 1024 | Advanced embeddings |
| Alibaba-NLP/gte-large-en-v1.5 | 1024 | Cost-effective solutions |
| BAAI/bge-m3 | 1024 | Cost-effective solutions |
| vertex_ai/text-embedding-004 | 1024 | Google Cloud integration |
Although the models mentioned above natively support different embedding dimensions, Julep currently uses a fixed 1024 dimensions for all embedding models. We plan to support other dimensions in the future.
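Because every embedding model currently returns 1024-dimensional vectors, downstream code (vector stores, similarity search) can assume a fixed size regardless of which model you pick. A minimal sketch, assuming a hypothetical helper of our own (`embedding_dim` and `EMBEDDING_DIM` are illustrative names, not part of the Julep SDK):

```python
# Julep currently produces 1024-dimensional vectors for every embedding model,
# so storage can be allocated with a fixed size. EMBEDDING_DIM is our own
# illustrative constant, not a Julep API symbol.
EMBEDDING_DIM = 1024

SUPPORTED_EMBEDDING_MODELS = {
    "text-embedding-3-large",
    "voyage-multilingual-2",
    "voyage-3",
    "Alibaba-NLP/gte-large-en-v1.5",
    "BAAI/bge-m3",
    "vertex_ai/text-embedding-004",
}

def embedding_dim(model: str) -> int:
    """Return the vector size Julep produces for a supported embedding model."""
    if model not in SUPPORTED_EMBEDDING_MODELS:
        raise ValueError(f"Unsupported embedding model: {model}")
    return EMBEDDING_DIM  # fixed for now; may vary per model in the future
```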
Supported Parameters
These parameters allow fine-tuning of model behavior. Note that not all parameters are supported by every model.
Best Practices:
- Start with default values and adjust based on your needs
- Use temperature (0.0 - 1.0) for most cases
- Avoid setting multiple penalty parameters simultaneously
- Test different combinations for optimal results
Setting extreme values for multiple parameters may lead to unexpected behavior or poor quality outputs.
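The guidelines above can be sketched as a small pre-flight check. The parameter names below (`temperature`, `frequency_penalty`, `presence_penalty`) follow common LLM-API conventions; the exact set and ranges each Julep model accepts may differ:

```python
# Illustrative pre-flight check implementing the best practices above.
# Parameter names mirror common LLM APIs; this is not a Julep-defined schema.
def check_params(params: dict) -> list[str]:
    warnings = []
    temp = params.get("temperature")
    if temp is not None and not (0.0 <= temp <= 1.0):
        warnings.append("temperature outside the recommended 0.0-1.0 range")
    # Flag multiple penalty parameters set at once (best practice: avoid this).
    penalties = [k for k in ("frequency_penalty", "presence_penalty")
                 if params.get(k) not in (None, 0, 0.0)]
    if len(penalties) > 1:
        warnings.append("multiple penalty parameters set simultaneously")
    return warnings
```

For example, `check_params({"temperature": 1.5, "frequency_penalty": 0.5, "presence_penalty": 0.5})` flags both issues, while a request with only `temperature: 0.7` passes cleanly.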
Usage Guidelines
For more information, please refer to the LiteLLM documentation.