Overview

This tutorial demonstrates how to:

  • Upload and process videos using Cloudinary integration
  • Extract and analyze video content
  • Add overlays and transformations
  • Process video subtitles and speaker information

Task Structure

Let’s break down the task into its core components:

1. Input Schema

First, we define what inputs our task expects:

input_schema:
  type: object
  properties:
    upload_file:
      type: string
      description: The URL of the file to upload
    public_id:
      type: string
      description: The public ID of the file to upload
    transformation_prompt:
      type: string
      description: The prompt for the transformations to apply to the file
    subtitle_vtt:
      type: string
      description: The vtt file content to add subtitles to the video

This schema specifies that our task expects:

  • The URL of the video file to upload (upload_file)
  • A public ID for the video (public_id)
  • A transformation prompt describing the desired changes (transformation_prompt)
  • VTT subtitle content (subtitle_vtt, optional)
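
Checking an input dictionary against these fields before starting an execution can catch misspelled keys early. The sketch below is a hypothetical helper, not part of the Julep SDK. It assumes the first three fields are required and that subtitle_vtt is optional, as described above; the schema itself does not list required fields.

```python
# Field names mirror the input_schema above. Treating the first three
# as required is an assumption; the schema does not declare it.
REQUIRED_FIELDS = ("upload_file", "public_id", "transformation_prompt")
OPTIONAL_FIELDS = ("subtitle_vtt",)


def validate_task_input(task_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    known = REQUIRED_FIELDS + OPTIONAL_FIELDS

    # Every required field must be present.
    for name in REQUIRED_FIELDS:
        if name not in task_input:
            problems.append(f"missing required field: {name}")

    # Every known field that is present must be a string.
    for name in known:
        if name in task_input and not isinstance(task_input[name], str):
            problems.append(f"field {name} must be a string")

    # Flag keys the schema does not define (often a typo).
    for name in sorted(set(task_input) - set(known)):
        problems.append(f"unexpected field: {name}")

    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.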

2. Tools Configuration

Next, we define the external tools our task will use:

- name: cloudinary_upload
  type: integration
  integration:
    provider: cloudinary
    method: media_upload
    setup:
      cloudinary_api_key: YOUR_CLOUDINARY_API_KEY
      cloudinary_api_secret: YOUR_CLOUDINARY_API_SECRET
      cloudinary_cloud_name: YOUR_CLOUDINARY_CLOUD_NAME

- name: cloudinary_edit
  type: integration
  integration:
    provider: cloudinary
    method: media_edit

- name: ffmpeg_edit
  type: integration
  integration:
    provider: ffmpeg

We define three tools backed by two integrations:

  • Cloudinary for video uploads and transformations
  • FFmpeg for additional video processing capabilities
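
The placeholder credentials in the setup block are easier to manage when they come from the environment rather than from the task file. The following is a minimal sketch under that assumption. The environment variable names and the helper name are hypothetical; only the three setup keys come from the configuration above.

```python
import os

# Maps each setup key from the tool definition to an assumed
# environment variable name.
SETUP_ENV_VARS = {
    "cloudinary_api_key": "CLOUDINARY_API_KEY",
    "cloudinary_api_secret": "CLOUDINARY_API_SECRET",
    "cloudinary_cloud_name": "CLOUDINARY_CLOUD_NAME",
}


def cloudinary_setup_from_env(env=None):
    """Build the Cloudinary setup mapping from environment variables."""
    env = os.environ if env is None else env

    # Collect all missing or empty variables so the error names every one.
    missing = [var for var in SETUP_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    return {key: env[var] for key, var in SETUP_ENV_VARS.items()}
```

The returned dictionary could then replace the setup mapping of the cloudinary_upload tool after the task definition is loaded.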

3. Main Workflow Steps

Step 1: Initial Video Upload

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video

This step:

  • Takes the input video URL
  • Uploads it to Cloudinary
  • Specifies the resource type as video

Step 2: Create Video Preview

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation:
        - start_offset: 0
          end_offset: 30

This step:

  • Creates a 30-second preview of the video
  • Useful for quick analysis and processing
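
For reference, the same trim can be written as a Cloudinary delivery URL, where start_offset and end_offset appear as the so_ and eo_ parameters. The helper below is a hypothetical sketch that assumes the standard res.cloudinary.com URL layout and an mp4 output format. The task itself applies the trim at upload time, as shown above.

```python
def preview_url(cloud_name, public_id, start=0, end=30, fmt="mp4"):
    """Build a delivery URL for a trimmed preview of an uploaded video.

    Assumes the standard Cloudinary URL layout; cloud_name and public_id
    are whatever values were used for the upload.
    """
    if end <= start:
        raise ValueError("end must be greater than start")

    # so_ = start_offset, eo_ = end_offset (both in seconds).
    transformation = f"so_{start},eo_{end}"
    return (
        f"https://res.cloudinary.com/{cloud_name}/video/upload/"
        f"{transformation}/{public_id}.{fmt}"
    )
```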

Step 3: Analyze Video Content

- prompt:
  - role: user
    content: 
      - type: image_url
        image_url:
          url: trimmed_video_url
      - type: text
        text: |-
          Which speakers are speaking in the video? And where does each of them sit?

This step:

  • Analyzes the video content
  • Identifies speakers and their positions
  • Uses VTT subtitles for additional context
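
To illustrate what the VTT input carries, here is a minimal sketch that parses WebVTT content into timed cues. It is not part of the task definition. The function names are hypothetical, and the parser handles only basic cues (timestamps with optional hours, cue settings ignored), not the full WebVTT format.

```python
import re

# Matches "MM:SS.mmm" or "HH:MM:SS.mmm".
_TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def to_seconds(timestamp):
    """Convert a WebVTT timestamp string to seconds."""
    match = _TIMESTAMP.fullmatch(timestamp.strip())
    if match is None:
        raise ValueError(f"invalid timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(content):
    """Return a list of cues as dicts with start, end, and text."""
    cues = []
    # Cues are separated by blank lines; the WEBVTT header block has no
    # timing line and is skipped automatically.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for index, line in enumerate(lines):
            if "-->" in line:
                start, rest = line.split("-->", 1)
                end = rest.split()[0]  # drop cue settings such as "align:start"
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append({"start": to_seconds(start), "end": to_seconds(end), "text": text})
                break
    return cues
```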

Step 4: Generate Speaker Transformations

- evaluate:
    speakers_transformations: |-
      $ [
      transform
      for speaker in _.speakers_json
        for transform in [
        {
          "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker.speaker},
          "color": "white"
        },
        {
          "duration": 5,
          "flags": "layer_apply",
          "gravity": "south_east" if speaker.position == "right" else "south_west",
          "start_offset": speaker.timestamps[0].start,
          "y": 80,
          "x": 80
        }
        ]
      ]

This step:

  • Creates transformations for each speaker
  • Adds speaker labels with proper positioning
  • Sets timing for each overlay
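
The nested comprehension above may be easier to follow as a plain Python function. This sketch mirrors the expression for local experimentation. The function name is hypothetical, and the shape of each speaker entry (speaker, position, timestamps) is inferred from the expression, so treat it as an assumption.

```python
def build_speaker_transformations(speakers):
    """Produce two transformations per speaker: a text overlay and its placement."""
    transformations = []
    for speaker in speakers:
        # First entry: the text layer showing the speaker's name.
        transformations.append({
            "overlay": {"font_family": "Arial", "font_size": 32, "text": speaker["speaker"]},
            "color": "white",
        })
        # Second entry: apply the layer for 5 seconds, starting at the
        # speaker's first timestamp, in the corner matching their position.
        transformations.append({
            "duration": 5,
            "flags": "layer_apply",
            "gravity": "south_east" if speaker["position"] == "right" else "south_west",
            "start_offset": speaker["timestamps"][0]["start"],
            "y": 80,
            "x": 80,
        })
    return transformations


# Hypothetical sample data in the assumed shape.
sample_speakers = [
    {"speaker": "Alice", "position": "left", "timestamps": [{"start": 2}]},
    {"speaker": "Bob", "position": "right", "timestamps": [{"start": 9}]},
]
sample_transformations = build_speaker_transformations(sample_speakers)
```

Each speaker contributes a pair of entries, so the result for two speakers has four items in overlay, placement, overlay, placement order.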

Step 5: Apply Transformations

- tool: cloudinary_upload
  arguments:
    file: $ steps[0].input.upload_file
    public_id: $ steps[0].input.public_id
    upload_params:
      resource_type: video
      transformation: $ _.speakers_transformations

This step:

  • Uses the Cloudinary upload tool to apply the generated transformations
  • Processes the video with speaker labels and positioning
  • Returns a URL to the transformed video with all overlays applied

Usage

Here’s how to use this task with the Julep SDK:

import os
import time
import yaml
from julep import Client

# Initialize the client (assumes the API key is stored in the JULEP_API_KEY environment variable)
client = Client(api_key=os.environ["JULEP_API_KEY"])

transformation_prompt = """
1- I want to overlay the following image on the video, and also apply the layer_apply flag. Here's the image URL:
https://res.cloudinary.com/demo/image/upload/logos/cloudinary_icon_white.png

2- I also want you to blur the video, and add fade-in and fade-out effects with a duration of 3 seconds each.
"""

# Create the agent
agent = client.agents.create(
  name="Julep Video Processing Agent",
  description="A Julep agent that can process and analyze videos using Cloudinary and FFmpeg.",
)

# Load the task definition
with open('video_processing_task.yaml', 'r') as file:
  task_definition = yaml.safe_load(file)

# Create the task
task = client.tasks.create(
  agent_id=agent.id,
  **task_definition
)

# Create the execution
execution = client.executions.create(
    task_id=task.id,
    input={
        "upload_file": "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerMeltdowns.mp4",
        "public_id": "video_test",
        "transformation_prompt": transformation_prompt,
    }
)
# Wait for the execution to complete
while (result := client.executions.get(execution.id)).status not in ['succeeded', 'failed']:
    print(result.status)
    time.sleep(1)

# Print the result
if result.status == "succeeded":
    print(result.output)
else:
    print(f"Error: {result.error}")
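
The polling loop above runs until the execution reaches a terminal status, so it never gives up on a stalled execution. A bounded variant might look like the sketch below. The helper is hypothetical, not part of the Julep SDK. It assumes only that the object returned by the callable has a status attribute, with succeeded and failed as the terminal values, as in the loop above.

```python
import time


def wait_for_execution(get_status, timeout=300.0, interval=1.0, max_interval=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status or until timeout seconds pass.

    The delay between polls doubles after each attempt, capped at max_interval.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = get_status()
        if result.status in ("succeeded", "failed"):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"execution still {result.status!r} after {timeout} seconds")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

With the client from the example above, this could be called as wait_for_execution(lambda: client.executions.get(execution.id)).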

Example Output

This is an example output when the task is run over the sample video input.

Next Steps

To try this task yourself, see the full example in the video-processing-with-natural-language cookbook.