# Multi-Modality
Langfuse supports rendering Multi-Modal traces, including both text and image formats. We follow OpenAI’s format convention.
## How to trace Multi-Modal content in Langfuse?
To use our Multi-Modal Trace support, your trace or observation `input`/`output` should include a list of messages comprising the conversation so far. Each message should contain a `role` (`system`, `user`, or `assistant`) and `content`. To display multi-modal content, you can pass a combination of text and image URLs. The `content` property of the messages follows the OpenAI convention.
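For example, here is a minimal sketch using the Python SDK, assuming the `LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, and `LANGFUSE_HOST` environment variables are set; the trace name, output text, and image URL are placeholders:

```python
from langfuse import Langfuse

# Credentials are read from the LANGFUSE_* environment variables.
langfuse = Langfuse()

# A conversation in OpenAI message format, mixing text and an image URL.
messages = [
    {"role": "system", "content": "You are an AI trained to describe and interpret images."},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's happening in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    },
]

# Passing the messages as trace input renders them as multi-modal content in the UI.
trace = langfuse.trace(
    name="describe-image",  # hypothetical trace name
    input=messages,
    output="A description of the image returned by the model.",  # placeholder output
)

langfuse.flush()  # ensure the event is sent before the script exits
```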
We plan to extend support to base64 images, file attachments (e.g., PDFs), and audio soon. Please add your upvote and any thoughts to this thread.
## Visual Representation in Langfuse
When the “Markdown” option is enabled in the Langfuse UI, you can click on the image icon to preview the image inline.
## Content Format
| Content | Type | Description |
| --- | --- | --- |
| Default: Text content | `string` | The text contents of the message. |
| Multi-Modal: Array of content parts | `array` | An array of content parts with a defined `type`, each of which can be of type `text` or `image_url`. You can pass multiple images by adding multiple `image_url` content parts. |
### Content Examples
```json
{
  "content": [
    {
      "role": "system",
      "content": "You are an AI trained to describe and interpret images. Describe the main objects and actions in the image."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's happening in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}
```
### Content Part Types
| Property | Type | Description |
| --- | --- | --- |
| `type` | `string` | Type of the content part, either `text` or `image_url`. |
| `text` | `string` | Text content of the message. |
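If you use the Langfuse OpenAI integration, multi-modal messages passed to the chat completions API are captured on the traced generation automatically. A minimal sketch, assuming an OpenAI API key is configured; the model name and image URL are placeholders:

```python
# Drop-in replacement for the OpenAI SDK that traces calls to Langfuse.
from langfuse.openai import openai

response = openai.chat.completions.create(
    model="gpt-4o",  # placeholder: any vision-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```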
For more details and examples, refer to our OpenAI cookbook.