RAG with Images and Video

University of Kansas School of Business

RAG can also work with visual content. Instead of embedding text, we embed images (and video frames) using CLIP — a model that maps both text and images into the same vector space.

This chapter builds:

The same retrieve-augment-generate pattern applies — just with pixels instead of paragraphs.
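
As a rough illustration of the retrieve and augment steps over a shared vector space, the sketch below ranks a few image embeddings against a text-query embedding by cosine similarity. The 3-dimensional vectors, the file names, and the `retrieve` helper are all hypothetical stand-ins chosen for readability; in a real pipeline the vectors would come from CLIP's image and text encoders and would have several hundred dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings standing in for CLIP image vectors.
# File names are made up for illustration.
image_index = {
    "dog_on_beach.jpg": [0.9, 0.1, 0.2],
    "city_skyline.jpg": [0.1, 0.9, 0.1],
    "video_frame_0042.jpg": [0.2, 0.2, 0.9],
}


def retrieve(query_vector, index, k=2):
    """Return the names of the k images most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), name)
              for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical embedding of a text query such as "a dog playing by the sea".
# Because text and images share one space, it can be compared to image vectors directly.
question = "a dog playing by the sea"
query_vector = [0.8, 0.2, 0.3]

# Retrieve: find the closest images.
top_images = retrieve(query_vector, image_index, k=2)

# Augment: attach the retrieved items to the prompt for a downstream model.
prompt = f"Question: {question}\nRetrieved images: {', '.join(top_images)}"
print(prompt)
```

The generate step is left out here: it would pass the prompt, together with the retrieved images themselves, to a model that accepts image input.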