Matyas.

Multimodal AI

Multimodal AI refers to models that can process and generate multiple types of data — such as text, images, audio, and video — within a single system. Models like GPT-4o and Claude can accept both text and image inputs, enabling use cases like visual question answering, document analysis, and UI understanding. This convergence is blurring the lines between previously separate AI disciplines such as computer vision, speech recognition, and natural language processing.
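To make this concrete, here is a sketch of what a single multimodal request can look like, in the style of the Anthropic Messages API, where one user turn mixes an image and a text question. The field names follow the public API docs, but the model id is a placeholder and no request is actually sent.

```python
import base64
import json

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Build a request body pairing an image with a text question."""
    return {
        "model": "claude-sonnet-4",  # placeholder model id
        "max_tokens": 1024,
        "messages": [{
            "role": "user",
            "content": [
                # Image and text travel together as one user turn.
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/png",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": question},
            ],
        }],
    }

req = build_vision_request(b"\x89PNG...", "What does this chart show?")
print(json.dumps(req, indent=2)[:120])
```

The key point is structural: a multimodal model takes heterogeneous content blocks in a single message, rather than needing separate vision and language systems.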

#ai

Related Terms

Large Language Model

A large language model (LLM) is a deep learning model trained on massive text datasets to understand and generate human-like text. LLMs like GPT, Claude, and LLaMA power chatbots, code assistants, and content generation tools. They work by predicting the next token in a sequence based on learned statistical patterns across billions of parameters.
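The "predicting the next token" objective can be illustrated with a toy bigram model: count which token follows which in a tiny corpus, then greedily pick the most frequent successor. Real LLMs learn these probabilities with neural networks over billions of parameters, but the objective is the same.

```python
from collections import Counter, defaultdict

# Tiny corpus; a real model trains on trillions of tokens.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigram frequencies: how often each token follows another.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token: str) -> str:
    # Greedy decoding: take the highest-frequency successor.
    return bigrams[token].most_common(1)[0][0]

print(predict_next("the"))  # "cat" — it follows "the" twice, "mat" only once
```

Sampling from these counts instead of always taking the top choice is the toy analogue of temperature-based sampling in real LLMs.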

Diffusion Model

A diffusion model is a type of generative AI that creates data by learning to reverse a gradual noise-adding process. During training, the model learns to progressively denoise random noise into coherent outputs like images, audio, or video. Diffusion models power tools like Stable Diffusion, DALL-E, and Midjourney, and have become the dominant architecture for high-quality image generation.
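The "gradual noise-adding process" has a simple closed form in the standard DDPM formulation: x_t = sqrt(ᾱ_t)·x₀ + sqrt(1 − ᾱ_t)·ε, where ᾱ_t shrinks toward zero over the schedule. The sketch below computes that schedule in plain Python for a single scalar value; a trained model would learn the reverse step, predicting ε from x_t.

```python
import math

T = 1000
# Linear beta schedule from 1e-4 to 0.02, a common DDPM default.
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t is the running product of (1 - beta); it decays toward 0.
alpha_bars = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)

def noise_sample(x0: float, t: int, eps: float) -> float:
    """Closed-form forward process: jump straight to step t."""
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * eps

# Early steps barely perturb x0; by t = T-1 the signal is mostly noise.
print(round(alpha_bars[0], 4), alpha_bars[-1] < 0.001)
```

Because ᾱ_t is nearly 1 at t = 0 and nearly 0 at t = T−1, early samples are almost the clean input and late samples are almost pure noise — exactly the trajectory the model learns to reverse.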

Chain of Thought

Chain of Thought (CoT) is a prompting technique that encourages an LLM to break down complex reasoning into intermediate steps before arriving at a final answer. By explicitly reasoning through each step, models achieve significantly better accuracy on math, logic, and multi-step problems. Extended thinking and "thinking" tokens in models like Claude represent a built-in form of chain-of-thought reasoning.
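In its simplest form, CoT is just a prompt pattern. The wording below is a common convention, not an official API feature:

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought instruction."""
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, "
        "then state the final answer on its own line prefixed with 'Answer:'."
    )

print(cot_prompt("A train travels 60 km in 45 minutes. What is its speed in km/h?"))
```

Asking for a parseable "Answer:" line also makes it easy to extract the final result programmatically while still getting the accuracy benefit of the intermediate reasoning.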

Model Context Protocol

Model Context Protocol (MCP) is an open standard developed by Anthropic that defines how AI applications connect to external data sources and tools. MCP provides a universal interface for LLMs to access databases, APIs, file systems, and other services through standardized server implementations. It enables building AI applications that can interact with the real world in a structured, secure way.

Token

In the context of AI language models, a token is the basic unit of text that a model processes — typically a word, subword, or character depending on the tokenizer. LLM pricing, context windows, and rate limits are all measured in tokens. Understanding tokenization is essential for optimizing costs and staying within model context limits when building AI-powered applications.
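A toy subword tokenizer shows the idea: greedily match the longest known piece from a vocabulary, falling back to single characters. Real tokenizers like BPE or WordPiece learn their vocabularies from data; this tiny hand-written vocab is purely illustrative.

```python
VOCAB = {"token", "ization", "un", "believ", "able", "the"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-match tokenization against a fixed vocab."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible piece starting at i first.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # Unknown character: emit it on its own.
            pieces.append(word[i])
            i += 1
    return pieces

print(tokenize("tokenization"))   # ['token', 'ization']
print(tokenize("unbelievable"))   # ['un', 'believ', 'able']
```

Note that one word can cost several tokens — which is exactly why token counts, not word counts, drive pricing and context-window limits.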

Fine-tuning

Fine-tuning is the process of further training a pre-trained AI model on a smaller, domain-specific dataset to adapt it for a particular task. Instead of training from scratch, fine-tuning adjusts existing model weights, which is significantly cheaper and faster. Common approaches include full fine-tuning, LoRA (Low-Rank Adaptation), and instruction tuning for aligning model behavior with specific requirements.
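The LoRA idea can be sketched in a few lines: instead of updating a full weight matrix W (d_out × d_in), train a low-rank update B·A and compute y = (W + B·A)·x. Pure-Python matrices keep the sketch dependency-free; real implementations use GPU tensor libraries.

```python
import random

d_out, d_in, rank = 64, 64, 4

def zeros(r, c): return [[0.0] * c for _ in range(r)]
def randm(r, c): return [[random.gauss(0, 0.02) for _ in range(c)] for _ in range(r)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W = randm(d_out, d_in)   # frozen pretrained weights
A = randm(rank, d_in)    # trainable, initialized small
B = zeros(d_out, rank)   # trainable, starts at zero so W is unchanged at step 0

x = [1.0] * d_in
# Forward pass: base output plus the low-rank correction B(Ax).
y = [w + b for w, b in zip(matvec(W, x), matvec(B, matvec(A, x)))]

full_params = d_out * d_in            # what full fine-tuning would train
lora_params = rank * (d_out + d_in)   # what LoRA trains instead
print(f"full fine-tune: {full_params} params, LoRA: {lora_params} params")
```

Here LoRA trains 512 parameters instead of 4,096 — and the gap widens dramatically at real model sizes, which is why LoRA makes fine-tuning large models feasible on modest hardware.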

© 2026 Matyas Prochazka. All rights reserved.