A Friendly Guide to the AI Landscape for Engineers
Understanding AI, machine learning, and the tools shaping our future

In a world where apps can chat, compose images and even write code, Artificial Intelligence (AI) often feels like magic. Yet behind the headlines and buzzwords, AI is simply about getting computers to perform tasks that normally require human intelligence, such as understanding language, recognising images, or making decisions. It's sometimes funny to think how we've trained AI in our likeness, which has also resulted in it reflecting human biases. Many of us have used voice assistants (Siri, Alexa) or tried AI chatbots (like ChatGPT) and wondered what AI is under the hood. I know I have, and I'm sure many of you reading this article might feel the same. This guide will walk you through the AI universe step by step, covering basics and real-world tools along the way. Join me as I learn about AI and follow my Medium profile for more articles on this topic.
What is AI? The big picture
At its core, AI refers to techniques that let machines mimic human intelligence — learning from data, making decisions, solving problems or understanding language. In practice today, nearly all AI is narrow AI: systems built for one task. Your phone's face-unlock feature and a chess engine are both examples of narrow AI. IBM explains that narrow AI (sometimes called weak AI) is specialised: it can identify faces in photos or parse speech to text, but it can't do everything. A vision-based AI can't understand voice commands, and vice versa.
In short, AI is an umbrella term with many subfields. Think of it as a layered cake: at the top, we have Artificial Intelligence (all systems simulating intelligence). Inside that, Machine Learning (ML) is a layer where we let computers learn from data. Deep inside ML is Deep Learning, which uses multi-layer neural networks. Each layer of a neural network processes data and passes information to the next, mimicking (very roughly) how neurons work in a brain. Today's most impressive AI breakthroughs generally come from these machine-learning approaches.
AI is already all around us. Chatbots that feel human-like, photo apps that “dream” up new images, recommendation engines on Netflix, and self-driving car cameras — these all use AI techniques. The common thread is that these systems process data (text, pixels, audio, etc.) and make predictions or take actions in an automated way. For instance, Amazon's recommendation system uses machine learning to predict products you might like based on your history. All of this is AI moving from the lab into everyday life.
How machines learn: Machine Learning in a nutshell
To be useful, an AI system needs a way to learn from examples. That's where Machine Learning (ML) comes in. In ML, we give the computer lots of data, and it finds patterns or rules in it. IBM puts it simply: “Machine learning is a subset of AI that allows for optimisation; it helps you make predictions that minimise errors from guessing.” In practice, we set up an ML model, feed it data (and often the correct answers), and it figures out how to make accurate predictions or decisions on new data.
There are three main paradigms of ML:
- Supervised learning: The model is trained on labelled data — that means each example comes with the correct answer. For instance, an email dataset labelled “spam” or “not spam” lets an algorithm learn to classify new emails. The model essentially learns a mapping from inputs to outputs. In other words, “given this input (an email's text), the answer should be this label.” Supervised learning is everywhere: image classifiers, language translators, and financial forecasting. Whenever we have past data paired with correct answers, we can apply supervised learning.
- Unsupervised learning: Here, the data has no labels. The algorithm tries to find structure or patterns on its own. For example, giving a model thousands of customer records might let it cluster customers into groups with similar buying habits. Or an unsupervised model might compress images into simpler representations. This is useful for tasks like grouping, anomaly detection or pre-training models before fine-tuning. Essentially, unsupervised learning is about exploring data to discover hidden patterns without explicit answers.
- Reinforcement learning (RL): Think of a game or a robot training itself. An agent interacts with an environment and learns to take actions that maximise a reward signal. For example, a chess engine learns by playing games (getting +1 for a win, –1 for a loss) and gradually improves its strategy. DeepMind's AlphaGo famously mastered the ancient board game Go and defeated a human champion years before most experts thought it was possible. In robotics or recommendation engines, an AI might get rewards for completing tasks or for keeping users engaged. In reinforcement learning, the model figures out policies (action rules) through trial and error and long-term feedback.
Each of these learning methods suits different problems. Supervised learning excels when you have plenty of labelled examples. Unsupervised is useful when labelling is hard or when you want the model to discover structure (often used to pre-train language models). Reinforcement learning is key for dynamic decision-making problems (like games, robotics or autonomous driving).
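To make the first two paradigms concrete, here's a minimal scikit-learn sketch. The toy emails and labels are invented for illustration; real systems train on far more data:

```python
# pip install scikit-learn
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Supervised: labelled examples (1 = spam, 0 = not spam)
emails = ["win a free prize now", "meeting moved to 3pm",
          "claim your reward today", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(emails)          # turn text into numeric features

clf = LogisticRegression().fit(X, labels)     # learn the input -> label mapping
print(clf.predict(vectoriser.transform(["claim a free prize"])))  # likely [1]

# Unsupervised: same emails, no labels; the model groups them on its own
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # two clusters discovered purely from word patterns
```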
Neural networks and deep learning
Many modern ML breakthroughs rely on neural networks, which are algorithms vaguely inspired by the brain's neurons. A neural network has layers of interconnected “nodes” (neurons). Each neuron applies a weight to its inputs, sums them, and passes them through an activation function. What makes neural networks powerful is that with enough layers and data, they can learn to approximate very complex functions.
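That weight-sum-activation recipe is only a few lines of NumPy. Here's a minimal sketch of a single layer's forward pass, with arbitrary weights standing in for what training would learn:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)  # activation: pass positives through, zero out negatives

# One layer: 3 inputs feeding 2 neurons. These weights and biases are arbitrary;
# training would adjust them to reduce prediction error.
W = np.array([[0.2, -0.5, 0.1],
              [0.7,  0.3, -0.2]])
b = np.array([0.1, -0.1])

x = np.array([1.0, 2.0, 3.0])   # the layer's input
output = relu(W @ x + b)        # weight, sum, activate
print(output)                   # this would feed into the next layer
```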
Deep learning is simply using neural networks with many layers (hence “deep”). More layers allow the network to learn multiple levels of abstraction. For example, in image processing, the first layers might detect edges, middle layers combine edges into shapes, and top layers recognise objects (a cat, a building, etc.). IBM notes that deep learning automates feature extraction and can use very large datasets. This is why deep learning became popular: engineers no longer need to handcraft image features.
In real-world terms, deep neural networks are behind voice recognition (our phones understanding speech), image recognition (tagging photos), language translation, and much more. Training these networks can require huge amounts of data and computing power (often GPUs or specialised hardware). But the results are impressive: a well-trained deep network can classify images or understand speech far faster and often more accurately than older methods.

In pursuit of mimicking human intelligence. Image source: Unsplash
The rise of LLMs and generative AI
Over the past few years, a new wave of AI has emerged: large language models (LLMs) and generative AI. These are giant neural networks pre-trained on vast text (and sometimes code or images) that can generate content. The most famous examples are OpenAI's GPT series (GPT-3, GPT-4, ChatGPT) and similar models from other companies (Google's Gemini, Meta's LLaMA, etc.).
An LLM is essentially a transformer-based network trained on a large corpus of text. AWS describes them as “very large deep learning models that are pre-trained on vast amounts of data.” The transformer architecture — introduced in 2017 — lets the model pay attention to relationships between all words in a sequence, enabling it to process text in parallel and capture context. This architecture and scale allow LLMs to learn complex language patterns. AWS notes that transformer models learn grammar, facts and reasoning just by processing text.
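At the heart of the transformer is scaled dot-product attention, where each token's representation is updated with a weighted blend of every other token's. A bare-bones NumPy sketch (real models add learned projections, multiple heads and masking):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each position mixes in information
    from every other position, weighted by similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ V                                   # weighted blend of values

# 4 tokens, each an 8-dimensional vector (random stand-ins for embeddings)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
print(attention(tokens, tokens, tokens).shape)  # (4, 8): a context-aware vector per token
```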
Why do these huge models matter? Because they're surprisingly flexible. A single LLM can be fine-tuned or given instructions to perform many tasks: answering questions, summarising documents, translating languages, writing code and more. AWS points out that “one model can perform completely different tasks such as answering questions, summarising documents, translating languages and completing sentences.” This versatility comes from learning on so much data that the model effectively “knows” language broadly.
GPTs are prime examples. “GPT” stands for Generative Pre-trained Transformer. IBM explains that GPT models are a family of large language models that power ChatGPT and other generative AI apps. The first GPT (GPT-1) came out in 2018. After that came GPT-2, GPT-3, and most recently GPT-4 in 2023. In 2024, OpenAI released an even more advanced variant, GPT-4o, which handles audio, images and text in real time. The key is that these models are pre-trained on a general task (predicting text) and then can be adapted to many uses.
What can these LLMs do? Quite a lot. ChatGPT, launched by OpenAI in November 2022, showed how fluent these systems can be. Built on GPT-3.5 (and later GPT-4), ChatGPT was trained on a huge text dataset and then refined with reinforcement learning to follow user instructions. Remarkably, ChatGPT gained over 1 million users in just five days, and 100 million by January 2023 — one of the fastest growth curves for any application. People use it to draft emails, brainstorm ideas, get coding help or just have a conversation.
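As engineers, we usually reach these models through an API rather than the chat interface. Here's a minimal sketch using OpenAI's official Python SDK; it assumes `pip install openai`, an `OPENAI_API_KEY` in your environment, and a model name your account can access:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # swap in whichever model you have access to
    messages=[
        {"role": "system", "content": "You are a concise assistant for engineers."},
        {"role": "user", "content": "Explain supervised learning in two sentences."},
    ],
)
print(response.choices[0].message.content)
```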
LLMs have inspired a whole class of foundation models. A foundation model is any giant AI model trained on broad data to solve many tasks. Language models like GPT and Google's BERT are classic examples. But the idea extends beyond text. OpenAI's DALL·E 2 is a model that generates images from text prompts. It's literally an AI system that can create realistic images and art from a description in natural language. Given “a unicorn in a party hat riding a hoverboard,” DALL·E will paint that picture. Other models like Midjourney or Stable Diffusion do similar text-to-image creation. There are also multimodal models (like GPT-4o) that take images and text as input.
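Text-to-image models are exposed through similar APIs. A sketch with the same SDK, generating the unicorn example (DALL·E 3 returns a hosted URL by default):

```python
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="dall-e-3",
    prompt="a unicorn in a party hat riding a hoverboard",
    size="1024x1024",
    n=1,  # DALL·E 3 generates one image per request
)
print(result.data[0].url)  # link to the generated image
```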
The applications of LLMs and generative models are broad. According to IBM, transformer-based models like GPT are used everywhere from chatbots to coding assistants. For instance, GPT-powered systems provide more human-like responses in chatbots and voice assistants. Generating marketing copy, summarising legal documents, and translating languages are all in their wheelhouse. They can even write code: AWS's overview notes that models like OpenAI's Codex (used in GitHub Copilot) or Amazon's CodeWhisperer can generate useful code snippets from plain English requests. Essentially, instead of hand-coding every rule, we build one big model and prompt it for different tasks.
Agentic AI: Autonomous agents and planning
Beyond generating content, a new agentic AI paradigm is emerging: AI systems that act as autonomous agents capable of planning and executing multi-step tasks. In this context, an agentic AI system can make its own decisions and adapt its actions to achieve goals. Nvidia explains that agentic AI “uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems.” In other words, an agentic AI can break down high-level objectives into sub-tasks, coordinate specialised models, and carry out actions in the real world. TechTarget similarly defines these agents as systems capable of “independent decision-making and autonomous behaviour,” contrasting them with fixed-rule automation.
This concept has been enabled by LLMs: experts note that recent advances in large language and generative models have accelerated some of the characteristics agentic AI needs. For example, large models provide the reasoning and knowledge needed for an AI agent to formulate plans and understand context. Agentic AI is seen as the next leap in human–AI collaboration. As one Harvard Business Review summary put it, the agentic era could feature AIs that “plan your next trip overseas and make all the travel arrangements” or act as virtual caregivers or supply-chain managers that “optimise inventories on the fly.” Microsoft CEO Satya Nadella has made a bold prediction that once agentic AI systems mature, they could replace SaaS entirely!
Nvidia outlines a typical four-step loop for agentic problem-solving (a minimal code sketch follows the list):
- Perceive: The AI gathers and processes data from various sources (sensors, databases, the web, etc.), extracting relevant information from the environment.
- Reason: A large language model (the AI's reasoning engine) interprets the task and devises a plan, coordinating any specialised sub-models as needed. It might, for instance, use retrieval-augmented generation (RAG) to fetch information from external knowledge bases to inform its plan.
- Act: The agent executes the plan by invoking external tools, software or actions via APIs. For example, an AI customer-service agent might place an order in a billing system or navigate a user interface on the user's behalf. Systems can include safety guards (e.g. approval thresholds) to ensure actions remain appropriate.
- Learn: The agent continuously improves by feeding back the results of its actions. Data from each interaction is used to update the model (a “data flywheel”), so the AI becomes more accurate and efficient over time.
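Stripped to its skeleton, that perceive-reason-act-learn loop is straightforward to express in code. The sketch below is purely illustrative: `fake_llm` and the `lookup` tool are invented stand-ins for a real LLM call and real integrations:

```python
def run_agent(goal, llm, tools, max_steps=5):
    """A minimal perceive -> reason -> act -> learn loop (illustrative only)."""
    memory = []                                          # learn: feedback from past steps
    for _ in range(max_steps):
        observation = {"goal": goal, "history": memory}  # perceive: gather context
        plan = llm(observation)                          # reason: decide the next action
        if plan["action"] == "finish":
            return plan["answer"]
        result = tools[plan["action"]](plan["input"])    # act: call the chosen tool
        memory.append((plan["action"], result))          # learn: feed the result back in
    return "gave up after max_steps"

# Invented stand-ins so the sketch actually runs; a real agent would call an
# LLM API and genuine tools (search, databases, RAG retrieval and so on).
def fake_llm(observation):
    if not observation["history"]:
        return {"action": "lookup", "input": "order 42"}
    return {"action": "finish", "answer": f"done: {observation['history'][-1][1]}"}

tools = {"lookup": lambda q: f"record for {q}"}
print(run_agent("check order status", fake_llm, tools))
```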
These autonomous agents are already finding real-world use cases. In supply-chain management, for example, agents can orchestrate entire workflows: one commentator describes how, if a drought disrupts supplies, an AI agent could automatically find alternative sources, reroute shipments, and adjust orders without human intervention. In customer service, agentic bots are deployed in call centres: they can simultaneously analyse customer sentiment, retrieve order histories and policy documents, and respond to customers, effectively automating routine inquiries. If the AI cannot resolve an issue, it can triage and pass the case to a human with full details. In every case, agentic AIs are envisioned as assistants that handle the mechanical legwork so that people can focus on higher-level tasks.
Model Context Protocol (MCP): Connecting AI to Data
A key challenge for agentic systems and LLMs alike is accessing up-to-date data. The Model Context Protocol (MCP) is a recent open standard designed to solve this. In late 2024, Anthropic announced that it would open-source MCP as “a new standard for connecting AI assistants to the systems where data lives.” MCP provides a universal, open way for AI applications to query external data and tools, replacing the need for many custom integrations.
Technically, MCP works by defining two roles: MCP servers (which expose a data source) and MCP clients (AI applications that connect to those servers). For example, a company could run a local MCP server that provides access to its email, documents or CRM database. Then any MCP-compatible AI (the client) can securely retrieve that context as needed. Anthropic provides pre-built MCP server connectors for common platforms like Google Drive, Slack, GitHub and Postgres, enabling AIs to pull relevant files or messages in real time. The standardisation means developers no longer write separate adapters for each system — an AI agent can simply query the MCP server using a common API.
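Under the hood, MCP messages are JSON-RPC 2.0. To give a flavour of that common API, here is roughly what a client's requests to a server look like, shown as Python dicts; the tool name and its arguments are hypothetical, and the real spec includes more fields:

```python
import json

# A client first asks an MCP server which tools it exposes...
list_tools = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# ...then invokes one of them. "search_documents" and its arguments are
# invented here; a real server defines its own tool names and schemas.
call_tool = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_documents",
        "arguments": {"query": "Q3 sales report"},
    },
}
print(json.dumps(call_tool, indent=2))
```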
Early adopters see MCP as key infrastructure for smarter AI agents. Companies like Block and Apollo have already integrated MCP into their workflows. As Block's CTO remarked, open protocols like MCP are “the bridges that connect AI to real-world applications,” enabling agentic systems that handle routine work so people can focus on creative tasks. In short, MCP is laying the groundwork for AI tools to pull in corporate data (and keep context) across tasks, making agents more reliable and context-aware.
AI in the real world: Tools and examples
By now, you've seen that AI is behind a lot of modern tech. Here are some familiar categories and tools:
- Chatbots & assistants: Apps like OpenAI's ChatGPT, Google's Gemini and Microsoft's Copilot Chat let you ask questions or give prompts in natural language and get human-like replies. Consumer voice assistants (Siri, Alexa, Google Assistant) also fall here: they use NLP models to understand voice commands and act on them. IBM notes that services like Siri and Alexa are classic examples of narrow AI. Essentially, these interfaces sit on top of large language models that handle the conversation.
- Coding helpers: A big development is AI for software development. Microsoft's GitHub Copilot autocompletes code or writes whole functions based on comments you write; it was originally powered by OpenAI's Codex model. Amazon CodeWhisperer (now Amazon Q Developer) does something similar for AWS users. There are also AI code reviewers and documentation generators. AWS explains that models like Codex can generate code in Python, JavaScript, Ruby and more just from English descriptions. As a result, engineers can get boilerplate code or examples without writing every line manually.
- Image and video generation: Tools like DALL·E 2, Midjourney and Stable Diffusion let you create images by typing text. Need concept art of “a futuristic city at sunset in watercolour style”? The AI paints it. Many creators and designers use these to prototype ideas or produce graphics. Adobe Firefly brings generative fill, AI-based corrections and background removal to the Adobe ecosystem, and the popular design tool Canva has added similar Magic AI features to its suite. Video generation tools are emerging too: OpenAI's Sora, for example, turns text prompts into short clips.
- Voice and audio: Beyond assistants, some tools generate or edit audio. For instance, ElevenLabs creates realistic speech from text, and models like Meta's MusicGen can generate music from a prompt. Even routine audio tasks (like removing background noise or transcribing speech) now use AI under the hood.
- Data and analytics: Many backend services use AI. Netflix's recommendations, Spotify's song picks, or YouTube's suggested videos all use ML to predict what you'll like. In business, AI tools can analyse large datasets: spotting fraud, forecasting sales, detecting anomalies, or even generating plots from data. IBM notes that LLMs can even analyse data tables and describe trends via natural language APIs.
- Health & science: AI is also making waves in medicine and research. Deep learning can analyse medical images (X-rays, MRIs) to detect issues, and models like Google DeepMind's AlphaFold predict 3D protein structures, accelerating drug discovery. These applications often combine AI with domain knowledge to tackle complex scientific problems.
- Robotics & autonomy: Self-driving cars (Tesla, Waymo) use deep learning for vision (detecting lanes, pedestrians) and reinforcement learning for driving decisions. Factories use AI-driven robots to sort items or manage inventory. Even household robots (like Roomba) use simple AI to navigate. These systems combine sensors (cameras, lidar) with AI models to perceive their environment and decide actions.
- Agentic AI: New task-oriented agents are also appearing. For instance, some AI assistants can autonomously perform complex workflows. Glide, a popular no-code app builder, has introduced AI agents into its suite of solutions, and the results already look impressive.
This isn't an exhaustive list, but it shows how AI is integrated into modern tools. Today, “AI as a feature” is common. In McKinsey's latest State of AI survey, 78 per cent of respondents said their organisations use AI in at least one business function. As engineers, even if we don't build LLMs from scratch, we'll often use APIs and services that are built on them.

Using the right tool for the job. Image source: Unsplash
Recap and what's next
Let's sum up what we've covered about the AI landscape:
- AI is broad: It means any computer system doing tasks that usually need intelligence. Today's AI is mostly narrow, i.e. doing specific jobs. Human-level intelligence (AGI or artificial general intelligence) does not yet exist.
- Machine learning is the engine: ML (especially deep learning) is how modern AI systems learn from data. Supervised learning uses labelled examples, unsupervised learning finds hidden patterns, and reinforcement learning learns by feedback.
- Neural networks: Deep networks with many layers learn features automatically from raw data. They power vision, speech recognition, translation, and more by processing images, audio and text.
- Large language models & generative AI: These models (like GPT, BERT, DALL·E) are trained on huge datasets and can generate text, images, or other media. LLMs are highly adaptable — one model can do many tasks.
- Real-world tools: AI is everywhere now — chatbots, creative tools, coding assistants, recommendation systems, medical diagnostics, autonomous cars, agentic AI, and beyond. These often use ML/LLMs under the hood, even if we don't see it. Learning the basics (neural nets, learning methods, key terms) helps us understand and use these tools.
I hope you've soaked in as much as you can about AI from this article. I'm learning quite a bit along with you, and I hope to continue learning and writing about AI. AI might be buzzword central, but at its heart, it's built on solid concepts of data and algorithms — things we engineers can grasp.
That's it! Thanks for reading.