What is a Token in AI?


Definition

A token is the basic unit of text that AI language models process. Tokens can be complete words, parts of words (subwords), or individual characters. For example, "chatgpt" might be split into ["chat", "gpt"], and "tokenization" into ["token", "ization"]. Understanding tokens matters because API pricing, context-window sizes, and response-length limits are usually measured in tokens.
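To make subword splitting concrete, here is a toy tokenizer that uses greedy longest-match-first lookup against a fixed vocabulary, similar in spirit to the matching step of WordPiece. This is an illustration, not how any particular model tokenizes: GPT models use byte-pair encoding (BPE) with vocabularies learned from data. The vocabulary `VOCAB` and the functions `tokenize_word` and `tokenize` are hypothetical names chosen to reproduce the example splits in this article.

```python
# Hypothetical toy vocabulary, chosen only to reproduce the examples in this
# article. Real tokenizers learn vocabularies of tens of thousands of entries.
VOCAB = {"chat", "gpt", "token", "ization", "re", "cursive", "ly", "hello"}


def tokenize_word(word: str, vocab: set[str]) -> list[str]:
    """Split one word into subwords by greedy longest-match-first lookup.

    Any character not covered by a vocabulary entry falls back to a
    single-character token, so every input can always be tokenized.
    """
    tokens: list[str] = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry matches here: emit a single character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text on whitespace, then tokenize each word separately."""
    return [tok for word in text.split() for tok in tokenize_word(word, vocab)]


print(tokenize_word("chatgpt", VOCAB))       # ['chat', 'gpt']
print(tokenize_word("tokenization", VOCAB))  # ['token', 'ization']
print(tokenize_word("recursively", VOCAB))   # ['re', 'cursive', 'ly']
```

The single-character fallback mirrors why real tokenizers can encode any input: rare or unknown words simply cost more tokens.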

Token Examples

  • "hello" = 1 token
  • "hello world" = 2 tokens
  • "AI" = 1 token (or 2 depending on tokenizer)
  • "recursively" might be split into ["re", "cursive", "ly"] = 3 tokens
  • A short paragraph (roughly 15-22 English words) ≈ 20-30 tokens
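Exact counts depend on the tokenizer, but for budgeting English text a common rule of thumb is about 4 characters, or about 0.75 words, per token. The sketch below assumes those two ratios and averages them; it is an approximation, not a property of any specific model, and it is least reliable for single words, code, or non-English text. For exact counts you would run the model's own tokenizer (for GPT models, OpenAI's tiktoken library). The name `estimate_tokens` is hypothetical.

```python
# Rule-of-thumb ratios for English text (assumptions, not exact values).
CHARS_PER_TOKEN = 4.0
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate: average of character- and word-based heuristics."""
    if not text.strip():
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    # Any non-empty text is at least one token.
    return max(1, round((by_chars + by_words) / 2))


sample = "A token is the basic unit of text that AI language models process."
print(estimate_tokens(sample))
```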

Why Tokens Matter

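  • Context Windows: Each model has a maximum number of tokens it can handle per request (GPT-4o: 128k tokens)
  • Pricing: API costs are typically quoted per 1M tokens, often at different rates for input and output tokens
  • Latency: Longer prompts and longer responses take more time; output tokens are generated one at a time
  • Output Limits: The response counts toward the same token budget, so a long prompt leaves less room for the answer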
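The budgeting points above reduce to simple arithmetic. The sketch below assumes that the prompt and the response share one context window and that input and output tokens are billed at separate per-1M rates. The 128k window is the GPT-4o figure mentioned above; the prices are hypothetical placeholders, not real rates, and `ModelLimits`, `fits_in_context`, and `request_cost` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class ModelLimits:
    context_window: int        # max tokens for prompt + response combined
    input_price_per_m: float   # price per 1M input tokens (hypothetical)
    output_price_per_m: float  # price per 1M output tokens (hypothetical)


def fits_in_context(prompt_tokens: int, max_output_tokens: int, limits: ModelLimits) -> bool:
    """The prompt and the reserved response length must share one window."""
    return prompt_tokens + max_output_tokens <= limits.context_window


def request_cost(prompt_tokens: int, output_tokens: int, limits: ModelLimits) -> float:
    """Cost = input tokens x input rate + output tokens x output rate, per 1M."""
    return (prompt_tokens * limits.input_price_per_m
            + output_tokens * limits.output_price_per_m) / 1_000_000


# 128k window as stated for GPT-4o; the prices are placeholders only.
model = ModelLimits(context_window=128_000, input_price_per_m=2.0, output_price_per_m=8.0)

print(fits_in_context(120_000, 4_000, model))      # True
print(fits_in_context(126_000, 4_000, model))      # False
print(f"${request_cost(10_000, 1_000, model):.4f}")  # $0.0280
```

Reserving the maximum response length up front, rather than checking only the prompt, avoids requests that fit on input but get cut off mid-answer.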