What is a Token in AI?
Definition
A token is the basic unit of text that AI language models process. Tokens can be complete words, parts of words (subwords), or individual characters. For example, "chatgpt" might become ["chat", "gpt"], and "tokenization" might become ["token", "ization"]. Understanding tokens is crucial because AI pricing, context limits, and capabilities are often measured in tokens.
Token Examples
- "hello" = 1 token
- "hello world" = 2 tokens
- "AI" = 1 token (or 2 depending on tokenizer)
- "recursively" might split into ["re", "cursive", "ly"] = 3 tokens
- A paragraph of this text ≈ 20-30 tokens
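The splits above can be reproduced with a toy tokenizer. The following is a minimal sketch, not the algorithm any particular model uses: it applies greedy longest-match against a tiny hand-written vocabulary (`TOY_VOCAB`, invented here for illustration) and falls back to single characters when nothing matches. Production tokenizers learn vocabularies of tens of thousands of entries from data, so their actual splits and counts will differ.

```python
# Toy vocabulary, invented for illustration only. Real tokenizers learn
# their vocabularies from large text corpora.
TOY_VOCAB = {"hello", "world", "re", "cursive", "ly",
             "token", "ization", "chat", "gpt"}


def tokenize_word(word, vocab=TOY_VOCAB):
    """Split one word using greedy longest-match against the vocabulary.

    If no vocabulary entry matches at the current position, a single
    character is emitted so the loop always makes progress.
    """
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, then shrink one character at a time.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab or end == start + 1:
                tokens.append(piece)
                start = end
                break
    return tokens


def tokenize(text, vocab=TOY_VOCAB):
    """Lowercase the text, split on whitespace, and tokenize each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens


if __name__ == "__main__":
    for sample in ["hello world", "recursively", "chatgpt", "AI"]:
        pieces = tokenize(sample)
        print(f"{sample!r} -> {pieces} ({len(pieces)} tokens)")
```

With this toy vocabulary, "AI" falls back to two single-character tokens because it has no entry of its own. That is one way the same text can be 1 token under one tokenizer and 2 under another.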
Why Tokens Matter
- Context Windows: Models have max token limits (GPT-4o: 128k tokens)
- Pricing: API costs are typically per 1M tokens
- Latency: Longer prompts = longer processing time
- Output Limits: The response's tokens count toward the same token budget as the prompt
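The budget and pricing points above come down to simple arithmetic. Here is a rough sketch, assuming the prompt and the response share one context window and that input and output tokens are billed at separate per-1M rates. The function names and the prices in the usage example are placeholders, not real rates for any provider. The 128,000 default mirrors the GPT-4o figure quoted above.

```python
def fits_context(prompt_tokens, max_output_tokens, context_window=128_000):
    """Return True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


def estimate_cost(prompt_tokens, output_tokens,
                  input_price_per_m, output_price_per_m):
    """Estimate the cost of one request, given prices per 1M tokens.

    The prices are whatever the caller passes in; none are built in.
    """
    total = (prompt_tokens * input_price_per_m
             + output_tokens * output_price_per_m)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices per 1M tokens, chosen only to make the arithmetic visible.
    cost = estimate_cost(prompt_tokens=50_000, output_tokens=2_000,
                         input_price_per_m=2.0, output_price_per_m=8.0)
    print(f"fits: {fits_context(50_000, 2_000)}, estimated cost: {cost:.4f}")
```

Reserving the maximum response length up front, rather than checking only the prompt, avoids requests that fit on input but leave no room for the answer.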