[WIP] Tracking token usage from LLMs

Overview

Since our agents, workflows, and pipelines all make LLM calls, we need to track the number of tokens each call consumes so that usage can be attributed and billed.

Algorithm

Algorithm: Retrieving Token Usage for Cost Accounting from the OpenAI Chat Completions API:

  Input:
    API key, `K`
    Chat model identifier, `model`
    User message sequence, `P`

  Output:
    Prompt tokens, `T_prompt`
    Completion tokens, `T_completion`
    Total tokens, `T_total`

1. Initialize: Obtain API key `K` from secure storage.
2. Set headers:
   - `Authorization ← "Bearer " + K`
   - `Content-Type ← "application/json"`
3. Construct request payload:
   - Set `payload.model ← model`
   - Set `payload.messages ← P`
4. Send request:
   - Perform HTTP `POST` request to the Chat Completions endpoint.
5. Receive response:
   - Store HTTP response as `resp`.
6. Validate response:
   - If `resp.status_code ≠ 200`, terminate with error.
7. Parse response body:
   - Decode JSON content from `resp`.
8. Extract token usage:
   - Set `T_prompt ← resp.usage.prompt_tokens`
   - Set `T_completion ← resp.usage.completion_tokens`
   - Set `T_total ← resp.usage.total_tokens`
9. Return:
   - Output `(T_prompt, T_completion, T_total)`.
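The steps above can be sketched in Python using only the standard library. The endpoint URL and the `usage` response fields follow the public Chat Completions API; error handling is simplified, and the `chat_with_usage` helper name is our own.

```python
import json
import os
import urllib.request

OPENAI_CHAT_URL = "https://api.openai.com/v1/chat/completions"


def extract_usage(resp_json):
    """Step 8: pull (T_prompt, T_completion, T_total) from a response body."""
    usage = resp_json["usage"]
    return (
        usage["prompt_tokens"],
        usage["completion_tokens"],
        usage["total_tokens"],
    )


def chat_with_usage(model, messages, api_key):
    """Steps 1-9: POST the request, then return the reply and token counts."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    req = urllib.request.Request(
        OPENAI_CHAT_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",       # step 2
            "Content-Type": "application/json",
        },
    )
    # urlopen raises HTTPError on a non-2xx status, covering step 6.
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))  # step 7
    reply = body["choices"][0]["message"]["content"]
    return reply, extract_usage(body)                   # step 9
```

In production, step 1 (obtaining `K`) would read from a secrets manager rather than, say, `os.environ`; the split between `chat_with_usage` and `extract_usage` lets the extraction logic be unit-tested without network access.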

Token Usage to Cost Mapping:

Let:

  • c_prompt(model) be the cost per prompt token

  • c_completion(model) be the cost per completion token

Then total request cost is:

Cost = T_prompt × c_prompt(model)
     + T_completion × c_completion(model)
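This mapping is a two-term dot product, so a small lookup table suffices. A minimal sketch, assuming a hypothetical per-token pricing table (real rates are model-specific and published on the provider's pricing page, usually quoted per million tokens):

```python
# Hypothetical per-token rates in USD; substitute the current published
# pricing for each model you account for.
PRICING = {
    "gpt-4o": {"prompt": 2.50e-6, "completion": 10.00e-6},
}


def request_cost(model, t_prompt, t_completion, pricing=PRICING):
    """Cost = T_prompt * c_prompt(model) + T_completion * c_completion(model)."""
    rates = pricing[model]
    return t_prompt * rates["prompt"] + t_completion * rates["completion"]
```

Keeping the table keyed by model identifier matches the `c_prompt(model)` / `c_completion(model)` notation above and makes pricing updates a data change rather than a code change.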

Notes

  • Token usage is reported by the API, not estimated client-side

  • Usage values appear only after a successful completion

  • Streaming responses may not report usage by default; with the OpenAI API it must be requested (e.g. via `stream_options`) and is delivered in the final chunk

  • Cost calculation depends on model-specific pricing
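The streaming note deserves a concrete shape. Assuming the OpenAI convention where requesting usage on a stream (via `stream_options` with `include_usage` enabled) makes the server populate the `usage` field on one final chunk while leaving it null elsewhere, the accounting side reduces to scanning parsed chunks:

```python
def usage_from_stream(chunks):
    """Return the usage dict from a sequence of parsed streaming chunks.

    Assumes the OpenAI streaming convention: when usage reporting is
    requested, exactly one trailing chunk carries a non-null "usage"
    object; all other chunks have usage set to None.
    """
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]  # keep the last non-null usage seen
    return usage
```

This keeps the cost-accounting code independent of how the chunks were transported (SSE parsing, SDK iterator, replayed log), which also makes it straightforward to test offline.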