Claude Code is currently one of the major coding AI harnesses, if not the major one. Many organizations choose to use Claude Code not through Anthropic plans but through hosted Anthropic models on third-party cloud providers. AWS Bedrock is an example of a service that provides a large catalog of various LLMs, including the latest Anthropic models such as Opus, Sonnet and Haiku.
Recent articles have found that Claude subscriptions are drastically more cost-efficient, so why would anyone prefer to pay more for API billing? There are a few reasons:
* Reliability. Over the previous 90 days, claude.ai was at 98.96% uptime and Claude Console at 99.45%. In our experience, Bedrock is far more reliable. Its cross-Region inference can automatically route requests to another AWS Region and use compute across Regions to handle unplanned traffic bursts. This can be crucial for latency-sensitive applications.
When a Bedrock caller (AWS SDK, AWS CLI, HTTP) wants to use an LLM, it specifies one of the following:
* a system-defined inference profile (e.g. arn:aws:bedrock:<source-region>:<account-id>:inference-profile/global.anthropic.claude-opus-4-8)
* a custom application inference profile (e.g. arn:aws:bedrock:<region>:<account-id>:application-inference-profile/<application-profile-id>)
Custom application inference profiles are used to "wrap" model use with a predefined set of tags. A typical CloudFormation resource looks like this (docs):
Resources:
  ServiceAOpusProfile:
    Type: AWS::Bedrock::ApplicationInferenceProfile
    Properties:
      Description: Service A uses Opus 4.8
      InferenceProfileName: internal-service-a-opus
      ModelSource:
        CopyFrom: arn:aws:bedrock:<source-region>:<account-id>:inference-profile/global.anthropic.claude-opus-4-8
      Tags:
        - Key: owner-team
          Value: team-x
        - Key: environment
          Value: production
        - Key: service_name
          Value: service-a
Here we specify the description, the model to use (Opus 4.8 in our example), and a list of tags. Once the profile is created, specify its ARN in Claude Code or in the Bedrock invocation.
To guarantee that callers can invoke the model only through the inference profile, scope the IAM permissions as follows:
# Allow calling our inference profile
- Effect: Allow
  Action:
    - bedrock:InvokeModel
    - bedrock:InvokeModelWithResponseStream
  Resource:
    - arn:aws:bedrock:<profile_region>:<account_id>:application-inference-profile/<profile_id>
# Allow calling the foundation models in all regions (note `*`)
# but only through our inference profile (note `Condition`)
- Effect: Allow
  Action:
    - bedrock:InvokeModel
    - bedrock:InvokeModelWithResponseStream
  Resource:
    - arn:aws:bedrock:*::foundation-model/<model_id>
  Condition:
    StringLike:
      bedrock:InferenceProfileArn: arn:aws:bedrock:<profile_region>:<account_id>:application-inference-profile/<profile_id>
# Allow fetching the inference profile and model metadata descriptions.
# Narrow down the 'Resource' if you need to.
- Effect: Allow
  Action:
    - bedrock:ListFoundationModels
    - bedrock:GetFoundationModel
    - bedrock:ListInferenceProfiles
    - bedrock:GetInferenceProfile
  Resource:
    - '*'
This ensures that the caller must use an inference profile to call the model, and hence the tags will propagate properly.
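If you manage many profiles, the same three statements can be generated rather than copied by hand. The following is a minimal sketch, not a definitive implementation: the function name build_invoke_policy is hypothetical, and it simply mirrors the statements above as a JSON-style IAM policy document.

```python
import json


def build_invoke_policy(profile_arn: str, model_id: str) -> dict:
    """Build an IAM policy document that allows invoking a model only
    through the given application inference profile.

    The function name and the placeholder arguments are illustrative
    assumptions; the statements mirror the YAML shown in the article.
    """
    invoke_actions = [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow calling the application inference profile itself.
                "Effect": "Allow",
                "Action": invoke_actions,
                "Resource": [profile_arn],
            },
            {
                # Allow the foundation model in all regions,
                # but only when the call goes through the profile.
                "Effect": "Allow",
                "Action": invoke_actions,
                "Resource": [f"arn:aws:bedrock:*::foundation-model/{model_id}"],
                "Condition": {
                    "StringLike": {"bedrock:InferenceProfileArn": profile_arn}
                },
            },
            {
                # Allow reading model and profile metadata.
                "Effect": "Allow",
                "Action": [
                    "bedrock:ListFoundationModels",
                    "bedrock:GetFoundationModel",
                    "bedrock:ListInferenceProfiles",
                    "bedrock:GetInferenceProfile",
                ],
                "Resource": ["*"],
            },
        ],
    }


def policy_as_json(profile_arn: str, model_id: str) -> str:
    """Serialize the policy so it can be pasted into IAM or a template."""
    return json.dumps(build_invoke_policy(profile_arn, model_id), indent=2)
```

Calling policy_as_json with your profile ARN and model ID returns a JSON string with one policy per profile, so each service or team gets the same restriction without manual editing.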
Tip: you can use the following AWS CLI command to debug Bedrock permissions on a developer machine (it says hello to Opus 4.7):
aws bedrock-runtime converse \
  --region us-east-1 \
  --model-id global.anthropic.claude-opus-4-7 \
  --messages '[{"role":"user","content":[{"text":"Say hello in one short sentence."}]}]' \
  --query 'output.message.content[0].text' \
  --output text
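If you prefer to run the same check from code, here is a rough Python equivalent. It is a sketch that assumes boto3 is installed and AWS credentials are configured; the helper names build_converse_request and say_hello are hypothetical. The Converse API accepts an inference profile ID or ARN in modelId, so you can pass your application inference profile ARN to verify that the permissions work end to end.

```python
def build_converse_request(model_id: str, prompt: str) -> dict:
    """Build the keyword arguments for the Bedrock Runtime Converse API."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }


def say_hello(model_id: str, region: str = "us-east-1") -> str:
    """Send one short prompt and return the text of the model's reply.

    Raises a botocore ClientError (e.g. AccessDeniedException) if the
    caller's permissions do not allow the invocation.
    """
    # Imported lazily so the request builder works without boto3 installed.
    import boto3

    client = boto3.client("bedrock-runtime", region_name=region)
    request = build_converse_request(model_id, "Say hello in one short sentence.")
    response = client.converse(**request)
    return response["output"]["message"]["content"][0]["text"]


# Example (requires AWS credentials):
# print(say_hello("global.anthropic.claude-opus-4-7"))
```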
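If your COO sees these Bedrock usage reports, they will have questions. So use application inference profiles for all workloads in your organization to make spending transparent.
Before: [figure: Bedrock usage report without application inference profiles]
After: [figure: Bedrock usage report with application inference profiles]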
Claude Code integration with AWS Bedrock supports custom application inference profiles:
export CLAUDE_CODE_USE_BEDROCK=1
export ANTHROPIC_DEFAULT_OPUS_MODEL="arn:aws:bedrock:<region>:<account_id>:application-inference-profile/..."
export ANTHROPIC_DEFAULT_SONNET_MODEL="arn:aws:bedrock:<region>:<account_id>:application-inference-profile/..."
export ANTHROPIC_DEFAULT_HAIKU_MODEL="arn:aws:bedrock:<region>:<account_id>:application-inference-profile/..."
However, managing them is very cumbersome in the age of rapidly evolving models.
How do you track spending then? You have the following options:
* Enable Bedrock invocation logging and track usage in real time
  * Pros: near-real-time log delivery with accurate token counts and user identity
  * Cons: This is highly sensitive data! You need to restrict access and implement the processing yourself, e.g. using AWS Lambda and a database.
For the purpose of usage tracking, I stripped out much of the information stored in the logs. A trimmed record looks like this:
{
  "timestamp": "2026-06-16T12:00:56Z",
  "accountId": "...",
  "region": "us-east-1",
  "requestId": "...",
  "operation": "InvokeModelWithResponseStream",
  "modelId": "...",
  "input": {
    ...
    "inputTokenCount": 528,
    "cacheReadInputTokenCount": 1058,
    "cacheWriteInputTokenCount": 0
  },
  "output": {
    ...
    "outputTokenCount": 259
  },
  "identity": {
    "arn": "arn:aws:sts::0123456789:assumed-role/<role-name>/john.doe@yourorg.io"
  },
  "inferenceRegion": "eu-north-1",
  "schemaType": "ModelInvocationLog"
}
Note that the record contains cacheRead, cacheWrite, input and output token counts, and that identity.arn identifies the user who made the request. To compute the final cost, multiply each token count by the corresponding price and sum the results. This approach works with any AI harness, not only Claude Code, because they all call AWS Bedrock in the end.
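As an illustration, the per-user cost calculation could look like the sketch below. It assumes records shaped like the trimmed example above. The prices in PRICES_PER_MILLION are made-up placeholders, not real Bedrock rates, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical prices in USD per million tokens.
# Replace them with the actual rates for the model you use.
PRICES_PER_MILLION = {
    "input": 5.00,
    "cache_read": 0.50,
    "cache_write": 6.25,
    "output": 25.00,
}


def user_from_arn(identity_arn: str) -> str:
    """Return the last path segment of the ARN, which is the session
    name (here: the user's email) for an assumed-role identity."""
    return identity_arn.rsplit("/", 1)[-1]


def record_cost(record: dict, prices: dict = PRICES_PER_MILLION) -> float:
    """Compute the cost in USD of a single invocation log record."""
    inp = record.get("input", {})
    out = record.get("output", {})
    tokens = {
        "input": inp.get("inputTokenCount", 0),
        "cache_read": inp.get("cacheReadInputTokenCount", 0),
        "cache_write": inp.get("cacheWriteInputTokenCount", 0),
        "output": out.get("outputTokenCount", 0),
    }
    # Multiply each token count by its price, then scale to per-token cost.
    return sum(tokens[kind] * prices[kind] for kind in tokens) / 1_000_000


def cost_per_user(records, prices: dict = PRICES_PER_MILLION) -> dict:
    """Aggregate the cost of many records by the calling user."""
    totals = defaultdict(float)
    for record in records:
        user = user_from_arn(record["identity"]["arn"])
        totals[user] += record_cost(record, prices)
    return dict(totals)
```

In a real pipeline, a function like cost_per_user would run inside whatever processes the log stream (for example, the Lambda mentioned above), and the totals would be written to a database rather than kept in memory.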
In addition, the logs store the whole payload of each request, which means that you can track much more than token counts.
Using AWS Bedrock as a provider of Anthropic models for your organization doesn't come cheap, but it provides you with more control over spending, permissions, security and analytical capabilities.