Simulating AWS IAM with Prolog

Prototyping a better developer tool for cloud security

September 23, 2024

Of all the AWS services, IAM seems to be the least well understood. Something about its data types, logical rules and runtime conditions makes for an arcane, abstract service that most developers treat like a black box.

And that’s unfortunate because IAM secures organizations’ cloud resources, and can be the difference between having a data moat and a data breach.

Now whilst there are a bunch of tools¹ to help manage IAM permissions, they all suffer from one drawback or another. So I wanted to prototype something better.

Sketching a solution

What I’d like is a local program that can ingest IAM policies and then simulate whether certain requests would succeed or fail. The program should be able to enumerate all the actions an identity may perform on a given resource. I could then write unit tests which check my app has the permissions it needs and that it does not grant permissions in violation of organization-wide rules. This would work with CI/CD to prevent new security holes, and identify missing permissions before the app is deployed.

The program should also be able to generate IAM policies to fix permission issues. For instance, if I want to grant my app access to a DynamoDB table, or restrict write access to an S3 object to only my app.

This requires three capabilities:

1. Define all relevant IAM data (users, roles, groups, policies, attachments). If we make the simplifying assumption that all IAM data is managed by an IaC tool like Terraform, then this is already solved.
2. Assemble the relevant IAM policies for a principal. Let’s just assume we can do this for now.
3. Compile the policies from step 2 into a data structure with some useful functions like has_perms, list_perms, explain_perms and fix_perms. This we’ll build in Prolog.

Prolog for simulation

Prolog might be the ideal simulator language. Prolog has “facts”, which are named n-ary tuples that are saved in an indexed database table-like structure. To model simple IAM states, we need a couple of different facts:

policy(actionString, resourceString). This is the set of relevant allow policies for the principal e.g. (s3:GetObject, s3://example.com/public/foo.csv).
action(actionString). This is the set of all possible IAM policy actions, e.g. s3:GetObject, s3:PutObject …

Instead of functions, Prolog “rules” are logical expressions that describe relations between facts. Here’s a simple rule can/2, to check whether a principal can perform an action on a resource:

can(Action, Resource) :-    % can/2 is true IF
  action(Action),           % the action exists AND
  policy(Action, Resource). % the policy exists

Although a rule looks like a function, it behaves more like a query, with the variables Action and Resource acting as placeholders. To see it in action, I’m going to create a few facts:

action('s3:GetObject').
action('s3:PutObject').
policy('s3:GetObject', 's3://example.com/foo.csv').

And now the rule can be queried:

?- can('s3:GetObject', 's3://example.com/foo.csv').
true.
?- can('s3:PutObject', 's3://example.com/foo.csv').
false.

So far so good. But I don’t have to supply Action and Resource; Prolog can find them for me:

?- can(A,R).
A = 's3:GetObject',
R = 's3://example.com/foo.csv'.

Prolog uses a technique called unification to find a solution to the query.
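As a small illustration of unification on its own, here is a sketch that uses only the standard =/2 operator. The predicate name unify_demo/2 is hypothetical and exists only for this example; note that policy(...) appears here as a plain term, not as a call to the fact database.

```prolog
% =/2 succeeds when both sides can be made identical by binding variables.
% unify_demo/2 is a hypothetical name used only for illustration.
unify_demo(A, R) :-
    policy(A, R) = policy('s3:GetObject', 's3://example.com/foo.csv').

% ?- unify_demo(A, R).
% A = 's3:GetObject',
% R = 's3://example.com/foo.csv'.
%
% Terms whose constants differ cannot be unified:
% ?- unify_demo('s3:PutObject', _).
% false.
```

When a query is matched against facts and rule heads, this same binding process is what fills in A and R.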

Let’s add another policy to make this more interesting:

policy('s3:PutObject', 's3://example.com/foo.csv').

Now we can query to find all actions we can take on a particular resource:

?- can(A, 's3://example.com/foo.csv').
A = 's3:GetObject' ;
A = 's3:PutObject'.

Each successful match behaves like an iteration of a loop, with A set to the value of the matching fact. Prolog uses a backtracking algorithm to find every solution to our query².
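Rather than stepping through solutions one at a time, they can be gathered into a list with the standard findall/3 predicate. The following is a minimal sketch of what a list_perms helper might look like; the facts are copied from the examples above, and the implementation is an assumption for illustration rather than the project's actual code.

```prolog
% Facts copied from the examples above.
action('s3:GetObject').
action('s3:PutObject').
policy('s3:GetObject', 's3://example.com/foo.csv').
policy('s3:PutObject', 's3://example.com/foo.csv').

can(Action, Resource) :-
    action(Action),
    policy(Action, Resource).

% list_perms(+Resource, -Actions): collect every action permitted on Resource.
% Illustrative sketch only.
list_perms(Resource, Actions) :-
    findall(Action, can(Action, Resource), Actions).

% ?- list_perms('s3://example.com/foo.csv', Actions).
% Actions = ['s3:GetObject', 's3:PutObject'].
%
% ?- list_perms('s3://example.com/bar.csv', Actions).
% Actions = [].
```

findall/3 drives the backtracking itself and returns an empty list when there are no solutions, which is convenient for assertions in tests.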

Now consider, does our principal have permission to delete the resource?

?- can('s3:DeleteObject', 's3://example.com/foo.csv').
false.

Prolog answers “false” because there is no fact action('s3:DeleteObject'). But if there is no fact, shouldn’t the answer be “unknown”?

Prolog operates under the “Closed World” assumption: its database is taken to contain every relevant fact. So when a query cannot be proven, Prolog treats it as false rather than unknown. This inference rule is known as “negation as failure”.
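Negation as failure is also available explicitly through the standard \+ operator. The sketch below lists the known actions that no allow policy covers. The predicate name not_allowed/2 and the extra action('s3:DeleteObject') fact are assumptions added for this example.

```prolog
% Known actions (the DeleteObject fact is added here for illustration).
action('s3:GetObject').
action('s3:PutObject').
action('s3:DeleteObject').
policy('s3:GetObject', 's3://example.com/foo.csv').

% not_allowed(?Action, +Resource): Action is known, but no policy allows it
% on Resource. action/1 is called first so that Action is bound before \+
% runs; \+ only tests provability and never binds variables.
not_allowed(Action, Resource) :-
    action(Action),
    \+ policy(Action, Resource).

% ?- not_allowed(A, 's3://example.com/foo.csv').
% A = 's3:PutObject' ;
% A = 's3:DeleteObject'.
```

The answer is only meaningful if action/1 really does list every action, which is the Closed World assumption at work.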

IAM behaves in a similar way - by default, permission requests fail. And apart from resource policies, IAM doesn’t check that a particular resource exists, it simply permits the request to proceed assuming the context allows it.

So in order to answer queries about S3 permissions, our IAM simulator should have every possible S3 action in its database. This is also why step 2 above is required to assemble every relevant policy for the principal. Assuming we have every relevant policy and action, we can simulate IAM permissions (just like IAM, we don’t need every resource).
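The unit tests imagined in the solution sketch could be written with SWI-Prolog's plunit library. This is a minimal sketch under the simple can/2 model from this section; the test unit name, the test names and the extra 's3:DeleteObject' action fact are hypothetical.

```prolog
:- use_module(library(plunit)).

% Facts and rule from the examples above, plus one extra known action.
action('s3:GetObject').
action('s3:PutObject').
action('s3:DeleteObject').
policy('s3:GetObject', 's3://example.com/foo.csv').

can(Action, Resource) :-
    action(Action),
    policy(Action, Resource).

:- begin_tests(app_permissions).

% The app needs read access to its data file.
test(app_can_read, [nondet]) :-
    can('s3:GetObject', 's3://example.com/foo.csv').

% Organization-wide rule: the app must not be able to delete the file.
% The fail option means the test passes only if the body fails.
test(app_cannot_delete, [fail]) :-
    can('s3:DeleteObject', 's3://example.com/foo.csv').

:- end_tests(app_permissions).
```

Calling run_tests. at the prompt runs both tests. In a CI job this might be invoked with something like swipl -g run_tests -t halt perms_test.pl, where the file name is made up for the example.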

The IAM Simulator Project

IAM’s policy evaluation logic is more complicated than our simple simulator can handle. Policies can allow or deny permissions, resource names can contain wildcards, there are boundary policies which act like inclusion lists, and so on.
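To give a flavor of the extra logic involved, here is a rough sketch of two of those features: an explicit deny overriding an allow, and a trailing * wildcard in resource names. The three-argument policy/3 representation and the predicate names are assumptions made for this example and are not the project's actual code. Real IAM wildcards can appear anywhere in a name, which this sketch does not handle.

```prolog
% policy(Effect, Action, ResourcePattern): hypothetical representation.
policy(allow, 's3:GetObject', 's3://example.com/public/*').
policy(deny,  's3:GetObject', 's3://example.com/public/secret.csv').

% matches(+Pattern, +Resource): prefix match when Pattern ends in '*',
% exact match otherwise.
matches(Pattern, Resource) :-
    atom_concat(Prefix, '*', Pattern), !,
    atom_concat(Prefix, _, Resource).
matches(Resource, Resource).

% explicitly_denied(+Action, +Resource): some deny policy covers the request.
explicitly_denied(Action, Resource) :-
    policy(deny, Action, Pattern),
    matches(Pattern, Resource).

% allowed(+Action, +Resource): an allow policy matches and no deny does.
allowed(Action, Resource) :-
    policy(allow, Action, Pattern),
    matches(Pattern, Resource),
    \+ explicitly_denied(Action, Resource).

% ?- allowed('s3:GetObject', 's3://example.com/public/foo.csv').
% true.
%
% ?- allowed('s3:GetObject', 's3://example.com/public/secret.csv').
% false.
%
% ?- allowed('s3:GetObject', 's3://example.com/private/foo.csv').
% false.
```

Because the patterns live in the policy facts, both arguments of allowed/2 need to be bound here, so this version cannot enumerate resources the way the earlier can/2 could.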

I started the IAM Simulator Project to accurately model IAM behavior. Identity and boundary policies and wildcards are already supported.

One reason I think Prolog is such a good tool for simulation is that the core module is only 160 lines of code. But that’s not all: we can use Prolog’s unification to generate policies. Behold:

$ swipl iam/sim.pl iam/s3.pl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
...
?- fix('s3:GetObject', 's3://example.com/public/*', Ch).
Ch = [changelog(add, policy(identity, 's3:GetObject-s3://example.com/public/*-allow', allow, 's3:GetObject', 's3://example.com/public/*'))] .

If you’re interested in contributing to this project, please open an issue or feel free to pick up an existing one.

Notes

¹ IAM Tools:

The IAM SimulateCustomPolicy API is most similar to the Prolog simulator described above. It’s slower because each request is an authenticated network request, but it supports more simulation features, like policy conditions. On the other hand, it cannot generate policies.
The IAM Access Analyzer can validate policies and identify externally shared resources and unused permissions. But the service does not prevent security holes, nor does it help the developer trying to get their web app to talk to their database.
LocalStack simulates a local AWS instance and can enforce IAM permissions across AWS services, but because it doesn’t understand IAM, it cannot identify security holes or grant missing permissions.
Startup Pulumi has a Terraform-like tool which can generate the required IAM policies for resources referenced by code. This empirical approach does solve common app permission issues, but otherwise suffers from the same limitations of understanding as the other tools.

² This is the default behavior. Prolog has a “cut” operator to stop backtracking.

Tags: aws-iam prolog simulation cloud-security
