Behavior trees are an excellent way to have an extensible language for describing AI. While understanding the basic structure of behavior trees is critical, there are several interesting technological challenges when implementing behavior trees in an actual game.
Full-disclosure: While some of this article may reflect some of the high-level systems used in Gigantic, I'm currently employed at a fancy fruit company and can neither confirm nor deny that any of this architecture is still in production at Motiga.
Behavior Tree Engine Challenges
Depending on the engine you’re working in, you’ll likely run into a couple of common implementation challenges. In particular, most engines I’ve worked with have at least some necessary event-driven code you’ll need to plug into, and reusing parts of behavior trees becomes critical as the logic becomes more complex.
Event handling is one of the simpler tasks for a behavior tree to manage. That said, we want to retain the procedural nature of the tree because managing event-driven structures simultaneously with procedural structures makes understanding the logic of the tree extremely complex. Instead, we can choose to handle events by polling a particular sentinel value to determine if an alternate branch of logic should be executed.
In practice, we want to capture events separately from the behavior tree structure, so that we don't miss events while executing tree logic that isn't actively checking for a particular event. One way of handling this is to create an individual event monitoring object per event we wish to monitor. Prior to each tick of the behavior tree engine, event monitoring objects can determine if an event has changed since the last frame, and can record this state. It's important at this point that the event monitoring object flags itself as having noticed a change. This allows the behavior tree to implement conditional nodes that can evaluate both if the event has been fired as well as if it has changed since the last time it was checked. This converts events being fired into states that can be polled, matching the natural structure of the behavior tree logic.
There are still some drawbacks to this mechanism, as it allows events to be ignored by the behavior tree if the logic is improperly created (such as not putting a conditional node in parallel with other logic if necessary). It also necessitates the creation of conditional nodes that check the state of an event as well as if the state itself has changed, potentially as unique nodes depending on how you want to ensure clarity. This reveals a fundamental truth of behavior trees - behavior trees are not a shortcut to making AI logic simpler, they merely improve the accessibility of that logic.
Of course, all this work in creating behavior trees would lack scalability and sustainability if we couldn't reuse common logic expressed in our behavior trees. This can be accomplished by supporting the concept of "subtrees". A subtree is basically another behavior tree, but where the root node passes the result of its child back to the tree that calls it (kind of like a subroutine). In order to connect a behavior tree with a subtree, a subtree proxy node is created, which has a reference to the subtree we want to execute. When that subtree proxy is ticked, it places the subtree root on the stack and continues execution. Finally, when the subtree's child nodes return a result, the subtree passes this result back to the proxy node that called it.
This allows us to take any set of logical instructions and actions and bundle them up in a subtree and reuse them. One of the difficulties that arises from this setup is that it becomes harder to keep the graphs acyclic. For each non-subtree, we now have to run a cycle test using a depth-first search for each subtree ensure that no cycles exist across subtree proxy nodes.
If you want to get really crazy, you can build subtrees with output nodes that expose output links on the proxy node. This allows you to encapsulate partial logic, much like calling a function and specifying delegates to execute. However, if you go down this path, the same proxy node can appear several times in your stack because subtrees with outputs that link to other subtrees can create structures that while not cyclical, are recurrent along a single execution branch. Thus, passing values back from a subtree to a proxy node will require tracking which stack to return to.
Behavior Tree Optimization Strategies
While there is often leeway for performance or efficiency when running AI on a client, every extra bit of memory or CPU time for AI run on a server has a continuing cost thanks to the nature of hosted environments. Basically, the more efficient the code, the cheaper it is to run a server for a game. Keeping that in mind, several architectural decisions were made in order to maximize the overall efficiency of the AI without severely impacting code maintenance.
Because the AI will often be doing very similar actions and routinely parsing shared logic, the first and perhaps most important efficiency gain is had by splitting the stateful data from the stateless data of the AI. That is, the logical structure of the AI is immutable, static, and stateless. An individual node, however, may want to capture some stateful data in order to maintain execution across ticks for any particular instance of an AI character. In order to achieve this separation of stateful from stateless data, each node class' member values are assumed stateless, and the node will instantiate stateful data per AI character to capture member values. This way, even as new AI characters proliferate, the amount of memory taken up by the AI tree structures will remain the same.
Beyond merely being strategically split, the data can also be organized using data-oriented engine design. Because we know that the order in which data will be accessed is generally depth-first, we can organize the both the structure of the tree and its stateful data by flatting the tree structure into an array ordered in a depth-first manner. This way, as we execute the tree, we are generally moving in a single direction across an array, and often accessing data contiguously. With stateful data generally being relatively tiny, and with proper alignment, we can produce a very cache-friendly execution system. Be aware though that the devil is in the details for any given implementation.
Another efficiency gain made possible from this flyweight-style split of stateful and stateless data for the behavior tree engine is that it allows for very simple lockless multithreaded access to the stateless portion of the data (the actual structure of the tree and the core logic of execution). With this property, we can maximize the parallelizable aspect of the behavior tree engine when applying Gustafson's Law. The stateless data is implicitly read-only and can have multiple readers with no contention, and the write portions are stateful and stored in separate memory locations and thus invoke no write contention.
Barring other engine or infrastructure considerations, this setup allows the behavior trees to share and execute their structure across multiple AI across different matches or game instances (ensuring that AI interactions are completely independent from each other). Note that this only affects the engine driving the behavior tree - designers are still blissfully unexposed to the complexities of parallelism.
What the FAQ?
How many behavior trees can I have using this system?
Lots! Each behavior tree needs only one instance of the tree structure itself and for each AI using it, an instance of stateful data. I personally recommend keeping the number of designed behavior trees to a minimum as it maximizes sharing of both memory and logic, making the trees much more maintainable.
Are behavior trees isomorphic to finite state machines?
Behavior trees can be made to be trivially turing complete, while a pure finite state machine is not because the memory you can store in an FSM is limited by the number of states you have.
Are behavior trees with an execution stack isomorphic to hierarchical finite state machines?
Hierarchical FSMs are merely mechanisms to organize an FSMs transitions, but you’re still limited to states and moving around between them, capturing no further information within the structure. A behavior tree is more like a procedural programming language, where the state is subsumed into the current execution pointer and the contents of the heap and execution stack.
What safe guards are there?
If two parallel branches both result in "run to target" actions that's bad. Does the second branch detect another run to target action is in progress causing it to fail? Do both actions execute causing a character to stand in place or ping pong? Does one action stall until the other completes? Can the designer choose?
There's likely not a single right answer in all cases. An important point here is that safe-guards are necessary in the editor to prevent trivially bad trees from being created. For example, the editor can prevent incompatible nodes from being run simultaneously, detecting cycles via subtrees, or any other pathological cases. For those that can’t be caught before execution (because they happen thanks to game state), the behavior tree engine can detect and report these errors. These are pretty rare though, as it’s pretty easy to find and fix all the issues in the behavior tree before hand (much like a compiler will catch most non-logic errors).
What kind of editor did you use?
Designers use a custom node editor written in C++ and wxWidgets (for compatibility with Unreal Engine 3) built to make organizing and debugging behavior trees as simple as possible. If you were to implement one yourself, having hooks into the game-engine to reflect current state is nice.
Are there only three types of nodes logic, condition, and action? What other node types are there?
There are plenty of kinds of nodes you can create that I haven't covered here. For example, I created the following additional node types:
- Structural nodes - perform basic branching logic,
- Evaluation nodes - like enumerators that use child nodes to evaluate each iterator value
- Declaration nodes - evaluators that also set one or more values after evaluation, if they have a valid evaluation
Decorator nodes - these have a single input and single output and do something with the status that flows in. For example, an inverter node that changes success to failure or a log node that prints out to the screen that this node ran with some value
In my experience, very few of these naming conventions are strict, so feel free to categorize nodes in a way that makes sense to you.
Do you need to be a programmer to work with behavior trees?
No! Not at all, you simply need to have a good grasp of logic. Behavior trees in Gigantic were created to allow technical designers to create the AI behaviors they wanted without having to write code. That said, the tool is only as good as the wielder.
What problem spaces do behavior trees solve really, really well?
AI and general scripting scenarios. They fundamentally let you create simple, reusuable logic using a visual programming language.
What problem spaces do behavior trees solve poorly?
The more complicated the logic you're trying to express, the less scalable visual programming languages generally are. One way to tackle this is to encapsulate more complex logic within nodes so as to keep behavior trees relatively simple. In this way, you create a sort of language hierarchy that allows you to understand and structure the logic of the game more easily.