BitBully

One of the fastest perfect-playing Connect-4 solvers around

Links
🔗 GitHub · PyPI · Docs

From Opening to Victory: The image shows three key stages of a Connect 4 match — an early board with initial placements, a mid-game filled with tension and strategy, and the final state where yellow wins by connecting four discs. Connect 4 is a two-player game where discs are dropped into columns, aiming to form a straight line of four. It blends tactical planning with defensive play, and despite its simplicity, it’s a classic example of solvable strategy games in AI.

BitBully is a high-performance, perfect-playing Connect-4 solver and analysis engine written in C++ with Python bindings. It’s designed for both developers and researchers who want to explore game-theoretic strategies or integrate a strong Connect-4 AI into their own projects.

🚀 Key Features

Blazing Fast Solving: Uses MTD(f) and null-window search algorithms.
Bitboard Engine: Board states are handled efficiently via low-level bitwise operations.
Advanced Heuristics: Threat detection, move ordering, and transposition tables.
Opening Databases: Covers all positions with up to 12 tokens, annotated with exact win/loss distances.
Cross-Platform: Compatible with Linux, Windows, and macOS.
Python API: Seamlessly integrates into Python projects via bitbully_core (powered by pybind11).
Open Source: Available under the AGPL-3.0 license.
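
The bitboard idea from the feature list can be illustrated with a minimal sketch. It assumes a widely used layout of 7 bits per column (6 playable rows plus one always-empty sentinel bit). Whether BitBully uses exactly this layout internally is not stated here, and the names `bit` and `has_four` are hypothetical.

```python
HEIGHT, WIDTH = 6, 7
STRIDE = HEIGHT + 1  # assumed layout: one spare sentinel bit on top of every column


def bit(col: int, row: int) -> int:
    """Return the single-bit mask for a cell (col 0..6, row 0..5, row 0 = bottom)."""
    return 1 << (col * STRIDE + row)


def has_four(stones: int) -> bool:
    """Check whether one player's stones contain four in a line.

    Each shift corresponds to one direction: 1 = vertical, STRIDE = horizontal,
    STRIDE - 1 and STRIDE + 1 = the two diagonals.
    """
    for shift in (1, STRIDE, STRIDE - 1, STRIDE + 1):
        pairs = stones & (stones >> shift)      # cells that start a run of two
        if pairs & (pairs >> (2 * shift)):      # two such pairs, two steps apart
            return True
    return False


# Four stones along the bottom row, columns 0..3
bottom_row = bit(0, 0) | bit(1, 0) | bit(2, 0) | bit(3, 0)
print(has_four(bottom_row))
```

The sentinel bit is what keeps a run at the top of one column from "wrapping" into the bottom of the next column when the board is shifted.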

📦 Installation

Install the latest stable release from PyPI:

pip install bitbully

No compilation needed—pre-built wheels included!

🧠 Example Usage (Python)

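from bitbully import bitbully_core as bbc
import time

# Start from an empty board and drop six discs into column 3
# (assumed to be the 0-indexed centre column); the players alternate.
board = bbc.Board()
for _ in range(6):
    board.playMove(3)
print(board)

# Solve the position with MTD(f), starting from an initial guess of 0.
solver = bbc.BitBully()
start = time.perf_counter()
score = solver.mtdf(board, first_guess=0)
print(f"Solved in {time.perf_counter() - start:.2f}s → Score: {score}")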

You can also solve boards defined as NumPy arrays, use opening books, and generate random game states. For more examples, check out the docs or the GitHub repository.
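
The `mtdf` call in the example takes its name from the MTD(f) algorithm, which homes in on the exact game value through a series of null-window alpha-beta searches starting from a first guess. The following is a generic sketch on a hand-made toy tree, not BitBully's implementation. The functions `alphabeta` and `mtdf_toy` and the tree itself are illustrative assumptions.

```python
import math


def alphabeta(node, alpha, beta, maximizing):
    """Fail-soft alpha-beta on a toy tree: leaves are ints, inner nodes are lists."""
    if isinstance(node, int):
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # cut-off: the remaining children cannot matter
            break
    return best


def mtdf_toy(root, first_guess):
    """MTD(f): repeat null-window searches until the bounds on the value meet."""
    g = first_guess
    lower, upper = -math.inf, math.inf
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = alphabeta(root, beta - 1, beta, True)  # window of width 1
        if g < beta:
            upper = g   # search failed low: g is an upper bound
        else:
            lower = g   # search failed high: g is a lower bound
    return g


# Root is a max node; its three children are min nodes over two leaves each.
tree = [[3, 5], [2, 9], [0, 7]]
print(mtdf_toy(tree, first_guess=0))  # → 3
```

This sketch re-searches the tree from scratch on every pass. Practical MTD(f) implementations usually pair it with a transposition table so that repeated passes reuse earlier results.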

📜 License

AGPL-3.0. View License

🙏 Acknowledgments

Inspired by the solvers of Pascal Pons and John Tromp.

Literature

The application of machine learning to board games remains an active and challenging research area, particularly due to the complexity and strategic depth of games like Chess, Go, and Connect Four. Unlike humans, who can intuitively recognize patterns, artificial agents require structured learning approaches, often supported by carefully engineered features or representations.

A milestone in this field was Tesauro’s TD-Gammon, which demonstrated that self-play combined with temporal difference learning (TDL) could lead to expert-level performance in backgammon. Inspired by this success, many studies attempted to apply TDL to other board games, but the outcomes were often mixed due to higher complexity and lack of domain knowledge (Thill, 2012; Thill et al., 2012).

Prior work on Connect Four showed that learning strong strategies through self-play alone is feasible, but only with a very rich feature representation and a large number of training games. One such approach used n-tuple systems in combination with TDL to approximate value functions. The resulting agents consistently beat an optimal-playing Minimax agent in positions where a win is possible, all without incorporating handcrafted game-theoretic knowledge. The success was largely attributed to the expressiveness of the n-tuple representation and extensive training over several million games (Thill, 2012; Thill et al., 2012).
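
To make the combination of n-tuple systems and TDL more concrete, here is a toy sketch of an n-tuple value function with a single TD-style weight update. It assumes three states per cell (0 = empty, 1 and 2 = the two players) and a tanh output. The class name, the tuple choice, and all parameters are hypothetical and do not reproduce the configuration used in the cited work.

```python
import math


class NTupleValue:
    """Value function built from lookup tables indexed by small groups of cells."""

    def __init__(self, tuples, n_states=3):
        self.tuples = tuples          # each tuple lists the board cells it samples
        self.n_states = n_states
        # one lookup table per tuple, one weight per possible cell-state combination
        self.luts = [[0.0] * (n_states ** len(t)) for t in tuples]

    def _indices(self, board):
        """Yield the lookup-table index that each tuple sees on this board."""
        for cells in self.tuples:
            index = 0
            for cell in cells:
                index = index * self.n_states + board[cell]
            yield index

    def value(self, board):
        total = sum(lut[i] for lut, i in zip(self.luts, self._indices(board)))
        return math.tanh(total)

    def td_update(self, board, target, alpha=0.1):
        """Move V(board) toward `target` (e.g. reward or the next state's value)."""
        v = self.value(board)
        delta = target - v
        gradient = 1.0 - v * v        # derivative of tanh
        for lut, i in zip(self.luts, self._indices(board)):
            lut[i] += alpha * delta * gradient
        return delta


# Toy 7-cell "board" sampled by two overlapping 4-tuples
net = NTupleValue(tuples=[(0, 1, 2, 3), (3, 4, 5, 6)])
position = [0, 1, 2, 0, 1, 0, 2]
for _ in range(200):
    net.td_update(position, target=1.0)
print(round(net.value(position), 3))
```

Only the table entries that the current position activates are touched, which is why such systems can hold a very large number of weights while each update stays cheap.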

Subsequent research introduced eligibility traces (in standard, resetting, and replacing variants) into these systems. Eligibility traces improved temporal credit assignment, sped up learning by a factor of two, and increased asymptotic playing strength (Thill et al., 2014; Thill, 2015).
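
The effect of eligibility traces on credit assignment can be sketched with a small tabular TD(λ) routine. This is a single-agent toy under stated assumptions, not the n-tuple implementation from the cited work. In particular, treating "resetting" as clearing all traces after an exploratory move is an assumption made here, and the function and parameter names are hypothetical.

```python
def td_lambda_episode(values, states, reward, alpha=0.5, lam=0.5, gamma=1.0,
                      replacing=True, explored=()):
    """Update `values` (dict: state -> estimate) in place from one finished episode.

    states:    visited states in order; the episode ends after the last one
    reward:    outcome received at the end of the episode
    replacing: True -> trace is set to 1 on a visit, False -> trace accumulates
    explored:  step indices after which traces are cleared (assumed "resetting")
    """
    traces = {}
    for t, state in enumerate(states):
        for key in traces:                      # decay all existing traces
            traces[key] *= gamma * lam
        if replacing:
            traces[state] = 1.0
        else:
            traces[state] = traces.get(state, 0.0) + 1.0

        is_last = t == len(states) - 1
        target = reward if is_last else gamma * values.get(states[t + 1], 0.0)
        delta = target - values.get(state, 0.0)

        for key, trace in traces.items():       # credit every recently visited state
            values[key] = values.get(key, 0.0) + alpha * delta * trace

        if t in explored:
            traces.clear()
    return values


with_traces = td_lambda_episode({}, ["a", "b", "c"], reward=1.0, lam=0.5)
without_traces = td_lambda_episode({}, ["a", "b", "c"], reward=1.0, lam=0.0)
print(with_traces)
print(without_traces)
```

With λ = 0 only the final state learns from the reward in this episode, whereas with λ > 0 earlier states receive a decayed share of the same update, which is the mechanism behind the reported speed-up.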

To further improve training efficiency, recent studies investigated online-adaptable learning rate algorithms, such as Incremental Delta-Bar-Delta (IDBD) and Temporal Coherence Learning (TCL). A novel TCL variant using geometric step-size adaptation outperformed conventional methods, reducing the number of required training games by up to 75% in some cases (Bagheri et al., 2016). The most effective algorithms proved to be those that combined geometric step-size changes with nonlinear output functions and eligibility traces. With several additional enhancements, these methods brought the total training requirement for learning Connect Four down to just over 100,000 games, a 13× improvement over previously published results (Thill, 2015).
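
As an illustration of per-weight step-size adaptation, the sketch below implements the IDBD update rule for a plain linear supervised learner. The step sizes change geometrically because they are stored as exponents and adjusted additively. The cited work applies such rules inside TD learning with n-tuple weights. The toy regression task, the function name, and the parameter values here are assumptions made for illustration only.

```python
import math
import random


def idbd_train(samples, n_features, theta=0.01, init_alpha=0.05):
    """Run IDBD over (inputs, target) pairs; return weights and final step sizes."""
    w = [0.0] * n_features
    beta = [math.log(init_alpha)] * n_features  # log step size, one per weight
    h = [0.0] * n_features                      # memory of recent weight changes
    for x, y_target in samples:
        delta = y_target - sum(wi * xi for wi, xi in zip(w, x))
        for i, xi in enumerate(x):
            beta[i] += theta * delta * xi * h[i]   # meta-update of the log step size
            alpha = math.exp(beta[i])              # geometric step-size change
            w[i] += alpha * delta * xi
            h[i] = h[i] * max(0.0, 1.0 - alpha * xi * xi) + alpha * delta * xi
    return w, [math.exp(b) for b in beta]


# Noise-free toy task: the target depends on the first two inputs only.
rng = random.Random(0)
true_w = [2.0, -1.0, 0.0]
samples = []
for _ in range(2000):
    x = [rng.choice((-1.0, 1.0)) for _ in true_w]
    samples.append((x, sum(t * xi for t, xi in zip(true_w, x))))

weights, step_sizes = idbd_train(samples, len(true_w))
print([round(wi, 3) for wi in weights])
```

Each weight keeps its own step size, so weights whose updates keep pointing in the same direction can speed up while others are left alone or slowed down.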

Finally, preliminary experiments applied this enhanced learning framework to other strategic games, such as Dots-and-Boxes, showing the framework’s potential for broader generalization, though some domain-specific challenges were identified (Thill, 2015).

Connect-4 Game Playing Framework (C4GPF)

The C4GPF is a Java-based framework for training, evaluating, and interacting with Connect Four agents. It features a GUI and supports various agent types, including:

Perfect-play Minimax agent with database and transposition table support.
Reinforcement Learning agent using n-tuple systems and TD-learning with eligibility traces.
Monte Carlo Tree Search (MCTS) agent.
RL-Minimax hybrid, combining tree search with learned state evaluations.

Key capabilities include:

Animated or step-by-step agent matches.
Benchmarking and head-to-head competitions.
Visualization and editing of n-tuple lookup tables.
Support for adaptive step-size algorithms (e.g., IDBD, TCL, AutoStep).

The framework is extensible and designed for research and teaching.

🔗 Repository: Connect-4 Game Playing Framework (C4GPF)

General Board Game Framework (GBG)

GBG is a flexible Java-based framework for general board game (GBG) learning and playing. Designed for research and education, it allows users to implement new board games or AI agents once and run them across all supported components.

Key features:

Supports 1-player, 2-player, and n-player board games.
Comes with a variety of built-in AI agents, including reinforcement learning and tree-based strategies.
Standardized interfaces and abstract classes make it easy to plug in new games or agents.
Enables fair competitions and benchmarking between agents across multiple games.
Suitable for both classroom use and research projects.

The framework includes documentation, a GUI, and a technical report explaining its architecture.

🔗 Repository: General Board Game Framework (GBG)

References

2016

IEEE Trans. Games

Online Adaptable Learning Rates for the Game Connect-4

Samineh Bagheri, Markus Thill, Patrick Koch, and 1 more author

IEEE Transactions on Computational Intelligence and AI in Games, 2016

Learning board games by self-play has a long tradition in computational intelligence for games. Based on Tesauro’s seminal success with TD-Gammon in 1994, many successful agents use temporal difference learning today. But in order to be successful with temporal difference learning on game tasks, often a careful selection of features and a large number of training games is necessary. Even for board games of moderate complexity like Connect-4, we found in previous work that a very rich initial feature set and several millions of game plays are required. In this work we investigate different approaches of online-adaptable learning rates like Incremental Delta Bar Delta (IDBD) or temporal coherence learning (TCL) whether they have the potential to speed up learning for such a complex task. We propose a new variant of TCL with geometric step size changes. We compare those algorithms with several other state-of-the-art learning rate adaptation algorithms and perform a case study on the sensitivity with respect to their meta parameters. We show that in this set of learning algorithms those with geometric step size changes outperform those other algorithms with constant step size changes. Algorithms with nonlinear output functions are slightly better than linear ones. Algorithms with geometric step size changes learn faster by a factor of 4 as compared to previously published results on the task Connect-4.

2015

Master's thesis

Temporal Difference Learning Methods with Automatic Step-Size Adaption for Strategic Board Games: Connect-4 and Dots-and-Boxes

Markus Thill

TH Köln – University of Applied Sciences, 2015

Master's thesis, Festo award 2015

Machine learning tasks for board games which rely solely on self-play methods remain rather challenging up till today. The perhaps most impressive breakthrough in this field was achieved by Tesauro’s TD-Gammon, which was able to learn the game backgammon at expert level with a self-play variant of the temporal difference learning (TDL) algorithm. Since then, many studies attempted to replicate some of TD-Gammon’s success by applying TDL to other board games, however, mostly with mixed results. We found in our earlier work on the board game Connect-4 that a rich feature set is required to successfully learn a near-perfect strategy. Nonetheless, several millions of self-play training games were necessary in order to generate strong Connect-4 agents. In this thesis we will mainly focus on two topics, namely online-adaptable learning rate methods and eligibility traces, and investigate whether these approaches have the potential to speed up learning. For the Connect-4 learning task we show that algorithms with geometric step-size changes have the best performance, in some cases reducing the required number of training games to learn the game by more than 40%. In a case study, we compare several state-of-the-art step-size adaptation algorithms with respect to their sensitivity towards certain meta parameters. In the further course of this thesis, we investigate the benefits of different eligibility trace variants. Additionally, we extend several learning rate algorithms to eligibility traces and examine their performance. We could observe that eligibility traces improve the speed of learning by a factor of two for our Connect-4 task. Overall, with several additional enhancements, we could reduce the number of training games to learn Connect-4 to slightly more than 100 000, which is an improvement by a factor of 13, compared to previously published results. 
In the last sections of this work, we apply the learning framework that we developed for Connect-4 – with several adjustments – to the strategic board game Dots-and-Boxes and discuss the main problems that we observed for our initial experiments.

2014

CIG

Temporal Difference Learning with Eligibility Traces for the Game Connect-4

Markus Thill, Samineh Bagheri, Patrick Koch, and 1 more author

In CIG’2014, International Conference on Computational Intelligence in Games, Dortmund, 2014

Systems that learn to play board games are often trained by self-play on the basis of temporal difference (TD) learning. Successful examples include Tesauro’s well known TD-Gammon and Lucas’ Othello agent. For other board games of moderate complexity like Connect Four, we found in previous work that a successful system requires a very rich initial feature set with more than half a million of weights and several millions of training games. In this work we study the benefits of eligibility traces added to this system. To the best of our knowledge, eligibility traces have not been used before for such a large system. Different versions of eligibility traces (standard, resetting, and replacing traces) are compared. We show that eligibility traces speed up the learning by a factor of two and that they increase the asymptotic playing strength.

2012

Bachelor thesis

Reinforcement Learning mit N-Tupel-Systemen für Vier Gewinnt

Markus Thill

TH Köln – University of Applied Sciences, 2012

Bachelor thesis, 1st prize in Opitz award 2013, Festo award 2012, Ferchau award 2012

The study of machine learning methods for board games is still a very interesting field of research today. This is mainly because learning complex games such as chess or Go is still considered very demanding. While humans are able to recognize certain relationships and regularities in games and draw the right conclusions from them, this is considerably harder for a computer program. For this reason, developers often have to build a great deal of game-theoretic knowledge into the program so that the learning process is able to pay attention to the particular game-specific features at all. This thesis investigates the application of so-called n-tuple systems, in combination with a reinforcement learning training environment, to the game Connect Four. N-tuple systems are used to approximate linear value functions of agents so that positions can be evaluated. To learn these functions, the n-tuple systems are trained with temporal difference learning (TDL), an algorithm for solving RL problems. The agents were trained exclusively by self-play, so no teacher and no game-theoretic knowledge of any kind was used during training. Nevertheless, it was possible to train agents of high playing strength that could beat a perfect player in many cases. In particular, the n-tuple systems, which generate a very large number of features and select the suitable ones, contribute to the exceptionally good results.

PPSN

Reinforcement learning with n-tuples on the game Connect-4

Markus Thill, Patrick Koch, and Wolfgang Konen

In PPSN’2012: 12th International Conference on Parallel Problem Solving From Nature, Taormina, 2012

Learning complex game functions is still a difficult task. We apply temporal difference learning (TDL), a well-known variant of the reinforcement learning approach, in combination with n-tuple networks to the game Connect-4. Our agent is trained just by self-play. It is able, for the first time, to consistently beat the optimal-playing Minimax agent (in game situations where a win is possible). The n-tuple network induces a mighty feature space: It is not necessary to design certain features, but the agent learns to select the right ones. We believe that the n-tuple network is an important ingredient for the overall success and identify several aspects that are relevant for achieving high-quality results. The architecture is sufficiently general to be applied to similar reinforcement learning tasks as well.