Jass Agent using Monte Carlo Tree Search

A modular Jass-playing agent combining Determinized Monte Carlo Tree Search, neural trump prediction, and rule-based heuristics. The project evaluates algorithmic tradeoffs, MCTS convergence behavior, and distributed search under real-time constraints, culminating in a competitive tournament deployment.

Sep 2024 - Dec 2024 (3 months)

Tech Stack

Deep Learning · Reinforcement Learning · PyTorch

As a requirement to attend the final exam for this module, we implemented a Jass-playing agent that competed against agents developed by other students. In addition to the implementation, we were required to submit a short technical report and a video explaining our approach and design choices. The agent was developed together with Dan Livingston. All figures were originally created for debugging purposes. The source code can be found on GitHub.

Abstract

This project presents an exploration of artificial intelligence techniques for automating gameplay in Jass, a popular Swiss card game. We implemented bots with various strategies to evaluate performance in both choosing the trump suit and playing cards. Key contributions include the use of rule-based strategies, statistical models, and deep neural networks to optimize gameplay decisions.

Introduction

This work was created for the Deep Learning for Games course at the Lucerne University of Applied Sciences and Arts. We had to implement a bot that plays Jass, a traditional card game widely played in Switzerland.

Methods to Implement

In this simulation of Jass, each bot has to implement two methods: choose_trump, called when the bot has to select the trump, and choose_card, called when the bot has to choose a card to play. Both methods must return a response within 10 seconds, otherwise a random move is played.
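
A minimal sketch of this interface is shown below; the class name and observation handling are illustrative assumptions, not the exact API of the course framework.

```python
from abc import ABC, abstractmethod

class JassAgent(ABC):
    """Illustrative skeleton of the two methods every bot must provide.

    Both methods have to return within 10 seconds, otherwise the framework
    plays a random move on the bot's behalf.
    """

    @abstractmethod
    def choose_trump(self, obs) -> int:
        """Return a trump selection (or PUSH) based on the observation."""

    @abstractmethod
    def choose_card(self, obs) -> int:
        """Return a valid card from the current hand based on the observation."""
```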

Information Provided to the Bot

The bot receives an observation of the current game state containing all information visible to the player.

Design Choices

To explore various approaches, we developed separate strategies for both methods, so that different combinations of TrumpStrategies and PlayStrategies could be tested. Additionally, we incorporated PlayRuleStrategies, which are evaluated in sequence before a PlayStrategy is triggered.

TrumpStrategies

A TrumpStrategy processes an observation of a game to determine a trump suit, selecting from DIAMONDS, HEARTS, SPADES, CLUBS, OBE_ABE, UNE_UFE, or choosing PUSH, which allows the partner to decide the trump suit.
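
For illustration, the trump options can be represented as an enumeration, with a TrumpStrategy mapping an observation to one of them. The numeric values below are placeholders; the course framework defines its own constants.

```python
from abc import ABC, abstractmethod
from enum import IntEnum

class Trump(IntEnum):
    # Illustrative encoding; the framework defines its own constant values.
    DIAMONDS = 0
    HEARTS = 1
    SPADES = 2
    CLUBS = 3
    OBE_ABE = 4   # "top-down": no trump, high cards win
    UNE_UFE = 5   # "bottom-up": no trump, low cards win
    PUSH = 10     # defer the trump decision to the partner

class TrumpStrategy(ABC):
    @abstractmethod
    def choose_trump(self, obs) -> Trump:
        """Pick a trump suit (or PUSH) from the observation of the current hand."""
```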

PlayStrategies

A PlayStrategy processes an observation of a game and selects a card from the player’s hand to play.

PlayRuleStrategies

A PlayRuleStrategy is a list of rules that are checked before a PlayStrategy is called, so that obvious moves can be played without a full search. The rules are evaluated in order; the first rule that triggers plays its card. If no rule triggers, the PlayStrategy is called, as sketched below.
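
A minimal sketch of this dispatch logic, with illustrative interfaces for the rules and the fallback strategy (the names are assumptions, not the project's exact classes):

```python
from abc import ABC, abstractmethod
from typing import Optional

class PlayRule(ABC):
    @abstractmethod
    def propose_card(self, obs) -> Optional[int]:
        """Return a card index if the rule applies, otherwise None."""

class PlayStrategy(ABC):
    @abstractmethod
    def choose_card(self, obs) -> int:
        """Select a card via search or heuristics."""

def select_card(rules: list[PlayRule], fallback: PlayStrategy, obs) -> int:
    # Evaluate the rules in order; the first one that triggers decides the move.
    for rule in rules:
        card = rule.propose_card(obs)
        if card is not None:
            return card
    # No rule triggered: fall back to the (potentially expensive) PlayStrategy.
    return fallback.choose_card(obs)
```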

Implementations

TrumpStrategies

Five different TrumpStrategies were implemented: Random, HighestSum, HighestScore, Statistical, and DeepNN.

Figure 1: Comparison of mean scores for different numbers of randomly played rounds per trump for the same hand. Scores were aggregated using the mean and median. After approximately 20 simulated rounds, the scores converge and show no further change.
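
Figure 1 compares such rollout-based trump scores for a fixed hand. A hedged sketch of the underlying estimator, where `simulate_round` stands in for a hypothetical helper that deals the unseen cards randomly, plays one round with random valid moves, and returns the points achieved:

```python
import statistics
from typing import Callable, Dict, List

TRUMP_OPTIONS = ["DIAMONDS", "HEARTS", "SPADES", "CLUBS", "OBE_ABE", "UNE_UFE"]

def estimate_trump_scores(hand: List[int],
                          simulate_round: Callable[[List[int], str], float],
                          n_rounds: int = 20) -> Dict[str, float]:
    """Estimate the expected points per trump by averaging random rollouts.

    `simulate_round(hand, trump)` is a placeholder for a function that deals
    the remaining cards randomly and plays one full round with random valid
    moves, returning the points our team scored. Figure 1 suggests that
    roughly 20 rollouts per trump are enough for the mean to stabilise.
    """
    return {
        trump: statistics.mean(simulate_round(hand, trump) for _ in range(n_rounds))
        for trump in TRUMP_OPTIONS
    }
```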

PlayStrategies

The PlayStrategies evaluated below are Random, HighestValue, MCTS, and DMCTS (Determinized MCTS).

PlayRuleStrategies

The PlayRuleStrategies evaluated below are OnlyValid, Smear, TrumpJack, PullTrump, and MiniMax.

Evaluation

The setup involves initializing an Arena. Upon execution, four bots are generated and evenly distributed into two teams. Halfway through the games, the Arena is reset and the positions are swapped, allowing for a balanced and comprehensive evaluation of performance. We fixed the seed for all experiments and executed them on the same hardware. The results tables contain the Winrate and Average Points, both Overall (O) and for Trump Rounds only (T).
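
The seating swap and fixed seed can be sketched as follows; `play_game` is a placeholder for the Arena's game loop (the real framework bundles this logic in its Arena class, whose exact API is not shown here):

```python
import random
from typing import Callable, Tuple

def evaluate(make_bot_a: Callable, make_bot_b: Callable,
             play_game: Callable[..., Tuple[float, float]],
             n_games: int, seed: int = 42) -> Tuple[float, float]:
    """Sketch of the evaluation protocol.

    `make_bot_a`/`make_bot_b` create fresh bots, and `play_game(north, east,
    south, west)` is a placeholder for one Arena game returning the points of
    (North/South, East/West). Positions are swapped halfway through so both
    teams play both seatings.
    """
    random.seed(seed)  # fixed seed so all experiments are reproducible
    points_a = points_b = 0.0

    for i in range(n_games):
        swapped = i >= n_games // 2
        team_ns = make_bot_b if swapped else make_bot_a
        team_ew = make_bot_a if swapped else make_bot_b
        p_ns, p_ew = play_game(team_ns(), team_ew(), team_ns(), team_ew())
        points_a += p_ew if swapped else p_ns
        points_b += p_ns if swapped else p_ew

    return points_a / n_games, points_b / n_games
```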

TrumpStrategy Evaluation

To evaluate the TrumpStrategies independently of the PlayStrategy, we ran every TrumpStrategy together with the RandomPlayStrategy and no PlayRuleStrategies. The opponent played with a random TrumpStrategy. We played 10’000 games in total.

| Strategy | Winrate O (%) | Winrate T (%) | Average Points O | Average Points T |
| --- | --- | --- | --- | --- |
| Random | 50.00 | 50.00 | 78.5 | 78.5 |
| HighestSum | 59.15 | 69.88 | 85.8071 | 93.7382 |
| HighestScore | 64.11 | 78.44 | 89.8663 | 101.387 |
| Statistical | 62.43 | 75.24 | 88.2777 | 98.3564 |
| DeepNN | 65.23 | 81.08 | 90.6736 | 103.2246 |

PlayStrategy Evaluation

Similar to the TrumpStrategies, we evaluated the PlayStrategies independently of the TrumpStrategy. We ran every PlayStrategy together with the best TrumpStrategy, DeepNNTrumpStrategy, for both teams. The number of games played varied due to computational limitations; for MCTS and DMCTS we played 100 games. The time limit for choosing a card was 5 seconds, half of the time available in the official tournament. For DMCTS we used 31 determinizations due to hardware constraints.

| Strategy | Winrate O (%) | Winrate T (%) | Average Points O | Average Points T |
| --- | --- | --- | --- | --- |
| Random | 50.00 | 80.74 | 78.5 | 102.9398 |
| HighestValue | 46.14 | 79.40 | 75.6356 | 104.0712 |
| MCTS (d=1) | 62.00 | 98.00 | 88.9 | 117.96 |
| DMCTS | 64.00 | 96.00 | 90.61 | 117.1 |
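
DMCTS samples determinizations: the cards the bot cannot see are dealt randomly to the other players, turning the imperfect-information game into a perfect-information one that plain MCTS can search, and the results of several such searches are aggregated. A simplified sketch of the sampling step (it ignores additional constraints, such as a player having been unable to follow suit earlier):

```python
import random
from typing import Dict, List

def sample_determinization(my_hand: List[int], unseen_cards: List[int],
                           cards_per_player: Dict[str, int],
                           rng: random.Random) -> Dict[str, List[int]]:
    """Deal the unseen cards randomly to the other three players.

    `cards_per_player` maps player name -> number of cards still held.
    Simplified sketch: a real determinization should also respect known
    constraints (e.g. a player who could not follow suit holds no card
    of that suit).
    """
    shuffled = unseen_cards[:]
    rng.shuffle(shuffled)

    hands = {"me": my_hand}
    offset = 0
    for player, n_cards in cards_per_player.items():
        hands[player] = shuffled[offset:offset + n_cards]
        offset += n_cards
    return hands

# Example: 3 cards left per hand, 9 unseen cards to distribute.
deal = sample_determinization(
    my_hand=[5, 12, 27],
    unseen_cards=[1, 2, 3, 8, 14, 19, 22, 30, 33],
    cards_per_player={"partner": 3, "left": 3, "right": 3},
    rng=random.Random(0),
)
```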

MCTS Iteration–Performance Tradeoff

Further experiments were conducted to identify an optimal number of MCTS iterations. No clear convergence point was observed. For a full hand of nine cards, the search did not converge even after 100 000 node expansions, which is far beyond the maximum number of iterations achievable within the tournament time limit. The identity of the best action continued to change, indicating that no single card consistently dominates. This suggests that the action space cannot be explored sufficiently within a reasonable time budget.
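
The analysis behind Figures 2 and 3 amounts to recording the most-visited root action after each expansion and checking whether it stops changing. A hedged sketch, assuming a generic search object with `run_iteration()` and `root_child_visits()` methods (placeholder names, not the project's API):

```python
def track_best_action(mcts, n_iterations: int):
    """Record the greedy (most-visited) root action after each expansion.

    `mcts` is assumed to expose `run_iteration()` (one selection/expansion/
    rollout/backup step) and `root_child_visits()` (a dict action -> visits);
    these names are placeholders for illustration.
    """
    best_per_iteration = []
    for _ in range(n_iterations):
        mcts.run_iteration()
        visits = mcts.root_child_visits()
        best_per_iteration.append(max(visits, key=visits.get))

    # Count how often the recommended card changes -- a proxy for convergence.
    changes = sum(1 for a, b in zip(best_per_iteration, best_per_iteration[1:]) if a != b)
    return best_per_iteration, changes
```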

Figure 2: Comparison of score (left) and current best card index (right) per iteration with a full hand of nine cards. The selected action does not converge to a single card, even after extensive tree expansion.

To test whether this behavior is caused by the large action space, the experiment was repeated with only three cards remaining in each hand. Even in this reduced setting, the argmax action continued to change until the search tree was fully expanded.

Figure 3: Comparison of score (left) and current best card index (right) per iteration with only three cards remaining per hand. The search still fails to converge to a single action before full tree expansion.

PlayRuleStrategy Evaluation

We evaluated the PlayRuleStrategies together with the DeepNNTrumpStrategy, using RandomPlayStrategy and DeterminizedMCTSPlayStrategy as the underlying PlayStrategies. The number of games played varied due to computational limitations: the MiniMaxPlayRule was limited to 5 seconds per move and 100 games, and the runs with the DMCTS PlayStrategy were limited to 200 games.

Performance of PlayRuleStrategies for RandomPlayStrategy

| Strategy | Winrate O (%) | Winrate T (%) | Average Points O | Average Points T |
| --- | --- | --- | --- | --- |
| None | 50.00 | 80.74 | 78.5 | 102.9398 |
| OnlyValid | 50.00 | 80.74 | 78.5 | 102.9398 |
| Smear | 51.08 | 81.50 | 79.3933 | 103.7926 |
| TrumpJack | 50.57 | 81.54 | 79.2585 | 104.3382 |
| PullTrump | 49.29 | 80.18 | 79.0959 | 104.5728 |
| MiniMax | 51.00 | 84.00 | 78.46 | 105.7 |

Performance of PlayRuleStrategies for DeterminizedMCTSPlayStrategy

| Strategy | Winrate O (%) | Winrate T (%) | Average Points O | Average Points T |
| --- | --- | --- | --- | --- |
| None | 48.50 | 81.00 | 77.85 | 103.62 |
| OnlyValid | 52.50 | 88.00 | 78.745 | 106.4 |
| Smear | 50.50 | 84.00 | 78.575 | 103.07 |
| TrumpJack | 52.50 | 83.00 | 77.88 | 103.16 |
| PullTrump | 47.50 | 74.00 | 77.485 | 100.78 |
| MiniMax | 47.50 | 79.00 | 74.82 | 96.63 |
| All | 51.00 | 83.00 | 78.79 | 103.18 |

Distributed DMCTS Node System

DMCTS is limited by compute power, so we decided to split the workload across multiple computing nodes. Communication is handled via HTTP. To ensure the response arrives within the time limit, each worker has a separate, stricter timeout.
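
A hedged sketch of the coordinator side of this setup. The worker URL scheme, the `/search` endpoint, and the payload format are illustrative assumptions; the essential points are fanning the determinizations out over HTTP, giving each worker a timeout below the overall move budget, and aggregating whatever answers arrive in time:

```python
import collections
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests  # third-party HTTP client

WORKER_TIMEOUT_S = 4.0  # kept below the overall per-move time budget

def query_worker(url: str, observation: dict, n_determinizations: int) -> dict:
    """Ask one worker to run DMCTS on `n_determinizations` sampled deals.

    The '/search' endpoint and JSON payload are assumptions for illustration;
    the worker is expected to return a mapping card -> accumulated score.
    """
    resp = requests.post(
        f"{url}/search",
        json={"observation": observation, "determinizations": n_determinizations},
        timeout=WORKER_TIMEOUT_S,
    )
    resp.raise_for_status()
    return resp.json()

def choose_card_distributed(workers: list, observation: dict, per_worker: int) -> int:
    """Fan out to all workers, skip the ones that time out or fail, and
    return the card with the highest aggregated score."""
    totals = collections.Counter()
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(query_worker, url, observation, per_worker)
                   for url in workers]
        for future in as_completed(futures):
            try:
                for card, score in future.result().items():
                    totals[int(card)] += score
            except requests.RequestException:
                continue  # a slow or unreachable worker is simply skipped
    if not totals:
        raise RuntimeError("no worker answered in time; play a fallback move instead")
    return totals.most_common(1)[0][0]
```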

Tournament Bot

For the tournament, we combined the DeepNNTrumpStrategy and the DeterminizedMCTSPlayStrategy with the OnlyValid, SmearPlay, and TrumpJack PlayRuleStrategies. The final tournament was executed on a distributed cluster of eight devices with CPU core counts ranging from 2 to 16, for 62 cores in total. Empirical experiments showed that the optimal number of determinizations per node is CPU_count − 1. This resulted in a total of 56 determinizations across the cluster.

Results

The tournament consisted of eight matchups, sampled randomly. Each matchup comprised twelve rounds. In every odd-numbered round, the game state was identical (hands and trump caller), while the teams were swapped. Scores were computed by summing all points won per match. The final ranking was obtained by summing scores across all matchups. This evaluation setup favors agents with weaker opponents, as matchups were not skill-balanced.

In the final tournament, we placed 6th out of 16 participants. Four agents were provided by the lecturers as reference points; two of these placed 1st and 5th, respectively. The 1st-place agent had access to all players' cards, giving it a significant advantage. The 5th-place agent used a similar approach to ours, making it a more meaningful point of comparison. Our direct match against this agent resulted in a loss of 1060 to 824.

Interpretation

The overall scores were plausible. However, the significant gap in the matchup against the similar opponent suggests that our agent has some limitation. The scores are close enough that an implementation error in DMCTS is unlikely to be the cause. Our interpretation is that, due to the overhead of communicating with eight different devices, the determinizations could not run as many iterations as hoped.
