Revisiting Suboptimal Search (Replicated), [Link]

Suboptimal search algorithms trade solution quality for speed: they are designed to find a solution quickly rather than an optimal one. In practical applications such as video games, finding a solution fast often matters more than finding the best one. Despite their practical usefulness, previously proposed algorithms in this area are often difficult to implement because they depend on special heuristics or data structures. This paper addresses these challenges by presenting a new framework, Improved Optimistic Search, which simplifies implementation and improves performance over earlier algorithms. In addition to introducing the framework, we conduct several studies that evaluate different strategies within it to further optimize its performance.
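To make the notion of "suboptimal search" concrete, below is a minimal sketch of weighted A*, a classic bounded-suboptimal baseline in this literature; it is only an illustration of the algorithm class, not the paper's Improved Optimistic Search framework, and the interface (`goal_test`, `successors`, `h`) is a hypothetical stand-in.

```python
import heapq
import itertools

def weighted_a_star(start, goal_test, successors, h, w=2.0):
    """Minimal weighted A*: expands nodes by f = g + w*h. With an admissible
    heuristic h and w >= 1, the returned solution costs at most w times the
    optimal cost, but it is typically found much faster than with plain A*."""
    tie = itertools.count()  # tie-breaker so the heap never compares states/paths
    frontier = [(w * h(start), next(tie), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if goal_test(node):
            return path, g
        for nxt, step_cost in successors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + w * h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None, float("inf")
```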

A Survey on Learning Programmatic Policies, [Link]

Deep reinforcement learning (DRL) has made significant progress in various domains. However, neural network policies still generalize poorly to new scenarios, acquiring complex skills through trial and error remains difficult, and the limited interpretability of these policies makes debugging hard. To overcome these limitations, researchers are exploring approaches that learn to synthesize programs that are both interpretable and generalizable. One such direction is programmatic policies, which use representations such as decision trees, state machines, and domain-specific programming languages. Several new approaches to learning programmatic policies have been proposed, and this paper reviews five studies in this area and discusses their contributions and limitations.
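As a toy illustration of what "programmatic policy" means in this context, here is a short hand-written rule for a CartPole-style task; the thresholds and variable names are made up for this example and are not drawn from any of the surveyed papers.

```python
def cartpole_policy(obs):
    """Illustrative programmatic policy: a human-readable if/else rule over
    named state variables, in contrast to an opaque neural network policy."""
    cart_pos, cart_vel, pole_angle, pole_vel = obs
    if pole_angle + 0.1 * pole_vel > 0.0:  # pole is falling to the right
        return 1                           # push cart right
    return 0                               # otherwise push cart left
```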

Empirical Study of Differential Q-learning on Continuing Control Tasks, [Link]

Differential Q-learning is a recently developed average-reward algorithm for off-policy control. We conduct an empirical study of differential Q-learning on two continuing tasks, Catch and Pendulum, in the linear function approximation setting. To the best of our knowledge, this is the first comparison between discounted and differential Q-learning in the literature. We provide performance results and sensitivity analyses. Our results show that differential Q-learning performs no worse than discounted Q-learning on Pendulum but is not a suitable algorithm for Catch.
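For context, the sketch below shows one differential Q-learning update with linear function approximation as the update rule is usually stated: the TD error subtracts a learned average-reward estimate instead of discounting, and that estimate is updated from the same error. The step sizes `alpha` and `eta` and the feature interface are assumptions for this illustration, not the authors' exact code.

```python
import numpy as np

def differential_q_update(w, r_bar, x_s, a, r, x_s_next, alpha=0.1, eta=1.0):
    """One differential Q-learning step with linear features.
    w: (num_actions, num_features) weights; r_bar: average-reward estimate;
    x_s, x_s_next: feature vectors for the current and next state."""
    q_sa = w[a] @ x_s                      # current action-value estimate
    q_next = np.max(w @ x_s_next)          # greedy value of the next state
    delta = r - r_bar + q_next - q_sa      # average-reward TD error (no discount)
    w[a] += alpha * delta * x_s            # semi-gradient weight update
    r_bar += eta * alpha * delta           # track the long-run average reward
    return w, r_bar
```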

Level Blending with Evolutionary Algorithm in Latent Space of VQ-VAE, [Link]

In this work, (1) we conduct an exploratory study of the effectiveness of VQ-VAEs and their discrete latent space. (2) We then apply an evolutionary search algorithm (specifically a Genetic Algorithm) in the latent space of the VQ-VAE, injecting controlled creativity into the blended maps through mutation and creating new maps at a desirable level of blending via a fitness function over tile distributions (a sketch of this loop follows below). (3) We also show that replacing latent vector elements of an SMB map with elements of a KI map interpolates between the two maps. Finally, (4) the combination of the VQ-VAE with the Genetic Algorithm is compared against the GA and the VQ-VAE alone using two evaluation metrics, Expressive Range Analysis and Quantify Span of Generation (QSG).
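The sketch below outlines a Genetic Algorithm over VQ-VAE discrete latent codes of the kind described above, under stated assumptions: `encode`/`decode` stand in for a trained VQ-VAE that maps tile maps to and from grids of codebook indices, `target_dist` is the desired blended tile distribution, and decoded tile ids are assumed to lie in `range(len(target_dist))`. It is an illustration of the approach, not the authors' implementation.

```python
import random
import numpy as np

def blend_levels(encode, decode, map_a, map_b, target_dist,
                 pop_size=32, generations=100, mutation_rate=0.05, codebook_size=512):
    """GA over VQ-VAE latent index grids: crossover mixes two source maps,
    mutation swaps in random codebook indices, and fitness rewards decoded
    maps whose tile distribution is close to the target blend."""
    z_a, z_b = encode(map_a), encode(map_b)        # grids of codebook indices

    def crossover(z1, z2):
        mask = np.random.rand(*z1.shape) < 0.5     # uniform crossover per latent cell
        return np.where(mask, z1, z2)

    def mutate(z):
        mask = np.random.rand(*z.shape) < mutation_rate
        random_codes = np.random.randint(0, codebook_size, size=z.shape)
        return np.where(mask, random_codes, z)

    def fitness(z):
        tiles = decode(z)                          # integer tile map
        counts = np.bincount(tiles.ravel(), minlength=len(target_dist))
        dist = counts / counts.sum()
        return -np.abs(dist - target_dist).sum()   # closer to target = higher fitness

    pop = [mutate(crossover(z_a, z_b)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]             # keep the fitter half
        children = [mutate(crossover(*random.sample(parents, 2))) for _ in parents]
        pop = parents + children
    return decode(max(pop, key=fitness))
```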