Polixir's technology originated from building the Virtual-Taobao environment. Virtual-Taobao is the first simulator in the world to successfully virtualize a large-scale real-world scenario. In Virtual-Taobao, the agent is a recommender system that interacts with virtual buyers to learn the best recommendation strategy.
On Value Discrepancy of Imitation Learning
Imitation learning trains a policy from expert demonstrations. This paper analyzes various imitation learning approaches in depth, showing that their different underlying ideas lead to different compounding errors.
Improving Fictitious Play Reinforcement Learning with Expanding Models
Fictitious play is an effective framework for reinforcement learning in zero-sum games. When deep neural networks are used as the policy models, training faces two issues: old data is easily forgotten, and models are hard to mix. This paper presents expanding models to solve these issues.
Novelty-Prepared Few-Shot Classification
Few-shot classification aims for high accuracy from only a few samples, which is crucial for real-world applications. Our new approach is the first in which the learning model is open-world aware. Consequently, our model adapts better to new classification tasks and achieves a significant improvement over state-of-the-art methods.