https://github.com/pytorch/pytorch/issues/89884. It is built on top of Distributed Tensor, which is proposed in: https ...
@article{WangExp2022, title={Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models} ...