In the rapidly evolving field of natural language processing (NLP), the quality and diversity of training data are crucial.