The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that encompasses 21 sub-datasets in English, Japanese, and Chinese.
In this paper, we also propose a method for dataset translation assisted by Large Language Models (LLMs): the original Japanese datasets are translated by LLMs, substantially reducing the manual annotation time required for dataset construction.
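The key difficulty in translating an annotated dataset is keeping the word-level labels aligned with the translated text. A minimal sketch of one way to do this is shown below: translate the sentence and each annotated word separately, so the entity tags carry over unchanged. The function names, prompt wording, and data layout here are our own illustrative assumptions, not the pipeline actually used in the paper.

```python
def translate_sample(sample, llm, target_lang="English"):
    """Translate one annotated sample while preserving its labels.

    sample: {"sentence": str, "entities": [(word, tag), ...]}
    llm:    a callable prompt -> translated text (injected so any
            LLM backend can be plugged in).
    """
    # Translate the full sentence in one call.
    translated_sentence = llm(
        f"Translate the following text into {target_lang}:\n{sample['sentence']}"
    )
    # Translate each annotated word on its own; the tag is reused as-is.
    translated_entities = [
        (llm(f"Translate this term into {target_lang}: {word}"), tag)
        for word, tag in sample["entities"]
    ]
    return {"sentence": translated_sentence, "entities": translated_entities}
```

Translating entity words individually avoids re-locating spans inside the translated sentence, at the cost of extra LLM calls; manual review is still needed for terms whose translation depends on sentence context.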
Additionally, we have enriched the dataset by incorporating open-domain Named Entity Recognition (NER) and sentence classification tasks. Utilizing this expanded dataset, we developed a unified input-output framework to train an Open-domain Information Extraction Large Language Model (OIELLM). OIELLM effectively processes the new MMM datasets and achieves significant improvements in performance.
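A unified input-output framework means every task, whether sentence classification, NER, or a mixed MRE task, is serialized into one text-to-text format so a single model can be trained on all 21 sub-datasets. The sketch below illustrates the idea; the exact prompt wording and separators are assumptions for illustration, not the paper's actual format.

```python
def build_example(sentence, task, label, entities):
    """Serialize one MRE sample into an (input, output) text pair.

    entities: list of (word, entity_type) tuples; empty for pure
    sentence-classification samples.
    """
    # The model input names the task and supplies the raw text.
    model_input = f"Task: {task}\nText: {sentence}"
    # The model output joins the sentence-level label with the
    # word-level annotations, so both tasks reinforce each other.
    entity_part = ", ".join(f"{w}:{t}" for w, t in entities)
    if entity_part:
        model_output = f"label: {label}; entities: {entity_part}"
    else:
        model_output = f"label: {label}"
    return model_input, model_output

inp, out = build_example(
    "Alice joined Stockmark in Tokyo.",
    "sentence classification + NER",
    "business",
    [("Alice", "person"), ("Stockmark", "organization"), ("Tokyo", "location")],
)
```

Because the sentence label and the entity list appear in one output string, the model sees both levels of annotation jointly, which is the setting in which the Mutual Reinforcement Effect is observed.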
We conclude by thanking the contributors of the datasets on which MMM is built, as well as the pioneering researchers who selflessly shared their work.
1. Japanese Wikipedia NER dataset - Takahiro Omi - https://github.com/stockmarkteam/ner-wikipedia-dataset
2. JGLUE: Japanese General Language Understanding Evaluation - Kentaro Kurihara, Daisuke Kawahara, Tomohide Shibata - https://github.com/yahoojapan/JGLUE
3. livedoor news corpus - Koji Sekiguchi - https://www.rondhuit.com/download.html
4. UniversalNER - Wenxuan Zhou - https://arxiv.org/abs/2308.03279
Listed below is previous work in the MRE series; these works provide further background on MRE.
1. Mutual Reinforcement Effects in Japanese Sentence Classification and Named Entity Recognition Tasks (2023) Chengguang Gan, Qinghao Zhang, Tatsunori Mori
2. USA: Universal Sentiment Analysis Model & Construction of Japanese Sentiment Text Classification and Part of Speech Dataset (2023) Chengguang Gan, Qinghao Zhang, Tatsunori Mori
3. GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effect (2023) Chengguang Gan, Qinghao Zhang, Tatsunori Mori
@misc{gan2024mmmmultilingualmutualreinforcement,
      title={MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models},
      author={Chengguang Gan and Qingyu Yin and Xinyang He and Hanjun Wei and Yunhao Liang and Younghun Lim and Shijian Wang and Hexiang Huang and Qinghao Zhang and Shiwen Ni and Tatsunori Mori},
      year={2024},
      eprint={2407.10953},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.10953},
}