// LLM Researcher · Tokyo
Chengguang Gan
LLM Researcher @ Techtouch · Ph.D. in Informatics
I build and study large language models for information extraction, web agents, and the Mutual Reinforcement Effect — and I ship the datasets and models to back it up.
About
I’m an LLM Researcher at Techtouch in Tokyo. I earned my Ph.D. in Informatics in March 2025 from Yokohama National University, advised by Prof. Tatsunori Mori. My work spans natural language processing and large language models — with a focus on information extraction, prompting, multimodal IE, and web agents.
I introduced the Mutual Reinforcement Effect and built the Japanese (and later multilingual) IE Mix datasets, which cover sentence, text, sentiment, and POS classification as well as relation and event extraction. I then fine-tuned a line of open LLMs on top of them. Everything ships: you can grab the papers, code, datasets, and models on Hugging Face and GitHub.
Reviewer for NeurIPS, COLING, and COLM.
News
- 2026 · MMM accepted to ACL 2026 Findings.
- 2026 · New preprint: GuideWeb, a benchmark for automatic in-app guide generation on real-world web UIs.
- 2025 · Joined Techtouch as an LLM Researcher.
- 2025 · GIELLM at IJCNN 2025; USA at PACLIC 2025; two code-generation papers at ICIC 2025.
- 2025 · Received my Ph.D. in Informatics from Yokohama National University.
- 2024 · II-Bench accepted to NeurIPS 2024 (Datasets & Benchmarks Track).
Publications
Bold = me. Citation counts via Google Scholar (Jun 2026).
- 2026
- 2026
- 2025
- 2025
- 2025
- 2025
- 2025
- 2025
- 2025
- 2024
- 2024
- 2024
- 2024
- 2024
- 2023
- 2023
- 2023
- 2023
- 2023
- 2021
Experience & Education
Experience
- 2025–Present · LLM Researcher, Techtouch
- 2025 · Data Scientist, Generic Solution
- 2024–2025 · Part-time Researcher, NII Large Language Model Center
- 2024 · Research Part-timer, RIKEN AIP
- 2022–2023 · Research Assistant, Yokohama National University
Education
- 2022–2025 · Ph.D. in Informatics, Graduate School of Environment and Information Sciences, Yokohama National University
- 2020–2022 · M.S. in Information Technology, The Kyoto College of Graduate Studies for Informatics
- 2014–2018 · East China Jiaotong University, Nanchang, China
Open Source
Datasets, models, and code behind the papers — free to use.
resume_seven_class
~78k-entry resume sentence-classification dataset across seven section types. My most-downloaded release.
MMM Multilingual Test Set
Trilingual (EN / JA / ZH) test set for the Mutual Reinforcement Effect IE benchmark — ACL 2026 Findings.
Yoko-7B-Japanese
LLaMA2-based Japanese LLM. The most-liked model on my Hugging Face profile.
OIELLM-8B
Open-domain information-extraction LLM (LLaMA3-8B) for trilingual NER & IE — the model behind MMM.
GIELLM-7B
General Information Extraction LLM behind the GIELLM / Mutual Reinforcement Effect paper (IJCNN 2025).
Mutual Reinforcement Effect
The home for my flagship research line — USA, GIELLM, OIELLM and the MMM mix datasets, all linked in one place.
More on Hugging Face and GitHub.
Beyond Research
When I’m away from the terminal.
Gaming
Mainly a AAA gamer: big-budget, single-player worlds are how I unwind, and honestly they feed the same curiosity that got me into taking models apart.
Steam profile
Outdoors
I ski all through winter; once the snow melts I’m out climbing mountains and on long hikes. Fresh air, big views.
Photography
Landscape photography and urban street shooting — the header shot is one of mine, somewhere up in the hills.
500px gallery
Get in touch
Open to research collaborations and roles in LLMs, NLP, and agents. Best reached by email — or find me on the platforms below.
ganchengguan@yahoo.co.jp