// LLM Researcher · Tokyo

Chengguang Gan

LLM Researcher @ Techtouch · Ph.D. in Informatics


I build and study large language models for information extraction, web agents, and the Mutual Reinforcement Effect — and I ship the datasets and models to back it up.


Tokyo, Japan

// 01

About

I’m an LLM Researcher at Techtouch in Tokyo. I earned my Ph.D. in Informatics in March 2025 from Yokohama National University, advised by Prof. Tatsunori Mori. My work spans natural language processing and large language models — with a focus on information extraction, prompting, multimodal IE, and web agents.

I introduced the Mutual Reinforcement Effect and built the Japanese (and later multilingual) IE Mix datasets, covering sentence, text, sentiment, and POS classification plus relation and event extraction, and then fine-tuned a line of open LLMs on top of them. Everything is public: the papers are listed below, and the code, datasets, and models are on Hugging Face and GitHub.

Large Language Models
Information Extraction
Prompting
Mutual Reinforcement Effect
Multimodal IE
Web Agents
Japanese NLP

Reviewer for NeurIPS, COLING, and COLM.

// 02

News

2026 · New preprint: GRPO Web Agent, a controlled study of learning-rate-gated failure in small language and vision-language models.
2026 · New preprint: MAG, a benchmark and harness for multimodal action and guide generation.
2026 · MMM accepted to ACL 2026 Findings.
2026 · New preprint: GuideWeb, a benchmark for automatic in-app guide generation on real-world web UIs.
2025 · Joined Techtouch as an LLM Researcher.
2025 · GIELLM at IJCNN 2025; USA at PACLIC 2025; two code-generation papers at ICIC 2025.
2025 · Received my Ph.D. in Informatics from Yokohama National University.
2024 · II-Bench accepted to NeurIPS 2024 (Datasets & Benchmarks Track).

// 03

Publications

Citation counts via Google Scholar (Jul 2026).

2026

A Learning-Rate-Gated Failure of GRPO in a Small Language and Vision-Language Model Web Agent: A Controlled Null and Its Mechanism

Chengguang Gan, Zhixi Cai, Yunhao Liang, Hanjun Wei, Shiwen Ni, Qinghao Zhang

Preprint arXiv
2026

MAG: A Web-Agent Benchmark and Harness for Multimodal Action and Guide Generation

Chengguang Gan, Hanjun Wei, Yunhao Liang, Zhixi Cai, Qinghao Zhang, Shiwen Ni

Preprint arXiv
2026

A Multilingual Dataset and Empirical Validation for the Mutual Reinforcement Effect in Information Extraction

Chengguang Gan, Sunbowen Lee, Qingyu Yin, Yunhao Liang, Xinyang He, Hanjun Wei, Younghun Lim, Shijian Wang, Hexiang Huang, Qinghao Zhang, Shiwen Ni, Tatsunori Mori

ACL 2026 Findings ACL Anthology
2026

GuideWeb: A Benchmark for Automatic In-App Guide Generation on Real-World Web UIs

Chengguang Gan, Yoshihiro Tsujii, Yunhao Liang, Tatsunori Mori, Shiwen Ni, Hiroki Itoh

Preprint arXiv
2025

GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effects

Chengguang Gan, Qinghao Zhang, Tatsunori Mori

IJCNN 2025 (cited by 16) IEEE
2025

USA Model: Japanese Universal Sentiment Analysis Model & Construction of Japanese Sentiment Text Classification and Part-of-Speech Dataset

Chengguang Gan, Qinghao Zhang, Tatsunori Mori

PACLIC 2025 ACL Anthology
2025

RECODE: Leveraging Reliable Self-generated Tests and Fine-Grained Execution Feedback to Enhance LLM-Based Code Generation

Yunhao Liang, Ruixuan Ying, Takuya Taniguchi, Chengguang Gan, Zhe Cui

ICIC 2025 Springer
2025

Exploring Behavior-Driven Development for Code Generation

Yunhao Liang, Chengguang Gan, Ruixuan Ying, Zhe Cui

ICIC 2025 Springer
2025

Retrieval and Distill: A Temporal Data Shift-Free Paradigm for Online Recommendation System

Lei Zheng, Ning Li, Chengguang Gan, Yong Yu, Weinan Zhang

APWeb-WAIM 2025 Springer
2025

M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction

Chengguang Gan, Zhixi Cai, Yanbin Wei, Yunhao Liang, Shiwen Ni, Tatsunori Mori

Preprint arXiv
2025

Decoding Prokaryotic Whole Genomes with a Product-Contextualized Large Language Model

Shiwen Ni, Sheng Li, Shijian Wang, Xin Bi, Yang Li, Chengguang Gan, et al.

bioRxiv bioRxiv
2024

Application of LLM Agents in Recruitment: A Novel Framework for Resume Screening

Chengguang Gan, Qinghao Zhang, Tatsunori Mori

Journal of Information Processing (cited by 151) J-STAGE
2024

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, et al.

NeurIPS 2024 (cited by 21) Proceedings
2024

Think from Words (TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification

Chengguang Gan, Qinghao Zhang, Tatsunori Mori

NLDB 2024 Springer
2024

Empirical Study of Mutual Reinforcement Effect and Application in Few-shot Text Classification Tasks via Prompt

Chengguang Gan, Tatsunori Mori

Preprint arXiv
2024

Demonstrating Mutual Reinforcement Effect through Information Flow

Chengguang Gan, Xuzheng He, Qinghao Zhang, Tatsunori Mori

Preprint arXiv
2023

Sensitivity and Robustness of Large Language Models to Prompt Template in Japanese Text Classification Tasks

Chengguang Gan, Tatsunori Mori

PACLIC 37 (cited by 48) ACL Anthology
2023

A Few-Shot Approach to Resume Information Extraction via Prompts

Chengguang Gan, Tatsunori Mori

NLDB 2023 (cited by 16) Springer
2023

Sentence-to-Label Generation Framework for Multi-task Learning of Japanese Sentence Classification and Named Entity Recognition

Chengguang Gan, Qinghao Zhang, Tatsunori Mori

NLDB 2023 (cited by 12) Springer
2023

Mutual Reinforcement Effects in Japanese Sentence Classification and Named Entity Recognition Tasks

Chengguang Gan, Qinghao Zhang, Tatsunori Mori

Preprint arXiv
2023

Construction of English Resume Corpus and Test with Pre-trained Language Models

Chengguang Gan, Tatsunori Mori

ANLP 2023 arXiv
2021

A Study on the Applicability of BERT to an English Resume Data Extraction System (英文履歴書データ抽出システムへの BERT 適用性の検討)

Chengguang Gan, Yoshihide Takahashi

IPSJ Kansai 2021 IPSJ

// 04

Experience & Education

Experience

2025 — Present

LLM Researcher · Techtouch

2025

Data Scientist · Generic Solution

2024 — 2025

Part-time Researcher · NII Large Language Model Center

2024

Research Part-timer · RIKEN AIP

2022 — 2023

Research Assistant · Yokohama National University

Education

2022 — 2025

Ph.D. in Informatics · Graduate School of Environment and Information Sciences, Yokohama National University

2020 — 2022

M.S. in Information Technology · The Kyoto College of Graduate Studies for Informatics

2014 — 2018

East China Jiaotong University · Nanchang, China

// 05

Open Source

Datasets, models, and code behind the papers — free to use.

Dataset 139 · 15

resume_seven_class

~78k-entry resume sentence-classification dataset across seven section types. My most-downloaded release.

Dataset 58

MMM Multilingual Test Set

Trilingual (EN / JA / ZH) test set for the Mutual Reinforcement Effect IE benchmark — ACL 2026 Findings.

Model · 7B · JA/EN/ZH

Yoko-7B-Japanese

LLaMA2-based Japanese LLM. The most-liked model on my Hugging Face profile.

Model · 8B · Instruct

OIELLM-8B

Open-domain information-extraction LLM (LLaMA3-8B) for trilingual NER & IE — the model behind MMM.

Model · 7B

GIELLM-7B

General Information Extraction LLM behind the GIELLM / Mutual Reinforcement Effect paper (IJCNN 2025).

Code · Hub repo

Mutual Reinforcement Effect

The home for my flagship research line — USA, GIELLM, OIELLM and the MMM mix datasets, all linked in one place.

// 06

Beyond Research

When I’m away from the terminal.

Gaming

Mainly a AAA gamer: big-budget, single-player worlds are how I unwind, and honestly, they tap the same curiosity that got me into taking models apart.

Steam profile

Outdoors

I ski all through winter; once the snow melts I’m out climbing mountains and on long hikes. Fresh air, big views.

Photography

Landscape photography and urban street shooting — the header shot is one of mine, somewhere up in the hills.

500px gallery

// 07

Get in touch

Open to research collaborations and roles in LLMs, NLP, and agents. Best reached by email.

ganchengguan@yahoo.co.jp