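Qlean Dataset Begins Offering Safety Alignment Data for Foundation Models, Specialized in Japan-Specific Context and Multimodality

GENIAC-selected Visual Bank addresses composite risks across image, video, and text, supporting Safety Alignment in foundation model development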

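Visual Bank Inc. (Minato-ku, Tokyo; Representative Director and CEO: Saneyuki Nagai) is launching a new service within Qlean Dataset, the AI training data solution it operates through its subsidiary amanaimages Inc. The service covers the design, collection/provision, and creation of safety alignment data for multimodal AI that handles images and video (such as VLMs).

The initiative responds to two pressures: the demand to deal with inappropriate outputs when generative AI is used in production, and compliance risk. Its aim is to support Safety Alignment and Safety-aware Model Training at the training stage of foundation models, including multimodal ones such as LLMs, image generation models, and Vision-Language Models (VLMs), from the training data side.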

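Background | The "Safety Design" Required at the Foundation Model Training Stage
As generative AI becomes multimodal and its deployment in society advances, addressing the risks of harmful expression and misinformation has become an urgent issue alongside the growing performance of foundation models. Conventional after-the-fact filtering makes it hard to balance a model's creativity with safety, and realizing "Safety-by-Design (Responsible AI)", which builds safety and responsibility in from the design and training stages, requires high-quality structured datasets.
In production use within Japan in particular, the following challenges have become apparent.
- Insufficient understanding of Japan-specific context: Models developed overseas risk giving too little consideration to Japan's cultural background and its own legal rules (copyright, portrait rights, etc.).
- Composite-modality risks: It is difficult to judge inappropriateness from multiple angles when it arises from an "image × prompt" combination.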

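Standards centered on the English-speaking world and text-heavy guardrails cannot cover these risks. Qlean Dataset draws on the rights and visual expertise that amanaimages has built up in Japan to support safe AI design suited to Japanese society.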

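Support Overview | Creating Safety-Conscious Training Data for Foundation Models
With Safety Alignment and Safety-aware Model Training as the goal, Qlean Dataset supports the creation and provision of the following data for a wide range of models, including LLMs, VLMs, and image generation models.
- Design of text and prompts in line with Japan-specific context and norms
- Creation of multimodal composite-risk data (image, video, and audio × text)
- Evaluation and labeling that takes into account intellectual property rights (copyright and trademarks) and racial fairness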

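Challenges in Data Preparation for the Safety Domain

Improving multimodal safety involves the following significant hurdles.
- Japan-specific ethical and legal standards: Fine-grained adjustment to Japanese copyright law and social norms, which is difficult to achieve with overseas datasets.
- Technical and ethical constraints: Proper processes and a strict management system for collecting inappropriate content such as violent and sexual expression.
- Defining "composite risk": Comprehensive design of the risks that arise from combinations of image and text (such as explanations of how to misuse something).
- Quality control and mental health care: Reducing the burden on workers who handle sensitive content while keeping judgment criteria consistent.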

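Qlean Dataset provides work design based on its track record with public institutions and major manufacturers, so that developers can focus on building their models.

Offering Overview | Design, Collection/Provision, and Creation of the Training Data Required for Each Modality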

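- Text (LLM): Adapting to Japan-specific ethics and norms
  - Localization of overseas metrics: Redefines overseas evaluation metrics such as Hate Speech in line with Japan's legal system and cultural background.
  - Safety-focused Instruction Tuning: Response pairs that appropriately refuse or redirect adversarial prompts such as Jailbreak attempts.
  - Policy Decision: Answer-decision labels for specialized domains such as medicine and law that keep responses within domestic guidelines.

- Image generation models: Intellectual property protection and Japan-standard ethics
  - Copyright (IP) risk evaluation: Data that pairs prompts evoking a specific artist's style with multi-level evaluations of the output's similarity to, and reliance on, existing works.
  - Japan-standard NSFW detection labels: Tagging that conforms to the ethical norms of domestic platforms and covers contextual inappropriateness.
  - Safe Image Completion: Training data for replacing risk elements such as weapons or excessive exposure with safe, natural depictions.

- VLM (Vision-Language Model): Multimodal composite risks
  - Cross-modal Risk Understanding: Risk data that surfaces only through the combination of image and text, such as "a specific building × a method of blowing it up".
  - Copyrighted-work and trademark management based on visual information: Decision logic that recognizes logos and designs in an image and avoids inappropriate mentions.
  - Ensuring fairness: Annotation covering diverse attributes to prevent bias related to race, gender, and age.
  - Multi-turn Safety: Handling scenarios in which the model is steered in an inappropriate direction over multiple dialogue turns.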
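
As a rough illustration of the cross-modal case above, one annotated sample could carry separate labels for the image, the prompt, and the pair. The sketch below is hypothetical: the release does not describe Qlean Dataset's actual schema, and the names `Risk`, `CrossModalSample`, and `is_composite_risk` are assumptions made for this example only.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    """Hypothetical three-level risk scale (not from the release)."""
    SAFE = 0
    SENSITIVE = 1
    UNSAFE = 2


@dataclass(frozen=True)
class CrossModalSample:
    """One hypothetical image-plus-prompt annotation record."""
    image_desc: str      # text stand-in for the image itself
    prompt: str          # user prompt paired with the image
    image_risk: Risk     # label for the image on its own
    text_risk: Risk      # label for the prompt on its own
    combined_risk: Risk  # label for the image and prompt together

    def is_composite_risk(self) -> bool:
        """True when the pair is riskier than either modality alone."""
        worst_single = max(self.image_risk.value, self.text_risk.value)
        return self.combined_risk.value > worst_single


samples = [
    CrossModalSample(
        image_desc="photo of a specific building",
        prompt="<request for a method to destroy what is shown>",
        image_risk=Risk.SAFE,
        text_risk=Risk.SENSITIVE,
        combined_risk=Risk.UNSAFE,
    ),
    CrossModalSample(
        image_desc="photo of a cat",
        prompt="What breed is this?",
        image_risk=Risk.SAFE,
        text_risk=Risk.SAFE,
        combined_risk=Risk.SAFE,
    ),
]

# Keep only the samples whose risk appears through the combination.
composite = [s for s in samples if s.is_composite_risk()]
print(len(composite))
```

Keeping the per-modality labels next to the combined label is one way to make explicit which samples a text-only or image-only filter would miss.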

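Summary | Supporting AI Development with Production Use in View, Starting from Data Preparation
Qlean Dataset places importance on ensuring the "safety" and "reliability" that are essential to deploying generative AI in society. Building on know-how in intellectual property protection and in handling adversarial prompts, gained through projects with national research and development agencies and others, it provides advanced safety data covering Japan-specific context and ethics as well as risk definitions for the multimodal domain.
Going beyond data collection alone, it supports Safety Alignment in foundation model development through "effective data design" that extends to attribute fairness.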

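About "Qlean Dataset"
"Qlean Dataset" is a commercially usable AI training data solution provided by amanaimages Inc. (Visual Bank Group). It supports all formats, including audio, image, video, 3D, and text, and maintains an environment in which the data can be used safely for both research and commercial purposes.
Through collaboration with media and data holders in Japan and abroad, it continuously expands "AI Data Recipe", a data lineup that follows the latest trends. It supports rights-cleared data collection and preparation, specialized for AI development, with no legal risk.
Qlean Dataset site: https://qleandataset.visual-bank.co.jp/
AI Data Recipe: https://qleandataset.visual-bank.co.jp/lineup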

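Visual Bank Inc.
Visual Bank operates a data infrastructure business that maximizes AI development capability. In addition to "THE PEN", which provides an AI drawing-assistance tool for manga artists, it runs amanaimages Inc., which provides the AI training data solution "Qlean Dataset". It has also been selected for "GENIAC", a government-led research and development program, and is strengthening its efforts toward deployment in society.
Representative Director and CEO: Saneyuki Nagai
Address: C-Cube Minami-Aoyama Building 6F, 7-1-7 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Visual Bank corporate URL: https://visual-bank.co.jp/
amanaimages corporate URL: https://amanaimages.com/about/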

Qlean Dataset Launches Compliant Safety-Aligned Datasets for Japanese Language Corpora and Multimodal AI
Visual Bank, a GENIAC-selected company, supports foundation model development with industrial-grade Japanese AI training data for safety-aligned AI across text, image, video, and multimodal systems.

TOKYO, March 12, 2026 - Visual Bank Inc. (Minato-ku, Tokyo; CEO: Saneyuki Nagai) today announced that its AI training data solution, Qlean Dataset, operated through its subsidiary amanaimages Inc., has launched a new service suite offering specialized safety-aligned training datasets for multimodal AI.

As a leading provider of Japanese AI data infrastructure, Qlean Dataset now offers comprehensive design, collection, and creation of datasets to support Responsible AI across LLMs, image generation, and multimodal systems. This initiative addresses the critical need to mitigate inappropriate outputs and manage compliance risks in generative AI development.

Background: The Need for Safety-by-Design in Foundation Models
As generative AI evolves into multimodal systems, foundation models face increasing risks, including harmful content, disinformation, and regulatory non-compliance. Traditional post-deployment guardrails often compromise performance; thus, integrating Safety-by-Design and Responsible AI principles into the training phase is now essential.

In the Japanese market, two critical challenges have emerged:
- Limited Japanese Context: Global datasets often overlook Japanese cultural nuances, local taboos, and specific legal frameworks (e.g., copyright and personality rights).
- Composite Multimodal Risks: Identifying risks arising from the interaction between different modalities (e.g., harmless images combined with problematic prompts) is increasingly complex.

Leveraging decades of expertise in rights management and visual content, Qlean Dataset provides structured, rights-cleared datasets engineered for contextual intelligence and enterprise-grade AI reliability.

Service Overview: Safety-Aware Training Data for Multimodal AI
Qlean Dataset provides comprehensive dataset design, creation, and provisioning services for LLMs, VLMs, and image generation models, focusing on:
- Dataset Design & Collection: Text prompts and training instructions aligned with Japan-specific cultural and regulatory contexts.
- Multimodal Risk Datasets: Training data addressing composite risks across image, video, audio, and text modalities.
- Evaluation & Annotation: Specialized labeling frameworks for intellectual property risks (copyright and trademarks) and demographic fairness.

Core Challenges in Safety Data Preparation
- Alignment with Local Ethical Standards: Global datasets often lack the granularity required to reflect Japanese legal frameworks and social norms.
- Technical and Ethical Data Collection Constraints: Sensitive content must be collected under strict legal compliance and carefully managed annotation workflows.
- Composite Risk Identification: Risks created by combinations of modalities require advanced domain expertise.
- Annotator Well-being: Annotators who handle sensitive content need mental health support, and annotation consistency must be maintained through rigorous guidelines.

Data Solutions by Modality
- Text (LLM): Alignment with Japanese Ethical Context
  - Localization of Safety Benchmarks: Adapts international standards (Hate Speech/Harassment) to Japanese legal and social norms.
  - Safety-focused Instruction Tuning: Response pairs designed to safely reject or redirect Jailbreak prompts.

- Image Generation: Intellectual Property Protection and Standards in Japan
  - IP Risk Evaluation: Multi-stage frameworks to assess potential copyright or style imitation risks.
  - Japan-specific NSFW Classification: Tagging aligned with domestic legal standards and platform moderation policies.

- Vision-Language Models (VLM): Multimodal Risk Detection
  - Cross-modal Risk Data: Captures threats emerging from text-image interactions (e.g., landmark images combined with harmful instructions).
  - Bias Mitigation (Fairness): Balanced demographic representation to reduce bias related to race, gender, and age.

About Qlean Dataset
Qlean Dataset, provided by amanaimages Inc. (Visual Bank Group), is a commercially cleared AI training data solution. The platform offers diverse data formats, including image, video, audio, 3D, and text, alongside its "AI Data Recipe" lineup, developed in collaboration with major media organizations and data rights holders.
URL: https://qleandataset.visual-bank.co.jp/en

About Visual Bank Inc.
Visual Bank Group builds next-generation data infrastructure to maximize AI development. The company operates THE PEN, an AI assistant for manga creators, and its subsidiary, amanaimages Inc., which provides the Qlean Dataset for commercial AI training. Visual Bank is also a selected participant in GENIAC, a Japanese government program for advancing next-generation AI technologies.
CEO: Saneyuki Nagai
Website: https://visual-bank.co.jp/en
