Multilingual RAG Knowledge Agent for a Japanese Manufacturing Company
01 — THE CHALLENGE
02 — THE SOLUTION
[Figure: PGroonga TokenBigram, how Japanese text without word boundaries becomes searchable. (1) Document text, (2) bigram index of overlapping 2-character fragments, (3) search query.]
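The three steps of the TokenBigram approach (document text, bigram index, search query) can be sketched in Python. This is a simplified illustration, not PGroonga's implementation. It splits every character pair, whereas Groonga's TokenBigram keeps runs of ASCII letters and digits together as single tokens. It also records positions, so a phrase query matches only when its bigrams occur consecutively. The function names `bigrams`, `build_index` and `phrase_search` are hypothetical.

```python
from collections import defaultdict


def bigrams(text: str) -> list[str]:
    """Split text into overlapping 2-character fragments."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_index(document: str) -> dict[str, set[int]]:
    """Map each bigram to the set of character offsets where it starts."""
    index: dict[str, set[int]] = defaultdict(set)
    for offset, gram in enumerate(bigrams(document)):
        index[gram].add(offset)
    return index


def phrase_search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return start offsets where all query bigrams occur in consecutive positions."""
    grams = bigrams(query)
    if not grams:
        # Single-character queries would need a unigram fallback (not covered here).
        return []
    hits = []
    for start in sorted(index.get(grams[0], set())):
        # The i-th query bigram must start exactly i characters after the first one.
        if all(start + i in index.get(gram, set()) for i, gram in enumerate(grams)):
            hits.append(start)
    return hits
```

No dictionary or morphological analyser is involved. The unsegmented query 検品報告 is found inside 海外工場の検品報告を確認 at character offset 5. The query 工場報告 is not found, because the bigram 場報 never occurs in that text. The trade-off is a larger index than word-based tokenisation would produce.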
03 — ARCHITECTURE
[Figure: LightRAG knowledge graph, example domain: manufacturing. Query path 品質管理 (quality control) → 海外工場の検品報告 (inspection reports from overseas plants): 3 entities, 2 relations, path resolved in under a second.]
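The path resolution in the knowledge-graph example can be illustrated with a breadth-first search over a toy entity graph. This sketch does not use LightRAG's API and does not reproduce its retrieval algorithm; it only shows what resolving a multi-hop path between two entities means. The intermediate entity 検品基準 (inspection standard) and the relation names are invented so that the path has the 3 entities and 2 relations mentioned in the example.

```python
from collections import deque

# Hypothetical toy graph: entity -> list of (relation, neighbouring entity).
# The intermediate node and relation names are illustrative, not from the real system.
GRAPH: dict[str, list[tuple[str, str]]] = {
    "品質管理": [("defines", "検品基準")],
    "検品基準": [("applied_in", "海外工場の検品報告")],
    "海外工場の検品報告": [],
}


def find_path(graph: dict[str, list[tuple[str, str]]], start: str, goal: str):
    """Breadth-first search; returns [entity, relation, entity, ...] or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths alternate entity/relation and always end on an entity
        if node == goal:
            return path
        for relation, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [relation, neighbour])
    return None


path = find_path(GRAPH, "品質管理", "海外工場の検品報告")
```

Breadth-first search returns a path with the fewest hops. Here that path alternates three entities with two relations.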
04 — RESULTS
TECH STACK
OUTLOOK
Frequently Asked Questions: Parfun RAG Agent Case Study
How does Kuroko Labs’ multilingual RAG search work?
It is a three-stage pipeline.

First, a domain-aware Query Planner breaks each question down into a structured search plan with multilingual keywords and product-code filters.

Second, up to seven search channels run in parallel:
- pgvector (semantic search)
- PGroonga TokenBigram (Japanese full-text search)
- LightRAG (knowledge graph)
- tabular search
- email search
- entity search
- Kanban search

Their results are merged with Reciprocal Rank Fusion and then rescored by Cohere Rerank v3.5.

Third, Claude synthesizes a cited answer from the top-ranked results in Japanese, English, Thai or Burmese.
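Reciprocal Rank Fusion, the step that merges the channel results, can be sketched in a few lines of Python. This minimal sketch assumes each channel returns an ordered list of document IDs. The constant k = 60 is the value proposed in the original RRF paper; the system's actual constant is not stated. The subsequent Cohere reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in, with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three channels (semantic, bigram full-text, graph).
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d4"], ["d2", "d1"]])
```

Here the fused order is d2, d1, d4, d3. d2 wins because it appears in all three lists, even though it tops only two of them. RRF looks only at ranks, so scores from incompatible channels (cosine similarity, full-text relevance, graph hits) never need to be normalised against each other.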
Which document formats are supported?
PDF (including scanned documents via Mistral OCR, up to 35 MB), DOCX, PPTX, XLSX and CSV. Excel files receive special handling. A Tabular Deep Dive automatically detects header rows, finds relevant data rows via full-text search and extends the context with the neighbouring rows. Documents come from SharePoint (via Microsoft Graph delta sync, with access rights mirrored), email mailboxes (behind a two-stage relevance filter), automatically ingested meeting transcripts and browser uploads.
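The Tabular Deep Dive can be approximated as follows. All of these details are hypothetical, because the case study does not describe the actual rules:
- the header heuristic (the first row with at least two non-empty cells, none of them numeric)
- the plain substring match standing in for full-text search
- the one-row neighbour window
- the function names

```python
def _is_number(cell: str) -> bool:
    """True if the cell parses as a number (thousands separators allowed)."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def detect_header_row(rows: list[list[str]], scan: int = 10) -> int:
    """Hypothetical heuristic: first row with >= 2 non-empty, non-numeric cells."""
    for i, row in enumerate(rows[:scan]):
        cells = [c.strip() for c in row if c.strip()]
        if len(cells) >= 2 and not any(_is_number(c) for c in cells):
            return i
    return 0  # fall back to the first row


def deep_dive(rows: list[list[str]], term: str, window: int = 1):
    """Return (header row, matching data rows plus `window` neighbours on each side)."""
    if not rows:
        return [], []
    header = detect_header_row(rows)
    first_data = header + 1
    # Substring match stands in for real full-text search.
    hits = [i for i in range(first_data, len(rows)) if any(term in c for c in rows[i])]
    keep: set[int] = set()
    for i in hits:
        keep.update(range(max(first_data, i - window), min(len(rows), i + window + 1)))
    return rows[header], [rows[i] for i in sorted(keep)]


# Illustrative sheet with a title row above the real header.
sheet = [
    ["検品報告 2024", ""],
    ["品番", "不良数", "工場"],
    ["A-100", "3", "タイ工場"],
    ["A-200", "0", "ミャンマー工場"],
    ["B-300", "5", "タイ工場"],
    ["B-400", "1", "日本工場"],
]
header, context = deep_dive(sheet, "B-300")
```

In the sample sheet, the title row is skipped because it has only one filled cell. The search for B-300 returns the header plus rows A-200, B-300 and B-400. This way the model sees the matching row together with its column names and immediate surroundings.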
Why does the system orchestrate five different AI services?
Each component is used for the role it handles best:
- gpt-4o-mini as a fast Query Planner with product-code detection
- OpenAI Embeddings for vectorization
- Cohere Rerank v3.5 for semantic reranking
- Mistral OCR for high-quality document parsing
- Claude for tier-based answer synthesis: Haiku 4.5 for flash through standard queries, Sonnet 4.6 for deep multi-document analyses

A single model could not cover all of these requirements equally well.
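Tier-based synthesis amounts to a routing decision before Claude is called. The sketch below assumes three tiers named flash, standard and deep. The model strings are display names from the case study, not API model identifiers. The `choose_tier` thresholds are invented for illustration and do not reflect the system's real routing rule.

```python
from enum import Enum


class Tier(Enum):
    FLASH = "flash"
    STANDARD = "standard"
    DEEP = "deep"


# Display names taken from the case study; real API model IDs would differ.
SYNTHESIS_MODEL: dict[Tier, str] = {
    Tier.FLASH: "Claude Haiku 4.5",
    Tier.STANDARD: "Claude Haiku 4.5",
    Tier.DEEP: "Claude Sonnet 4.6",
}


def choose_tier(source_documents: int, cross_document_analysis: bool) -> Tier:
    """Hypothetical rule: escalate to the deep tier for multi-document analysis."""
    if cross_document_analysis or source_documents > 5:
        return Tier.DEEP
    if source_documents > 1:
        return Tier.STANDARD
    return Tier.FLASH
```

Keeping the tier-to-model mapping in one table means the smaller model handles most traffic. The larger model is reserved for questions that genuinely need reasoning across several documents.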
Can Kuroko Labs build a similar RAG solution for my company?
Yes. The architecture is modular and transferable to any industry and document landscape. Whether your knowledge lives in legal documents, technical manuals or sales material, Kuroko Labs analyses your information flows and builds a tailor-made RAG pipeline. The initial consultation and an assessment of your use-case potential are free of charge.
How is the quality of the answers ensured?
Several mechanisms work together:
- An automated self-test runs 100 scenarios across eight categories against the live pipeline, acting as a real user (current overall score: 0.9 out of 1.0).
- A Learned Facts Feedback Loop extracts user corrections and injects them into future syntheses.
- Guardrails block sensitive topics.
- Hash-based deduplication prevents the same content from being processed twice.
- An Anti-Loop Session State ensures that follow-up questions deliver new results rather than repeating earlier ones.
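Two of these safeguards, hash-based deduplication and the anti-loop session state, can be sketched as follows. The following are all assumptions, since the case study does not say how either mechanism is implemented:
- SHA-256 over whitespace-normalised text
- the class names
- the rule of skipping chunks already shown in the session

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-256 of whitespace-normalised text (normalisation is an assumption)."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class IngestionDeduplicator:
    """Skip documents whose normalised content has already been processed."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, text: str) -> bool:
        digest = content_hash(text)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


class SessionState:
    """Anti-loop sketch: never return a chunk twice within one session."""

    def __init__(self) -> None:
        self.shown: set[str] = set()

    def filter_new(self, ranked_chunk_ids: list[str], limit: int = 5) -> list[str]:
        fresh = [c for c in ranked_chunk_ids if c not in self.shown][:limit]
        self.shown.update(fresh)
        return fresh
```

Hashing before any OCR, embedding or LLM call means a file synced twice, for example from SharePoint and as an email attachment, costs nothing the second time. The session filter lets a follow-up question reach lower-ranked chunks instead of resurfacing the first answer's sources.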