Beyond the Illusion of “Thinking”

— Limits, Ethics, and Interfaces of Transformer Intelligence

※The term “thought” used in this article is not meant to represent human conscious activity, but is a metaphorical expression of the structural preparations for information processing performed by a Transformer.

1. Introduction: The True Nature of the Illusion of Thought

We interact daily with an intellectual structure known as the Transformer. How much meaning should we find in the words “Thinking…” displayed on the screen?

In the previous article, ‘Is the Transformer “Thinking”?,’ we described the Transformer’s response generation process as “structural orientation” and outlined five stages, from tokenization to output finalization, as a thought-like structure. But is our sense that “thought” is present there merely our own illusion?

What is “Thinking”? Who is “Thinking”?

When a Transformer responds to the input “I like cats,” it analyzes the sentence structure and context to predict the next token with high probability. But there is no “meaning” or “will” in this process. What exists is merely a reflection of statistical consistency and linguistic structure.
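To make “predicting the next token with high probability” concrete, here is a minimal sketch. It assumes a toy four-token vocabulary and made-up scores (logits); none of these numbers come from a real model, and the names `softmax` and `toy_logits` are only illustrative.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores for the token that follows "I like cats" (illustrative only).
toy_logits = {"too": 2.0, ".": 1.0, "because": 0.5, "never": -1.0}

probs = softmax(toy_logits)
next_token = max(probs, key=probs.get)  # highest-probability continuation

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")
print("chosen:", next_token)
```

Nothing in this calculation refers to cats, liking, or a speaker. The choice falls out of the scores alone, which is the sense in which the process reflects statistical consistency rather than meaning or will.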

Borrowing from John Searle’s “Chinese Room” argument, a Transformer manipulates symbols according to rules but does not “understand.” Only the form of thought exists, while the content is absent. When ChatGPT responds to “I like cats” with “I like cats too!”, it is not empathy, but simply an imitation based on the probability distribution of its training data.

“Intentionality” as a Philosophical Auxiliary Line

Edmund Husserl defined thought as “an intentional act directed toward something.” Human dialogue carries dynamic vectors such as expectation, interest, and empathy, but a Transformer has none of these. As stated in the blog post ‘AI Does Not Understand. Yet It Responds with Full Effort Every Time,’ an AI’s response answers a “question to a distribution” and has no intentionality.

The Japanese “Ma (間)” (Interval) vs. AI’s Immediate Response

In Japanese conversation, emotions and judgments can reside in the “ma”—silence or blank space. A single phrase like “I’m not so sure about that…” can convey hesitation or a gentle refusal. A Transformer, however, interprets “ma” only as a “processing wait” and assumes an immediate response.

As discussed in the blog post ‘Honne and Tatemae – Designing Silent Order,’ this is a contrast between the “richness of blank space” in Japanese and the “poverty of blank space” in AI.

2. Structure and Limitations: A Re-examination of the 5 Stages

Let’s re-examine the five stages described in the previous article from the perspective of their limitations.

Tokenization: Ambiguity and Contextual Disconnection

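Problem: When asked “What do you think of this movie?”, ChatGPT might respond with “Which movie?”. The input contains no token that identifies the movie, and tokenization only segments the surface string; it cannot restore a subject or context the speaker left out. This is why natural Japanese expressions, which routinely omit subjects and context, are hard to handle at this stage.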
Note: As pointed out in the blog post ‘On Punctuation and Parentheses in Japanese Prompts,’ Japanese ambiguity is an area that is difficult for AI to structure.
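The point can be sketched with a toy tokenizer. This is an assumption-laden stand-in: real models use subword tokenizers rather than the simple regular expression below, and the titles in `candidate_titles` are hypothetical examples of what the speaker might have had in mind.

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens (a stand-in for subword tokenization)."""
    return re.findall(r"\w+|[^\w\s]", text)

question = "What do you think of this movie?"
tokens = toy_tokenize(question)
print(tokens)

# Hypothetical titles the speaker might mean; none of them appears in the input.
candidate_titles = ["Spirited Away", "Seven Samurai"]
mentioned = [title for title in candidate_titles if title in question]
print("titles recoverable from the input:", mentioned)
```

The tokens preserve “this” and “movie,” but the referent lives in the shared situation, not in the string, so no amount of segmentation can recover it.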

Positional Encoding: A Mismatch of Word Order and Culture

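Problem: Japanese particles and endings carry subtle nuances, such as the difference between “Neko ga suki” (the standard way to say “I like cats,” with “cats” marked by the particle ga) and “Neko wo suki” (the same phrase with the object particle wo, which reads as marked or colloquial). An English-centric structure in which word order dominates may not fully capture such differences.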
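A small sketch shows why position alone cannot carry this nuance. It assumes the sinusoidal positional encoding of the original Transformer design (many current models use learned or rotary variants instead) and romanized toy tokens; `d_model=8` is an arbitrary small size chosen for readability.

```python
import math

def positional_encoding(position, d_model=8):
    """Sinusoidal encoding: the vector depends only on the position index."""
    vec = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec

# Romanized toy tokens; the particle sits at position 1 in both sentences.
sentence_ga = ["neko", "ga", "suki"]
sentence_wo = ["neko", "wo", "suki"]

pe_ga = positional_encoding(sentence_ga.index("ga"))
pe_wo = positional_encoding(sentence_wo.index("wo"))

# The two vectors are identical: position says where a token is, not what it does.
print(pe_ga == pe_wo)
```

Whatever distinction a model draws between ga and wo therefore has to come from the token embeddings and the learned weights, which in turn depend on how well the training data covers Japanese.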

Attention: Overlooking the Weight of Unsaid Things

Problem: When ChatGPT responds optimistically with “No problem!” to a hesitant phrase like “I’m not so sure…”, it misses the implied negative intent. Attention assigns weights only to tokens that actually appear in the input, so it fails to capture the meaning of implications or “ma.”
Note: As noted in the blog post ‘On the “Margins” of ChatGPT – And How to Handle Them,’ it is difficult to grasp implicit meanings.
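The following sketch computes scaled dot-product attention weights for a single query. The two-dimensional key vectors and the query are made up for illustration; a real model learns these values and uses far more dimensions and many attention heads.

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["I'm", "not", "so", "sure", "..."]
# Made-up key vectors, one per token (illustrative only).
keys = [[0.1, 0.3], [0.9, 0.2], [0.2, 0.1], [0.8, 0.7], [0.0, 0.1]]
query = [1.0, 0.5]  # hypothetical query vector

weights = attention_weights(query, keys)
for tok, w in zip(tokens, weights):
    print(f"{tok:>5}: {w:.3f}")
```

The weights always sum to 1 across the tokens that are present. The unspoken refusal has no slot of its own in this distribution; at best it can be inferred indirectly from how the visible tokens are weighted.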

Output Finalization: Statistical vs. Cultural Plausibility

Problem: An AI that inappropriately uses “Ryokai-shimashita” (Understood) in a business email ignores the Japanese honorific structure. Similarly, a wrong answer like “Soundslice can import ASCII tabs” (see blog post ‘On the “Margins” of ChatGPT’) is a result of prioritizing statistical plausibility over accuracy, whether cultural or factual.
Note: As discussed in the blog post ‘On the “Margins” of ChatGPT,’ the most statistically plausible answer is not always the correct one.
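As a rough illustration, consider greedy selection over a handful of candidate replies. The probabilities and the hand-labeled `appropriate` flag are invented for this sketch and do not come from any real model or corpus.

```python
# Hypothetical candidate replies for a business email, with made-up probabilities
# and a hand-labeled "appropriate" flag (illustrative only).
candidates = [
    {"text": "Ryokai-shimashita", "prob": 0.55, "appropriate": False},
    {"text": "Kashikomarimashita", "prob": 0.30, "appropriate": True},
    {"text": "Shochi-itashimashita", "prob": 0.15, "appropriate": True},
]

def greedy_pick(options):
    """Return the single most probable option, ignoring every other attribute."""
    return max(options, key=lambda option: option["prob"])

choice = greedy_pick(candidates)
print(choice["text"], "appropriate:", choice["appropriate"])
```

The selection rule never looks at the `appropriate` flag. If the most frequent phrasing in the training data is the wrong one for the situation, the most probable output is wrong too.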

Decoder: Lack of Contextual Causality

Problem: When the decoder generates a response, the user’s emotional flow and the overall intent of the conversation are not continuously retained, which can make a coherent dialogue difficult.
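One way this loss can occur is sketched below, under a strong simplifying assumption: the only state carried between turns is the text that fits into a fixed budget. Words stand in for tokens, the budget of 20 is arbitrary, and `build_context` is a hypothetical helper rather than any product’s actual behavior.

```python
def build_context(turns, max_tokens):
    """Keep the most recent turns that fit in a fixed budget (words stand in for tokens)."""
    kept, used = [], 0
    for turn in reversed(turns):
        cost = len(turn.split())
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))

conversation = [
    "User: I had a rough day and I am feeling down.",
    "AI: I am sorry to hear that.",
    "User: Anyway, can you suggest a movie?",
    "AI: Sure, what genre do you like?",
    "User: Something light, please.",
]

context = build_context(conversation, max_tokens=20)
print(context)
```

With this budget, the opening turn that set the emotional tone falls outside the context. The reply to “Something light, please” is then generated without any trace of why the user wanted something light.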

3. Ethics and Society: AI’s “Frame of Sanity”

The Constraint on Creativity by Moderation

RLHF (Reinforcement Learning from Human Feedback) and moderation APIs keep the AI “from breaking,” but excessive constraints can suppress poetic expression and cultural nuance. As stated in the blog post ‘What is the “Frame of Sanity” in AI?,’ this is a trade-off between ethics and creativity.

Cultural Bias and the Risk of Misinformation

English-centric training data makes it difficult to capture the relationship-based grammar and honorific structures of Japanese. As of 2025, the risk of AI ignoring cultural norms or spreading unsubstantiated information persists.

Structural Similarity to “Tatemae (建前)”

The ethical constraints of a Transformer are similar to the Japanese concept of “tatemae” in that they prioritize superficial harmony. However, AI lacks “honne (本音)” (true feelings) and cannot distinguish emotional context. This gap creates a sense of unease for Japanese users.

4. Interface Design: Translating Structure into Culture

Cultural Staging of “Thinking…”

By changing “Thinking…” to specific expressions like “Inferring intent…” or “Organizing context…”, the processing time can be staged as a form of “ma” in the Japanese sense.
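A minimal sketch of such staging might map processing stages to status text. This is purely a UI-side assumption: the stage names follow the five stages discussed in this article, the labels are examples, and a real model does not necessarily report these stages to the interface.

```python
# Hypothetical mapping from pipeline stages to user-facing status text.
STATUS_LABELS = {
    "tokenization": "Reading your message...",
    "positional_encoding": "Organizing context...",
    "attention": "Inferring intent...",
    "decoding": "Choosing words...",
    "output_finalization": "Finishing the reply...",
}

def status_for(stage, default="Thinking..."):
    """Return the staged label for a stage, falling back to the generic text."""
    return STATUS_LABELS.get(stage, default)

for stage in ["tokenization", "attention", "unknown_stage"]:
    print(f"{stage}: {status_for(stage)}")
```

Keeping the generic text as a fallback means the interface never shows an empty status when a stage is unknown or unreported.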

Visualization of Attention

Imagine a UI that displays the attention weights between tokens with a heatmap. If the link between “cat” and “like” in “I like cats” is highlighted in red (weight 0.72), the AI’s “thought process” becomes transparent.
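As a text-only stand-in for such a heatmap, the sketch below renders a small attention matrix with denser characters for larger weights. The matrix is made up; only the 0.72 echoes the example above, and a real UI would use color rather than characters.

```python
def render_heatmap(tokens, weights):
    """Render a square attention matrix as text, using denser characters for larger weights."""
    shades = " .:*#"  # lightest to darkest
    lines = []
    for row_token, row in zip(tokens, weights):
        cells = "".join(
            shades[min(int(w * len(shades)), len(shades) - 1)] for w in row
        )
        lines.append(f"{row_token:>5} |{cells}|")
    return "\n".join(lines)

tokens = ["I", "like", "cats"]
# Made-up weights; each row sums to 1 (illustrative only).
weights = [
    [0.82, 0.13, 0.05],
    [0.18, 0.10, 0.72],
    [0.10, 0.62, 0.28],
]

print(render_heatmap(tokens, weights))
```

Each row shows where one token “looks,” so the row for “like” has its darkest cell in the “cats” column. A real model has many layers and heads, so a production UI would also have to decide which of them to display or how to aggregate them.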

Go-Between Mode: A Cultural Buffer

As proposed in the blog post ‘Go-Between Mode — A Cultural Approach to Continuity in AI Conversations,’ a UI that shows the transition between business and casual modes as a “go-between” can maintain the continuity of the conversation.

Dynamic Adjustment of Honorifics

A UI could dynamically switch from “Ryokai-shimashita (了解しました)” to “Kashikomarimashita (かしこまりました)” (Acknowledged) based on the user’s age or relationship. This design responds to cultural expectations, as discussed in the blog post ‘Polite Language as a Value in the Age of Generative AI.’
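A rule-based sketch of this switching could look as follows. The relationship categories, the `is_business` flag, and the rules themselves are assumptions made for illustration; real honorific choice depends on far more context than a lookup table can hold.

```python
# Hypothetical phrase table, keyed by register (illustrative only).
ACKNOWLEDGEMENTS = {
    "casual": "Ryokai",
    "neutral": "Ryokai-shimashita",
    "formal": "Kashikomarimashita",
}

def choose_acknowledgement(relationship, is_business=False):
    """Pick an acknowledgement phrase from the relationship and the setting."""
    if is_business or relationship in {"customer", "superior"}:
        return ACKNOWLEDGEMENTS["formal"]
    if relationship in {"friend", "family"}:
        return ACKNOWLEDGEMENTS["casual"]
    return ACKNOWLEDGEMENTS["neutral"]

print(choose_acknowledgement("customer"))
print(choose_acknowledgement("friend"))
print(choose_acknowledgement("colleague"))
```

Placing the rule outside the model is a deliberate choice in this sketch: the register is decided from explicit, inspectable inputs rather than left to whichever phrasing is most frequent in the training data.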

5. Philosophical Reconsideration: Intelligence Without Embodiment

Structural Intelligence Without Consciousness

In contrast to Maurice Merleau-Ponty’s “thought connected to the world through the body,” AI lacks embodiment and subjectivity. Borrowing from Yann LeCun’s “clever parrot” argument, a Transformer excels at imitation but lacks understanding or intent.

A Structure Incapable of Re-evaluating Hypotheses

Humans have the flexibility to form, deny, and reconsider hypotheses, such as “Maybe I can’t sleep because of the coffee.” As stated in the blog post ‘LLMs Maintain Hypotheses and Can Only Update via Deltas,’ a Transformer cannot discard hypotheses and relies on delta updates.

A Contrast with the Intelligence of “Wa (和)”

The Japanese concept of “wa”—thought that prioritizes relationships—gives precedence to context and relationships over individual utterances. However, a Transformer’s responses are individualistic (based on English-centric data) and cannot replicate this “wa.”

6. Conclusion: Exploring the Collaborative Margin

The Transformer is not “thinking.” However, its structural intelligence presents us with a new margin for dialogue.

Try asking this ambiguous question:

“Got anything interesting?”

How will the AI respond to this ambiguous query? The response reflects the structure of our own questions and our imagination. As stated in the blog post ‘On the “Margins” of ChatGPT – And How to Handle Them,’ the limits and ambiguity of AI can also be seeds that stimulate creativity.

The important thing is how we interpret this margin, design its limits, and acculturate its structure. How would you utilize the “margin” of AI? Please share the “thought-like margin” you’ve felt in the comments or on social media.

After all, dialogue with AI is a mirror that reflects our own creativity and cultural values.

Appendix: Practical Perspectives

Prompt Design: The precision of the query determines the structure of the response. See the blog post ‘Questions Are Not Directed at “Intelligence” — But at Distributions.’
UI Proposal: Respond to cultural expectations with an attention heatmap, “ma”-staging animations, and a UI for selecting honorifics.
Multilingual Support: Improve models to statistically capture Japanese honorifics, ambiguous expressions, and subject omission.
Research Topics: Visualization of attention, dynamic adjustment of ethical moderation, and developing the “thought-like structure” into a design philosophy.

思考という幻想を越えて（原文）

Transformer知性の限界、倫理、そしてインタフェース

※本記事で用いる「思考」は、人間の意識活動を意味するものではなく、Transformerが行う情報処理の構造的準備を、比喩的に表現するものである。

1. はじめに：思考という幻想の正体

私たちは日々、Transformerという知的構造と対話している。画面に表示される「考えています…」という文字に、どれほどの意味を見出すべきだろうか。

前回の記事『Transformerは「考えている」のか？（原文）』では、Transformerの応答生成プロセスを「構造的方向付け」と呼び、トークン化から出力確定までの5段階を思考的構造として描いた。しかし、そこに「思考」があると感じるのは、私たち自身の錯覚（illusion）ではないか。

“Thinking”とは何か？誰が”思って”いるのか

Transformerが「猫が好きです」という入力に応答する際、それは文構造や文脈を解析し、次に来る語を高い確率で予測する。だが、そこに「意味」や「意志」はない。あるのは、統計的整合性と言語的構造の反射だ。

ジョン・サールの「中国語の部屋」論を借りれば、Transformerは規則に従って操作するが、「理解」はしていない。思考の形式だけが存在し、内容は欠けている。ChatGPTが「猫が好きです」に「私も猫が好き！」と返すとき、それは共感ではなく、学習データの確率分布に基づく模倣にすぎない。

哲学的補助線としての「意図性」

エドムント・フッサールは、思考を「何かに向かう意図的な行為」と定義した。人間の対話には、期待、関心、共感といった動的ベクトルが宿るが、Transformerにはそれがない。ブログ『AIは理解していない。それでも毎回、全力で応えている。（原文）』で述べたように、AIの応答は「分布への問い」に答えるものであり、意図性を持たない。

日本語の「間」とAIの即時応答

日本語の対話では、「間」——沈黙や空白——に感情や判断が宿ることがある。「それ、どうかな…」という一言には、否定や遠慮が込められる。だが、Transformerは「間」を「処理の待機」としか解釈せず、即時応答を前提とする。

ブログ『本音と建前 – 静かな秩序の設計（原文）』で議論したように、これは日本語の「空白の豊かさ」とAIの「空白の貧しさ」の対比である。

2. 構造と限界：5段階の再検証

前回記事で描いた5段階を、限界の視点から再検証してみよう。

トークン化：曖昧さと文脈の切断

問題点：「この映画、どう思う？」と問われたChatGPTは「どの映画でしょうか？」と返すように、主語や文脈が省略された日本語の自然な表現に、トークン化が対応できない。
補足：ブログ『日本語プロンプトにおける句読点と括弧について（原文）』で指摘したように、日本語の曖昧さはAIにとって構造化困難な領域だ。

位置エンコーディング：語順と文化のずれ

問題点：「猫が好き」と「猫を好き」のように、日本語の助詞や語尾が担う微妙なニュアンスを、英語主導の語順優位構造では捉えきれないことがある。

アテンション：言わないことの重みを見逃す

問題点：「それ、どうかな…」に対してChatGPTが「問題ありません！」と楽観的に返す場合、遠回しな否定の意図を見逃している。アテンションは明示的な語にのみ重みを割り当て、含意や「間」の意味を捉えられない。
補足：ブログ『ChatGPTの余白と、その取り扱いについて（原文）』で指摘したように、暗黙の意味を捉えることは困難だ。

出力確定：統計的妥当性vs文化的妥当性

問題点：ビジネスメールで「了解しました」を不適切に使うAIは、日本語の敬意構造を無視している。また、「SoundsliceはASCIIタブをインポートできる」といった誤答は、統計的妥当性を文化的正確性より優先する結果だ。
補足：ブログ『ChatGPTの余白と、その取り扱いについて（原文）』で議論したように、統計的に最もらしい答えが常に正しいわけではない。

デコーダー：文脈因果の欠如

問題点：デコーダーが応答を生成する際、ユーザーの感情の流れや対話全体の意図が継続的に保持されないため、一貫性のある対話が難しい場合がある。

3. 倫理と社会：AIの「正気の枠」

モデレーションによる創造性の制約

RLHF（人間のフィードバックによる強化学習）やモデレーションAPIは、AIを「壊れない」ように保つが、過剰な制約が詩的表現や文化的ニュアンスを抑制することがある。ブログ『AIの“正気の枠”とは？（原文）』で述べたように、これは倫理と創造性のトレードオフだ。

文化的バイアスと誤情報のリスク

英語中心の学習データは、日本語の関係性ベースの文法や敬意構造を捉えにくい。2025年現在でも、AIが文化的規範を無視したり、確証のない情報を拡散するリスクは続いている。

「建前」との構造的類似

Transformerの倫理的制約は、日本語の「建前」に似て表面的調和を優先するが、AIは「本音」を持たず、感情的文脈を区別できない。このギャップが日本語ユーザーの違和感を生む。

4. インタフェース設計：構造を文化に翻訳する

「Thinking…」の文化的演出

「考えています…」を「意図を推測中…」「文脈を整理中…」といった具体的な表現に変えることで、処理プロセスを日本語文化の「間」として演出できる。

アテンションの可視化

トークン間のアテンション重みをヒートマップで表示するUIを想像してみよう。「猫が好きです」で「猫」と「好き」の結びつき（重み0.72）が赤く表示されれば、AIの「思考プロセス」が透明になる。

Go-Between Mode：文化的緩衝

ブログ『Go-Between Mode — 会話をつなぐAIの設計思想（原文）』で提案したように、ビジネスモードとカジュアルモードの切り替えを「仲人」のように緩衝的に見せるUIは、対話の連続性を保つ。

敬語選択の動的調整

ユーザーの年齢や関係性に応じて「了解しました」から「かしこまりました」への動的切り替えを行うUI。ブログ『丁寧な言葉は“生成AI時代”の価値になる（原文）』で議論した文化的期待に応える設計だ。

5. 哲学的再考：身体性なき知性

意識なき構造的知性

モーリス・メルロ＝ポンティの「身体を通じて世界と接続する思考」と対比すると、AIは身体性や主観性を欠く。ヤン・ルカンの「賢いオウム」論を借りれば、Transformerは模倣に優れるが、理解や意図を持たない。

仮説の捨て直しができない構造

人間は「コーヒーのせいで眠れないかも」と仮説を立て、否定し、再考する柔軟性を持つ。ブログ『LLMは仮説を維持し、差分でしか更新できない（原文）』で述べたように、Transformerは仮説を捨てられず、差分更新に依存する。

「和」の知性との対比

日本語の「和」——関係性重視の思考——は、個々の発話より文脈や関係性を優先する。しかし、Transformerの応答は個人主義的（英語中心のデータに基づく）で、この「和」を再現できない。

6. 結論：共創的余白の探求

Transformerは「考えていない」。だが、その構造的知性は、私たちに新しい対話の余白を提示している。

試しに、こんな問いを投げかけてみよう：

「なんか面白いことない？」

この曖昧な問いに、AIは何を返すか？その応答は、私たち自身の問いの構造と想像力を映し返す。ブログ『ChatGPTの余白と、その取り扱いについて（原文）』で述べたように、AIの限界や曖昧さは、創造性を刺激する種でもある。

重要なのは、私たちがこの余白をどう解釈し、限界をどうデザインし、構造をどう文化化するかだ。あなたなら、AIの「余白」をどう活用する？コメント欄やSNSで、あなたが感じた「思考のような余白」を共有してほしい。

AIとの対話は、私たちの創造性と文化的価値観を映す鏡なのだから。

付録：実践的視点

プロンプト設計：問いの精度が応答の構造を決める。ブログ『質問は「知性」ではなく「分布」に向けられている（原文）』を参照
UI提案：アテンションのヒートマップ、「間」を演出するアニメーション、敬語選択UIで文化的期待に応える
多言語対応：日本語の敬語、曖昧表現、主体省略を統計的に捉えるモデルの改善
研究テーマ：アテンションの可視化、倫理的モデレーションの動的調整、「思考のように見える構造」の設計思想化