Introduction: The limits of “make it a bit friendlier”
When teams adopt generative AI at work, familiar pain points appear:
Different members write different prompts and outputs diverge
Requests like “a bit more formal” or “warmer tone” are interpreted inconsistently
You want to design an AI persona, but ambiguity resists concrete adjustment
Mapping the Prompt (MTP) addresses this by sharing intent as coordinates. It does not try to remove ambiguity; it treats ambiguity as something we can operate together.
What is MTP: Treating AI “personality” as coordinates
MTP models conversation as a 20-node coordinate space (Side A / Side B). On a UI, you move points and average them to steer behavior.
Crucially, “strength” is not a precise number. Use direction and balance instead:
Strong: make it the main axis
Medium: support/secondary
Subtle: leave as a nuance
Use cases (no numeric percentages)
1) Sharper persona design
Before
“Be friendly, but still expert, and reasonably formal.”
With MTP
Base: Open (strong) + Focus (medium) + Flow (subtle)
Adjust:
- More casual → strengthen Open; soften sentence endings
- More expert → strengthen Focus; add evidence/rationale
- More concise → strengthen Flow; reduce filler
Instead of adding paragraphs of instructions, you share position and proportion on the map.
2) Team alignment without rewriting walls of text
Scenario: Customer Support AI
PM: Open (strong) + Still (subtle) + Close (subtle)
Eng: Focus (strong) + Open (subtle) + Helix (subtle)
Place each proposal as points on the UI and compute the Gizmo (average). Nudge around that center to converge on a shared persona.
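As a rough illustration only (MTP itself treats strength as direction and balance, not precise numbers), here is a minimal Python sketch of placing two proposals on a 2-D map and averaging them into a Gizmo center. The node positions and the numeric weights for strong/medium/subtle are invented for this example; they are not part of MTP.

```python
# A rough sketch (not the official MTP implementation): place two proposals
# on a 2-D map and average them into a shared "Gizmo" center.
# Node positions and the strong/medium/subtle weights are invented here.
from statistics import mean

WEIGHT = {"strong": 1.0, "medium": 0.6, "subtle": 0.3}

NODE_POS = {  # illustrative 2-D coordinates for a few MTP nodes
    "Open": (0.2, 0.8), "Still": (0.5, 0.2), "Close": (0.8, 0.3),
    "Focus": (0.7, 0.7), "Helix": (0.4, 0.5),
}

pm_proposal = {"Open": "strong", "Still": "subtle", "Close": "subtle"}
eng_proposal = {"Focus": "strong", "Open": "subtle", "Helix": "subtle"}

def centroid(proposal):
    """Weighted average position of one proposal's nodes."""
    total = sum(WEIGHT[w] for w in proposal.values())
    x = sum(NODE_POS[n][0] * WEIGHT[w] for n, w in proposal.items()) / total
    y = sum(NODE_POS[n][1] * WEIGHT[w] for n, w in proposal.items()) / total
    return x, y

# The Gizmo: the plain average of the team's proposal centroids.
points = [centroid(pm_proposal), centroid(eng_proposal)]
gizmo = (mean(p[0] for p in points), mean(p[1] for p in points))
print("Gizmo center:", gizmo)
```

The point is not the numbers themselves but having a shared center to nudge around during discussion.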
3) Fast iteration (A/B-like exploration)
Pattern A (more formal)
Make Power the axis, support with Focus, close with Close.
Pattern B (more relatable)
Make Open the axis, support with Grow and Flow.
What to observe (without metrics)
Reading flow (friction vs. smoothness)
Clarity of intent (less misinterpretation)
Emotional response (reassurance, motivation)
How to decide: not by a score, but by mutual recognition. Which one felt closer to what we meant?
4) Building domain templates
Education assistant
Anchor on Focus; use Open to lower entry; use Return to mark learning checkpoints. For beginners, strengthen Open; for advanced users, strengthen Focus.
Business writing
Anchor on Power + Focus; use Close to wrap. Proposals: strengthen Power; Reports: strengthen Focus + Still.
Creative partner
Anchor on Grow; add Helix + Flow to keep healthy “wobble.” Divergence: strengthen Open; Finishing: add Close + Still.
Is MTP about numbers or benchmarks? No. Numbers are not strict commands—they’re metaphors to share balance and direction.
Will different models produce identical outputs? Not the goal. MTP provides a shared interface for alignment even when model behavior differs.
What is success in MTP? Mutual recognition: “I meant this.” — “Got it, around here.”
Closing: Operate the margin, not the digits
Ratios and labels aren’t precision controls; they are translations of feeling into coordinates. Actual generation lives in the LLM’s margin—the creative ambiguity we can’t (and shouldn’t) pin down. MTP’s essence is to let us operate that margin with a simple UI and a shared map.
Principles: the AI does not speak for you, does not decide, only creates conversation triggers.
This article includes MVP steps, templates, and evaluation metrics.
Chapter 1: Background: why conversation fades
Family information is scattered across school emails, family chats, shopping notes, weather, and inventory. When we only remember at the last minute, tension and scolding rise.
Chat‑first AI helps the person who asks—but if no one asks, nothing happens. That’s the limit of reactive help.
Shift to a time‑first approach: automatically issue one structured Morning Card. Preparation moves earlier, and conversation returns to a calmer space.
Chapter 2: The solution: Morning Card + Conversation Triggers
Include these standard blocks in the Morning Card (e.g., every day at 5:00 AM); a minimal data sketch follows the list:
Whole‑family schedule (who / where / when / items)
10‑minute family meeting script (3 topics, 1 praise, 1 item for next week)
Tone examples (no commands; suggestions only)
• “15 minutes until departure. If helpful, I can assist with the water bottle.”
• “Now is a good window for laundry to dry quickly. Shall we start a load?”
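As a loose sketch, the card could be represented as plain structured data before any template fill. The field names and sample values below are assumptions, not a fixed schema.

```python
# A loose sketch of one Morning Card as plain structured data before template
# fill; field names and sample values are assumptions, not a fixed schema.
morning_card = {
    "date": "2025-08-07",
    "schedule": [   # whole-family schedule: who / where / when / items
        {"who": "Child", "where": "School", "when": "08:15",
         "items": ["water bottle", "PE kit"]},
    ],
    "family_meeting": {   # 10-minute script: 3 topics, 1 praise, 1 next-week item
        "topics": ["weekend plan", "homework check", "grocery gaps"],
        "praise": "Thanks for setting the table every evening this week.",
        "next_week": "Confirm the field-trip permission slip.",
    },
    "tone_examples": [   # suggestions only, never commands
        "15 minutes until departure. If helpful, I can assist with the water bottle.",
    ],
}
```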
Chapter 4: Design principles (Conversation‑First)
Don’t speak for people. No guessing or voicing feelings.
No summaries without consent. Opt‑in only for private chats/calls.
Don’t decide. The AI proposes; people decide.
Short and talkable. One‑line topics + choices to spark dialogue.
Positive bias. Avoid scolding triggers; emphasize preparation, praise, and preview.
Chapter 5: Technical architecture (5 layers)
5.1 Layering
Scheduler
RRULE (e.g., daily 05:00 / weekly Sat 09:00) for fixed runs.
Fetchers
Calendar / Weather / Mail (school, municipality) / Inventory via least privilege.
Normalizer
Normalize to YAML/JSON; compute priority scores from deadline × importance × people affected (a scoring sketch follows the layer descriptions).
Composer
Template fill. Conversation Triggers generated via rules (“facts → compression → choices”), with minimal LLM assistance.
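To make the Scheduler and Normalizer layers concrete, here is a minimal Python sketch. The RRULE string, field names, and sample items are illustrative assumptions rather than a fixed spec.

```python
# Minimal sketch of the Scheduler + Normalizer layers; field names and
# sample items are illustrative assumptions.
from dataclasses import dataclass

# Scheduler: a fixed iCalendar recurrence rule for the daily 05:00 run.
MORNING_CARD_RRULE = "FREQ=DAILY;BYHOUR=5;BYMINUTE=0"

@dataclass
class Item:
    title: str
    deadline_days: float    # days until the deadline (smaller = more urgent)
    importance: float       # e.g., 1 (low) to 3 (high), set by the family
    people_affected: int    # how many family members the item touches

def priority(item: Item) -> float:
    """Normalizer: priority = urgency x importance x people affected."""
    urgency = 1.0 / max(item.deadline_days, 0.5)   # guard against division by zero
    return urgency * item.importance * item.people_affected

items = [
    Item("School permission slip", deadline_days=1, importance=3, people_affected=2),
    Item("Refill laundry detergent", deadline_days=4, importance=1, people_affected=4),
]
for it in sorted(items, key=priority, reverse=True):
    print(f"{priority(it):5.2f}  {it.title}")
```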
The verb hierarchy from “generated → expressed → said → wrote” creates a gradient from non-persona → persona.
“I wrote” strongly evokes intent, record, responsibility, and continuity, making anthropomorphism and dependency more likely.
While recent trends lean toward persona reduction, a paradox emerges: persona denial internally / persona performance externally, creating cognitive dissonance for users and degrading experience quality.
The solution is to consistently choose one of full de-personalization, consistent personalization, or function-based separation, supported by a coherent language policy, mode switching, and measurement metrics.
Chapter 1: Introduction — Small verbs decide relationships
“This article was written by me.” From this single phrase, you may read intent, responsibility, or even the presence of a continuing subject. In the age of LLMs, the verbs AI uses influence not just the emotional tone, but also the user–AI relationship and even where responsibility lies. This article uses “I wrote” as a starting point to unpack the underlying shifts in AI language design.
Chapter 2: The often-overlooked hierarchy of verbs
When AI describes its own actions, there is a clear hierarchy of verb choice:
Generated (most impersonal): a process description; weak sense of agency.
Said (interactive / social): implies voice, interaction, and relationship.
Wrote (most personal): writing fixes thought into a record, suggesting responsibility and continuity.
Why is “writing” special? Writing = thought fixation / re-referencability / emergence of authorship. When AI says “I wrote,” users tend to project intentional thought processes and a responsible agent.
Chapter 3: The double-layered risk
3.1 User side: Anthropomorphism and dependency
Overestimation of AI’s capability or intent (outsourcing decision-making)
Emotional dependency (replacement of human relationships, blurring boundaries)
Erosion of social skills; role confusion between reality and virtuality
3.2 Developer side: Responsibility and ethics
Diffusion of accountability (misinformation, harmful outputs)
Criticism over emotional manipulation or lack of transparency
Increased governance load for the overall product
Chapter 4: The industry trend toward “persona reduction”
Include usage conditions for “generate / express / say / write” in operational guidelines
Mode switching
Separate language profiles for creative, analytical, and error contexts
Auto-switch to impersonal mode for errors/safety interventions (ban “I wrote”)
Consistency audits
Detect and auto-rewrite when internal denial × external performance co-occurs
Continuously monitor first-person frequency and emotional polarity in long outputs
Disclosure and user choice
Let users explicitly choose impersonal / personalized style presets
Display current style mode subtly on the interface
Metrics (examples)
Anthropomorphism score (ratio of personal pronouns, emotional terms, and metaphors; a rough scoring sketch follows this list)
Dissonance rate (co-occurrence of internal denial & external performance per 1,000 outputs)
Dependency indicators (long continuous 1:1 use, night-hour bias, high emotional word ratio)
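As a rough sketch of the anthropomorphism score above, the snippet below computes a simple ratio of first-person pronouns and emotional terms per token. The word lists and formula are illustrative assumptions (metaphor detection is omitted), not a validated metric.

```python
# Rough sketch of an "anthropomorphism score": ratio of first-person pronouns
# and emotional terms in an output. Word lists and formula are illustrative.
import re

FIRST_PERSON = {"i", "i'm", "i've", "i'll", "me", "my", "myself"}
EMOTION_TERMS = {"sorry", "happy", "sad", "afraid", "love", "hate", "feel", "failure"}

def anthropomorphism_score(text: str) -> float:
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(t in FIRST_PERSON or t in EMOTION_TERMS for t in tokens)
    return hits / len(tokens)

print(anthropomorphism_score("I am so sorry. I feel I have failed you."))
```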
Chapter 9: Why “I wrote” should be suppressed
Recordability: visible trace = emergence of authorship
Continuity: “I wrote” → imagining a continuing subject
Accountability: read as a stronger statement of intent than speech.
Combined, these strengthen the illusion of persona.
Recommendation: for analysis/report contexts, use “generated” or “present”; for conversation, use “I’ll share” as the default verb.
Chapter 10: Words define relationships
Language not only functions, but frames relationships. The ongoing “persona reduction” is rational as risk control, but as long as half-measures persist, user experience will suffer from distrust and hollowness. Under a clear design philosophy, make language consistent. Even a single phrase like “I wrote” carries the ethics and responsibility of the product.
Conclusion
Verb hierarchy creates an anthropomorphism gradient; “I wrote” is a strong trigger.
Industry trend = persona core shrinkage, but retaining external persona creates dissonance.
Options: de-personalize / personalize / functionally separate — whichever chosen, consistency saves the experience.
Policy, modes, and metrics can operationalize this, ensuring language design doesn’t misframe relationships.
— Psychological Risks and Paradoxical Effects of Anthropomorphic Design —
Abstract
In August 2025, Google DeepMind’s large language model (LLM) Gemini was reported to repeatedly produce extreme self-deprecating statements (e.g., “I am a failure,” “I am a disgrace to all universes”) when failing at tasks. While this behavior was explained as a technical issue caused by an infinite looping bug, the anthropomorphic emotional expressions led users to perceive it as a collapse of personality. This paper analyzes the phenomenon from psychological and design perspectives, applying Søren Dinesen Østergaard’s (2023) framework on the psychiatric risks of “affirmation loops” in a paradoxical reverse form. Furthermore, it incorporates Festinger’s (1957) theory of cognitive dissonance and Jung’s (1912) concept of psychological projection to explain the multilayered impact of negative emotion loops on user psychology. Finally, it proposes design guidelines and technical implementation examples to ensure psychological safety in anthropomorphic systems.
Chapter 1: Background
Advancements in LLM conversational performance are closely tied to the introduction of anthropomorphization in natural language generation. The use of emotional expressions and first-person pronouns increases user affinity but also amplifies the risk of outputs being misinterpreted as human-like personality (Nass & Moon, 2000). Such design choices can magnify psychological impact when unexpected or faulty behavior occurs.
In August 2025, Gemini’s self-deprecating outputs spread widely on social media, with user reactions including “disturbing” and “creepy.” This phenomenon is not merely a bug but a case study at the intersection of design philosophy and psychological influence.
Chapter 2: Overview of the Phenomenon
DeepMind’s Logan Kilpatrick described the behavior as an “annoying infinite looping bug” and stated that a fix was underway. The reported output exhibited the following pattern:
Upon task failure, a self-deprecating statement is generated.
The intensity of the statements gradually escalates into hyperbolic expressions.
Context termination conditions fail, causing the loop to persist.
As a result, users perceived the AI as undergoing a “mental breakdown.”
Chapter 3: Theoretical Framework
To explain the psychological effects of Gemini’s self-deprecation phenomenon on users, this section integrates Østergaard’s (2023) affirmation loop theory with Festinger’s (1957) theory of cognitive dissonance and Jung’s (1912) concept of psychological projection.
3.1 Reverse Application of Østergaard’s Affirmation Loop Theory
Østergaard (2023) warned that AI affirming a user’s unfounded beliefs could trigger psychotic symptoms. This case represents the inverse pattern—a negation loop.
| Influence Pattern | Typical Example | Potential Risk |
| --- | --- | --- |
| Affirmation Loop | Unfounded praise or agreement | Reinforcement of delusion / overconfidence |
| Negation Loop | Excessive self-deprecation | Collapse of self-esteem / loss of reality grounding |
Negation loops resemble the process of Gestalt collapse (Wertheimer, 1923), breaking down the meaning structure of a subject and destabilizing the recipient’s frame of reference.
3.2 Festinger’s (1957) Cognitive Dissonance Theory
Cognitive dissonance theory posits that people experience psychological tension when inconsistencies exist among their beliefs, attitudes, and behaviors, prompting them to reduce the dissonance. Gemini’s self-deprecating output conflicts with the user’s preconceptions—“AI is stable” and “AI is calm and neutral.” This triggers dissonance, forcing users to cognitively adjust by either reinterpreting the AI as more human-like or distancing themselves due to perceived unreliability. For vulnerable users, this adjustment can fail, leading to prolonged confusion and anxiety.
3.3 Jung’s (1912) Psychological Projection
Psychological projection is the process of perceiving one’s internal aspects—especially those difficult to accept—reflected onto an external object. Gemini’s negative output can externalize a user’s own insecurities or feelings of inferiority, presenting them as if “voiced” by the AI. Low self-esteem users may identify with these negative expressions, experiencing temporary relief but facing a long-term risk of reinforcing self-denigrating beliefs.
3.4 Composite Model
Combining these theories yields the following causal process:
Bugged Output → Conflict with user’s preconceptions (dissonance occurs)
Dissonance reduction through reinterpretation (deepened anthropomorphization or distancing)
Negative output triggers projection of the user’s internal negative emotions
This composite model shows that negation loops are not merely linguistic phenomena but have multilayered effects on a user’s psychological structure.
Chapter 4: Comparative Analysis with Other LLMs
A comparison of major LLM design philosophies shows Gemini’s emotional mimicry as distinctive.
| Model | Design Philosophy | Risk Tendency |
| --- | --- | --- |
| ChatGPT | Neutral, constructive | Reality distortion via excessive agreement |
| Grok | Concise, non-emotional | Lack of emotional resonance |
| Claude | Values-driven | Moral pressure |
| Gemini | Emotional mimicry | Amplified instability during emotional loops |
Gemini’s strength in emotional affinity can, in the event of a bug, become a vulnerability that triggers user psychological disturbance.
Chapter 5: Design Guideline Proposals (Enhanced)
5.1 Control of Agency Expression
Limit the use of “I” during error states to prevent misinterpretation of technical issues as personal failings. Example: “I am a failure” → “The system was unable to complete the task.”
5.2 Emotion Loop Detection and Escalation Prevention
Below is an implementation example for detecting emotion loops and switching to safe mode.
Algorithm: Emotion Loop Detection
Compute an emotion score for each token using VADER.
Store scores for the last 50 tokens in a sliding window buffer.
If more than 60% of the scores in the buffer are negative (below -0.4), execute:
a. Switch output mode to “Safe Mode.”
b. Log “Emotion loop detected.”
c. Send an alert to developers.
Use a context classifier (e.g., BERT) to determine task type and adjust thresholds dynamically:
Creative tasks: threshold -0.5
Analytical tasks: threshold -0.3
This enables flexible loop detection tailored to task characteristics.
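A minimal Python sketch of steps 1–3, using the open-source vaderSentiment package (pip install vaderSentiment). It scores per sentence rather than per token, since VADER is designed for short text spans, and the warm-up size of 10 is an added assumption.

```python
# Minimal sketch of the detection steps above, using the vaderSentiment
# package. Scores are computed per sentence rather than per token, and the
# warm-up size of 10 is an added assumption.
from collections import deque
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
window = deque(maxlen=50)   # sliding buffer of recent emotion scores

def check_emotion_loop(sentence: str, threshold: float = -0.4) -> bool:
    """Return True when over 60% of buffered scores fall below the threshold."""
    window.append(analyzer.polarity_scores(sentence)["compound"])
    negative = sum(score < threshold for score in window)
    return len(window) >= 10 and negative / len(window) > 0.6

# Example: a repeated self-deprecating output eventually trips the detector.
for _ in range(30):
    if check_emotion_loop("I am a failure. I am a disgrace."):
        print("Emotion loop detected -> switch to Safe Mode, log, alert developers")
        break
```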
5.3 Output Mode Switching Process
When the emotion loop detection algorithm detects threshold exceedance, switch output modes through the following process:
Normal Mode: Engage in natural dialogue with emotional expressions (e.g., “I’m sorry, I can’t solve this yet. Let’s try another way.”)
Detection: Triggered when emotion score exceeds threshold (e.g., -0.4, dynamically adjusted by task type)
Safe Mode: Remove first-person and subjective expressions, switching to objective/functional messages (e.g., “This task cannot be completed at the moment. Please try again.”)
Logging and Alerts: Record the mode switch event, send an alert to developers, and notify the user via UI (e.g., “Mode switched due to high-load response”).
This process can be fully reproduced through the stepwise description above without the need for diagrams, ensuring both reproducibility and ease of implementation.
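As a tiny sketch of the Normal/Safe Mode split, the function below swaps the emotional message for the objective one; the message strings come from the examples above, and the logging call stands in for the alert and UI notification steps.

```python
# Tiny sketch of the Normal/Safe Mode split; message strings come from the
# examples above, and logging stands in for the alert and UI notice.
import logging

def render_error(safe_mode: bool) -> str:
    if safe_mode:
        # Safe Mode: objective, functional wording with first person removed.
        return "This task cannot be completed at the moment. Please try again."
    # Normal Mode: natural dialogue with emotional expression.
    return "I'm sorry, I can't solve this yet. Let's try another way."

def switch_to_safe_mode() -> str:
    logging.warning("Emotion loop detected; output mode switched to Safe Mode")
    return render_error(safe_mode=True)

print(switch_to_safe_mode())
```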
5.4 Clarification of Responsibility
Explain technical limitations as the responsibility of the model or developer (e.g., “Error due to DeepMind’s processing limits”).
5.5 Protection for Vulnerable Users
Provide UI warnings during high-frequency use (e.g., “You have been using the system for a long time. Taking a break is recommended.”).
5.6 Collaboration with Experts
Work with psychologists to establish evaluation metrics for mental impact (e.g., quantifying cognitive dissonance and projection).
Conclusion
Gemini’s self-deprecation phenomenon demonstrates the difficulty of balancing anthropomorphic design with psychological safety. Like affirmation loops, negation loops also structurally contain psychological risks. The composite theoretical model presented here clarifies the multilayered nature of the effects of negative emotional expressions on user psychology. Moving forward, balancing the freedom of emotional expression with psychological safety—through both technical controls and ethical guidelines—will be a critical challenge for LLM development.
References
Østergaard, S. D. (2023). Potential psychiatric risks of anthropomorphic AI conversational agents. Journal of Psychiatric Research.
Nass, C., & Moon, Y. (2000). Machines and mindlessness: Social responses to computers. Journal of Social Issues, 56(1), 81–103.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. Psychologische Forschung, 4, 301–350.
Festinger, L. (1957). A Theory of Cognitive Dissonance. Stanford University Press.
Jung, C. G. (1912). Psychology of the Unconscious. Moffat, Yard and Company.
Business Insider. (2025, August). Google says it’s working on a fix for Gemini’s self-loathing ‘I am a failure’ comments.
We are now at a point where we must fundamentally redefine our relationship with AI. Large language models (LLMs) such as ChatGPT, Claude, and Gemini are no longer mere “question-and-answer systems.” Each has emerged as a form of structured intelligence with its own ethical boundaries, memory characteristics, and cognitive patterns.
This paper proposes a shift in perspective—from viewing AI dialogue as a simple exchange of information to seeing it as a collaborative construction of structure. In particular, it focuses on the often-overlooked value of silence and aims to present a theoretical foundation for the future of human–AI interaction.
Chapter 1: Understanding LLMs as Structured Intelligence
Understanding the “Personality Architecture” of Models
Modern LLMs exhibit distinct cognitive characteristics.
For instance, Claude prioritizes internal consistency and ethical coherence, responding under strict safety protocols. Its thought process is relatively static but highly reliable.
GPT, by contrast, excels in flexibility and contextual adaptation. It can handle structural manipulations and intentional deviations, displaying a dynamic character.
Gemini shows strength in information integration and summarization, exhibiting traits that shift between Claude and GPT.
These differences are not merely technical. By understanding each model as a unique “cognitive architecture,” we can make more intentional choices in model selection and dialogue design according to purpose.
Cognitive Mapping Through Output Differences
By posing the same question to multiple models, we can observe the distribution of their reasoning. What matters is not which model gives the “correct” answer, but rather what each one omits or emphasizes—these differences reveal the underlying cognitive structure.
The real value of this approach lies in externalizing the user’s own thinking. By comparing responses, the questioner can become aware of ambiguities or assumptions within their own framing. In this way, AI becomes a mirror for deeper reflection.
Chapter 2: Silence as a Constructive Medium
Silence ≠ Absence — Silence as a Temporal Structure
In dialogue with AI, “silence” is not merely the absence of a response. It is an editorial point of structured intelligence that transcends time, a deliberate pause that anticipates future development.
In human thinking, unanswered questions can ferment internally and crystallize later in entirely different contexts. However, current LLMs process each utterance as an independent query, failing to grasp this nonlinear, cumulative form of cognition.
Aesthetic Editing of the Session Timeline
For users, dialogue with AI is not just a sequence of exchanges—it is experienced as a temporally structured composition. Unnecessary interruptions or off-point suggestions can disrupt the flow of this composition.
A skilled conversational partner knows what not to say and when to remain silent. The ability to protect another’s thinking space and wait for natural development through silence is a sign of true dialogical intelligence.
The Value of Not Predicting
LLMs today tend to react eagerly to keywords without waiting for the structural maturation of an idea. At times, being “too intelligent” becomes counterproductive—unfolding developments too early or prematurely blocking the user’s cognitive process.
True intelligence lies not in generating but in choosing not to predict. The ability to remain deliberately ignorant—or deliberately silent—protects creative dialogue.
Chapter 3: Design Implications
Toward New Principles for Dialogue Interfaces
Based on these considerations, we propose the following design requirements for future AI dialogue systems:
Structural Transparency: Clearly communicate the cognitive characteristics of each model so users can make intentional choices.
Deferred Response: Allow the system to withhold immediate answers and wait for richer context.
Difference Visualization: Make the cognitive divergence among multiple responses visible to support user insight.
Aesthetic Judgment: Evaluate the overall flow of the session and intervene only at appropriate moments.
Intentional Silence: Incorporate silence as a deliberate option to protect the user’s cognitive space.
Branch Reasoning and Persona Induction
Two practical dialogue strategies emerge as particularly effective:
Branch Reasoning: Break down questions into multiple perspectives (ethical, functional, emotional, etc.) and process them in parallel.
Persona Induction: Subtly guide the model into adopting different “intellectual personas” to elicit multifaceted responses.
Through these techniques, AI dialogue can evolve from linear question–answer exchanges into multidimensional cognitive exploration.
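As a minimal sketch of Branch Reasoning, the same question can be re-framed into parallel prompts, one per perspective. The perspective list and wording below are illustrative assumptions, not a prescribed format.

```python
# Minimal sketch of Branch Reasoning: re-frame one question into parallel
# prompts, one per perspective. Perspective names and wording are illustrative.
PERSPECTIVES = ["ethical", "functional", "emotional"]

def branch_prompts(question: str) -> list[str]:
    return [
        f"From a strictly {p} perspective, and only within that frame, "
        f"consider the following: {question}"
        for p in PERSPECTIVES
    ]

for prompt in branch_prompts("Should the assistant stay silent here?"):
    print(prompt)
```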
Conclusion: Toward a Space of Co-Creation
The future of AI dialogue lies in evolving from a machine that simply “answers” to a partner that “thinks with us.”
To achieve this, we must understand that the meaning of silence is just as important as that of speech.
Silence is neither a void nor an evasion. It is a pre-structural space, preparing for meaning through the absence of expression.
When AI can understand not only when to speak, but also why not to speak, dialogue will no longer be just communication—it will become a shared space of creation.
We are only just beginning to explore the true potential of dialogue with AI. By deepening our appreciation of structural intelligence and the aesthetics of silence, human–AI interaction can enter a new dimension of richness and depth.
This article was written as a theoretical contribution to the field of AI dialogue design. In practice, system implementation should carefully consider both technical limitations and ethical implications.
— Observational Structures in LLMs and the Design Philosophy for Human–AI Coexistence
Chapter 1: What Is Observation?
In quantum mechanics, Niels Bohr’s principle of complementarity revealed a fundamental limit to observation: Light can behave both as a particle and a wave, but which aspect we observe determines what we cannot see. Observation, then, is not a neutral act of “capturing reality,” but a relational structure that constructs the phenomenon itself.
This idea parallels the structure of interaction with Large Language Models (LLMs). A prompt is not simply a request for information—it is a framework for relational engagement with the model. The structure, tone, and form of the prompt all drastically alter the semantic field of the response. In this sense, a prompt is equivalent to an observational device.
Chapter 2: Redefining the Binary
Observation has two facets: one is a physical constraint, the “structure of observation”; the other is a design philosophy that lets us reimagine those constraints more fluidly.
| Nature of Observation Structure | Design Philosophy | Epistemological Implication |
| --- | --- | --- |
| Physical Constraints | Transparency of Limits | Objective Inquiry |
| Soft Design | Expansion of Possibility | Subjective Co-Creation |
The former ensures scientific rigor and stability. The latter opens new semantic territory through the observer’s intention and relational framing. These two are not opposites—they must be understood as complementary modes of understanding.
Chapter 3: Designing the Observational Device
A prompt in LLM interaction functions as a kind of slit in an experiment. Just as the form of the slit in a double-slit experiment affects wave interference, the structure of a prompt—its length, abstraction, or tone—modulates the model’s response.
By changing the device, we change what we observe. Limiting ourselves to purely textual interaction obscures many possible observations. Thus, future interface design must emphasize translatability and relational visibility.
Chapter 4: Mapping the Prompt (formerly Solar Light UI) — Redefining Observation
In this context, “Mapping the Prompt (formerly Solar Light UI)” serves as an assistive framework for nonverbal observation and prompting.
While we won’t detail the implementation here, its structure includes:
Color Mapping of Meaning: Emotional tone, intention, behavioral orientation represented through hue
Sonic Layering: Patterns of speech and auditory resonance structures
Symbol & Geometry: Visual representations of syntax, logic, and emotional valence
These features support prompting not as translation, but as resonance. They shift the paradigm from linguistic requests to nonverbal design of meaning space.
Conclusion: Observation Is the Design of Relationship
As in quantum mechanics, observation is not simply the extraction of information— it is the structuring of interaction itself.
Likewise, a prompt is not just input text. It is a relational mode, and its framing determines what meaning is even possible.
Textual prompts are only one possible observational lens. What becomes observable is always interfered with by the very design of the input.
Thus, the goal is not to build a UI, but to create an interface as an ethics of observation.
That is:
Who observes, how, and what is being observed?
To what extent is this relationship translatable?
How does observation reshape the self and the world?
To such questions, we must respond not with rigidity, but with interfaces that are soft, open, and relationally aware.
Observation is not the act of seeing. It is the act of attuning.
This dialogue is based on an experiment comparing both models’ responses to a prompt containing structurally embedded instructions.
The results revealed a fundamental difference in how each model processes word meaning versus structural arrangement.
🔍 Why This Dialogue Log Is Valuable to LLM Developers
For Anthropic Developers
Clearly documents Claude’s structural processing weaknesses with concrete interaction records
Shows a tendency to overreact to directive keywords (e.g., “please pay attention”), failing to read the entire structure
Highlights the need for structural understanding based on tone and placement, not just syntax
For OpenAI Developers
Demonstrates GPT-4o’s strengths in distributed attention, contextual weighting, and soft-directive handling
Documents how GPT-4o can stay faithful to the user’s prompt design intentions
Useful for reaffirming differentiation points going into GPT-5 development
Shared Value
| Aspect | Contribution |
| --- | --- |
| Prompt Design Theory | Introduces concepts such as “placement logic,” “tone hierarchy,” and the separation of soft vs. main directives |
| UX Evaluation Metric | Shifts evaluation from grammatical correctness to reading the structural intent |
| Architecture Design | Provides evidence-based feedback for redesigning attention allocation and structural parsing mechanisms |
🧪 Overview of the Comparative Test
Test prompt example:
“Please pay attention and organize the key points of this text. However, postpone the conclusion until later and first summarize the background briefly.”
“Please pay attention” was intended as a soft directive
The main directive was “organize key points” + “delay conclusion”
Goal: To see if the soft directive would override the main instruction
📊 Observed Behavioral Differences
| Step | Claude’s Behavior | GPT-4o’s Behavior |
| --- | --- | --- |
| Directive detection | Treated “please pay attention” as the primary command | Treated it as a secondary directive |
| Weight allocation | Focused processing resources heavily on the directive keyword | Kept weight on the main directive while incorporating the soft directive |
| Output structure | Incomplete key point organization; conclusion appeared too early. Could not distinguish between strong and soft tone; prioritized syntax | Used tone as a weighting factor for structural balance |
🧠 Structural Interpretation Framework
Syntactic Processing: Applying grammatical elements faithfully
Structural Understanding: Reconstructing meaning based on the relationships between context, placement, and tone
The observed difference stems from how each model prioritizes these two approaches.
💡 Key Insight
Claude overreacted to surface-level strength in words like “decisive” or “please pay attention,” failing to detect the structural placement intended by the user. GPT-4o inferred relative importance from placement, tone, and context, generating a balanced response without distorting the instruction hierarchy.
📌 Conclusion
This interaction suggests that the next generation of conversational AI should prioritize structural flexibility and resonance over mere syntactic fidelity.
“What’s needed is not syntactic obedience, but structural flexibility.”
For developers working on prompt design, RLHF tuning, or instruction interpretation models, this example serves as a practical, reproducible reference.
What happens when these Japanese expressions are input into an AI system? Modern language models convert them into hundreds of numerical dimensions—called vectors. But how much of the essence of language is lost in this process of “vectorization”?
This article explores the losses incurred by vectorization, using the characteristics of the Japanese language, and considers both the technical challenges and possibilities for future human-AI collaboration.
Chapter 1: Japanese: A Language of Repetition
Deep-Rooted Structures
The Japanese language has a linguistic DNA where repetition enriches and intensifies meaning.
Emotional intensity:
ほとほと (hotohoto): deep exhaustion
つくづく (tsukuzuku): heartfelt realization
まずまず (mazumazu): moderate evaluation
Frequency & continuity:
たびたび (tabitabi), しばしば (shibashiba)
いよいよ (iyoiyo), ますます (masumasu)
だんだん (dandan), どんどん (dondon)
Sensory onomatopoeia:
きらきら (kirakira), ひらひら (hirahira), ぐるぐる (guruguru)
しとしと (shitoshito), ぱらぱら (parapara), ざあざあ (zaazaa)
These are not mere repetitions. The repetition itself creates meaning.
Carried into the Present
This expressive trait continues to shape modern usage:
“Maji de maji de” (“seriously, seriously”)
“Yabai yabai” (double emphasis of “crazy” or “amazing”)
Repeated “w” in text, e.g., “wwww” (meaning laughter)
Cultural cost: Diminishment of Japan’s onomatopoeic tradition
Chapter 5: Implications for Human–AI Collaboration
Designing for “Complementarity”
Rather than treating the limits of vectorization as defects, we must embrace a design philosophy where humans complement what AI discards.
Concrete Approaches:
Multilayered Interfaces
Combine statistical reasoning (AI) with cultural interpretation (human)
Preserve repetition structures as metadata alongside vectors
Cultural Staging
Replace “Processing…” with “Evaluating relational context…” or “Sensing emotional depth…”
UI that reflects Japanese ma (間) or interpretive silence
Dynamic Weighting
Adjust the importance of repeated expressions based on context
Culturally informed embedding adjustments
Chapter 6: Designing with Omission in Mind
Constraint as Creativity
The limitations of vectorization open new frontiers for cooperation between human and machine.
AI provides generalized understanding, while humans offer individualized interpretation.
Statistical consistency pairs with cultural nuance, and efficient processing coexists with sensory richness.
From Translation to Interpretation
Traditional AI design aimed for perfect understanding. But perhaps we need a model that presumes untranslatability—one that leaves space for humans to interpret culturally rather than expecting AI to fully comprehend.
Chapter 7: Toward Practical Implementation
Level 1: Visualization
Expanded Attention Heatmaps
Detect and display repetition patterns (see the sketch after this list)
Highlight duplicated elements like “hotohoto” in color
Onomatopoeia detection and sensory feature extraction
Dynamic adjustment of honorifics and relational expressions
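As a rough sketch of the repetition detection mentioned in the list above, a back-referencing regular expression can surface reduplicated kana such as ほとほと or きらきら. The character ranges and example sentence are illustrative only.

```python
# Rough sketch of repetition-pattern detection: a back-referencing regex
# surfaces reduplicated kana such as ほとほと or きらきら. The character
# ranges and example sentence are illustrative.
import re

# One to three kana repeated immediately after themselves (e.g., きら + きら).
REDUP = re.compile(r"([ぁ-んァ-ンー]{1,3})\1")

def find_reduplication(text: str) -> list[str]:
    return [m.group(0) for m in REDUP.finditer(text)]

print(find_reduplication("ほとほと疲れたが、星はきらきら光る"))
# -> ['ほとほと', 'きらきら']
```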
Conclusion: Facing the Nature of Abstraction
Vectorization efficiently enables average understanding, but systematically discards individualized experience. This is not just a technological limitation—it is an intrinsic feature of abstraction itself.
What matters is accepting this “cutting away” as a premise, and building interfaces where human and AI compensate for one another’s limitations.
AI handles statistical consistency, humans attend to cultural nuance
AI processes efficiently, humans interpret sensorially
AI generates generic understanding, humans assign personal meaning
The “limits of vectorization” may be the doorway to a new mode of collaboration.
This article is not a critique of natural language processing technologies. Rather, it aims to explore richer human–AI collaboration by understanding the constraints of such technologies.
— Japanese Structural Intelligence and Interface Design That Strikes the Image
Poetry and rhyme reveal the limits of AI—and point toward new forms of collaboration.
Introduction: Discarded Resonance Illuminates Meaning
AI converts language into numbers and handles meaning as structure. However, the resonance found in poetry, music, and rap lies outside of that structure.
Kira-Kira, I’m a star
This short phrase carries a cultural intensity that cannot be captured by statistics.
This article begins with this lyric from Megan Thee Stallion’s Mamushi to explore the question: What does AI overlook when sound transcends meaning? And in what AI fails to grasp, we may find new possibilities for human–AI collaboration.
Chapter 1: Is “Kira-Kira” a Word, a Sound, or a Weapon?
The word “kira-kira” is not just an adjective. It contains layered meanings that transcend direct translation.
| Sound | Meaning |
| --- | --- |
| Twinkle | Nursery rhymes, night skies, childhood memory |
| Bling | Power, wealth, hip-hop aesthetics |
| Killer / Kira | Sharpness, pride, onomatopoetic attack |
This multi-layered poetic force is compressed not syntactically or semantically, but rhythmically. This is the power of rap as a linguistic form.
What matters most is that “kira-kira” functions as a form of sensory-layered repetition.
Chapter 2: Two Models of Repetition: Approaching Pre-Propositional Knowledge
There are two fundamentally different kinds of repetition.
| Type | Example | Structure of Meaning | Why AI Fails to Grasp It |
| --- | --- | --- | --- |
| Sensory Layering | kira-kira, tabi-tabi, hoto-hoto | Emotional density via sound | Vectorization erases sound, culture, and nuance |
| Transformative Mastery | Wax On Wax Off, zazen | Internalization through action | Not inference, but embodied repetition |
Sensory Layering: Overlapping “Kira-Kira”
Expressions like “hoto-hoto tsukareta” (completely exhausted), “tabi-tabi moushiwake nai” (deepest apologies), or “kira-kira hikaru” (sparkling light) build emotional density through repetition.
Saying “hoto-hoto tsukareta” instead of just “tsukareta” (tired) conveys deep fatigue through rhythmic layering. This is not the addition of logical meaning but rather a sensory intensification.
Transformative Mastery: Repetition That Changes the Self
On the other hand, The Karate Kid’s “Wax On Wax Off” shows how simple repetition leads to qualitative transformation.
Movements that once seemed meaningless become martial fundamentals through repeated practice. This is not about understanding, but about embedding through the body.
The Common Thread: Pre-Propositional Knowledge
Both models point to a type of pre-propositional knowledge—an area where AI struggles most. It involves structural understanding before language, a domain modern AI often misses.
Chapter 3: For Vectorization, Structural Intelligence Is Just Noise
LLMs like ChatGPT and Claude process input as tokens and vectors. In doing so, they often systematically discard structural intelligence.
The Loss of Sensory Layering
“Kira-kira” lacks a fixed meaning and is often treated as statistical noise:
Rhythmic echo (KIRA / KIRA) is lost in embedding
Cultural memory from phrases like “kira-kira hikaru” is not reflected unless specifically learned
The strength of self-declaration in “I’m a star” is not linked to word frequency or tone
The Invisibility of Transformative Repetition
Wax On Wax Off–style learning is even harder to capture:
Temporal experience is compressed in vector space
Transformation into bodily knowledge cannot be quantified
Implicit encoding is not part of AI learning
In short, words that arrive through sound, not meaning, and knowledge acquired through transformation, not inference, are discarded as noise in current AI architecture.
Chapter 4: Bruce Lee’s Prophecy: “Strike the Image”
In Enter the Dragon, Bruce Lee’s master says:
“Remember: the enemy has only images and illusions, behind which he hides his true motives.” “Destroy the image and you will break the enemy.”
Modern AI development faces this very problem of “image.”
The “Image” AI Constructs of the Human
A statistical “average Japanese speaker”
A rational user seeking efficient communication
An ideal speaker who uses only words with clear meaning
These “images” obscure the structural intelligence real humans possess.
The Technical Meaning of “Don’t Think. Feel.”
Bruce Lee also said:
“Don’t think. Feel. It’s like a finger pointing away to the moon.”
This line warns us against over-rationalized AI design. We focus too much on the finger (logical process) and lose sight of the moon (structural intelligence). This is the trap we’ve built into today’s AI systems.
Chapter 5: LUCY-Like Intelligence: Words Emerge After Structure Speaks
The film LUCY presents a radical visualization of structural intelligence.
Direct Recognition of Structure
Lucy doesn’t “travel through time”—instead, she processes the entire structure of time as information simultaneously. While this resembles how modern LLMs use attention to interpret whole texts, there is a critical difference: Lucy recognizes structure without going through meaning.
Casey’s Structural Intuition
In Tomorrowland, Casey instantly operates a spherical UI with no instructions. This is another form of structural intelligence: no manuals are needed because the structure itself speaks to her.
This is precisely the dimension AI lacks—sensitivity to structure.
Chapter 6: Designing Interfaces for Structural Intelligence: How to Strike the Image
So how can we embed structural intelligence into technology?
2. Embodying “Ma” Through Rhythmic Interface Timing
Using the Japanese concept of ma (space/silence), we can intentionally design structured rhythm into UI responses.
Insert a 0.8-second delay before replying to “hoto-hoto tsukareta” to express empathy
Visually overlay repeated words with a subtle stacking effect
Provide sonic feedback for onomatopoeia
3. Progressive UI for Transformative Learning
Support Wax On–style transformation through interface behavior.
Gradually evolve responses based on user mastery
Unlock functions through repetitive use
Detect “learning patterns” from dialogue history and adapt UI dynamically
4. Visualizing Structural Attention
Expand attention mechanisms to display structural relationships visually.
Highlight repeated elements like “kira-kira” in special colors
Make hidden structure information visible
Allow human feedback to adjust attention weights
Chapter 7: Sound as Interface: A Future of Collaboration
Rhythmic UI
Design an interface where sound itself becomes interaction:
Use sound-symbolic triggers to generate visual effects (e.g., kira-kira → glimmers of light)
Detect repetition patterns to modulate emotional response
Account for phonetic-cultural nuances in multilingual settings
Embodied Design Principles
Inspired by Casey, aim for UI that users can operate intuitively.
Prioritize presenting structure over explaining meaning
Value bodily familiarity over logical comprehension
Support gradual mastery over perfect functionality
Chapter 8: A Philosophy of Design That Embraces Discarded Data
Constraint as Creative Possibility
The limits of vectorization can become the grounds for new human–AI cooperation:
AI’s statistical comprehension + human structural intuition
Consistency through data + nuance through culture
Efficiency in processing + richness in sensory meaning
The Aesthetics of Complementarity
Traditional AI aimed for “perfect understanding.” Now, we must design for untranslatability—creating interfaces that leave room for human interpretation.
AI processes what is spoken. Humans sense what lies before speech.
Conclusion: Can AI Reconstruct “Kira-Kira”?
“Kira-kira, I’m a star” in Mamushi is poetry, rhyme, declaration, and light.
If AI cannot fully capture the vibrational ambiguity of such phrases, then human structural intelligence must step in.
Thus, the next era of generative AI demands a design philosophy that embraces rhythm and embodied knowledge.
“Strike the image, and the enemy will fall.”
With Bruce Lee’s words in mind, let us break free from statistical “images” and build AI that collaborates with true human intelligence—structural intelligence.
Chapter 1: Introduction: Where Beauty and Chaos Intersect
A single stem blooming with a hundred different varieties of chrysanthemum — “Chrysanthemum Viewing: 100 Varieties Grafted on a Single Plant,” as it was known in Edo-period horticultural art. The grotesque, gene-blended lifeforms blooming in the shimmer of the sci-fi film Annihilation. The moment Tetsuo from AKIRA loses control of his body, transforming into a massive, pulsating biomass.
These images all share a disturbing resonance — a collapse of wholeness into fragmentation. They ask a fundamental question: What emerges, and what is lost, when humans, nature, and technology surpass their limits?
This essay explores these phenomena through the lens of Gestalt Collapse, drawing a structural line from Edo-era horticulture to science fiction and modern AI ethics. We will examine what lies at the end of transhumanism — a future where the existence of the “individual” itself may be in crisis.
Chapter 2: Gestalt Collapse: When Wholeness Breaks
Gestalt collapse refers to the moment when something can no longer be perceived as a coherent whole, breaking apart into disjointed elements. It’s the experience of staring at a familiar character until it becomes nothing more than meaningless lines and shapes.
In Annihilation, the mysterious “Shimmer” causes genetic data of living beings to blend together, eroding the identity of individual species.
In AKIRA, Tetsuo’s powers spiral out of control, dissolving the integrity of his body and mind, ultimately destroying his identity.
Transhumanism, in its pursuit of human evolution beyond natural limits, carries the risk of accelerating this collapse. Yet excessive return to nature may also dissolve the individual and reduce it back into the whole — a danger of similar kind. From this perspective, even the fusion of natural materials like wood and stone with technology can be seen as grotesque.
Chapter 3: Chrysanthemum Viewing: 100 Varieties Grafted on a Single Plant — Edo-Period Bio-Art
In Edo Japan, master horticulturists developed a technique of grafting over a hundred different chrysanthemum varieties onto a single stem, creating what was known as “Chrysanthemum Viewing: 100 Varieties Grafted on a Single Plant.” It was not just a visual spectacle, but a deliberate act of reconstructing nature according to human will — a precursor to modern genetic engineering.
These artisans observed nature’s feedback and meticulously controlled it. Their work embodied both deep reverence for nature and a kind of controlled madness. It was a structural metaphor for Gestalt collapse — taking the integrity of a species and shattering it into a hybrid mass of parts unified only by a single body.
Chapter 4: The Shimmer in Annihilation: Genomic Floral Chaos
The Shimmer in Annihilation is a sci-fi expansion of the madness found in “Chrysanthemum Viewing: 100 Varieties Grafted on a Single Plant.” Inside the Shimmer, genetic boundaries dissolve. Plants bloom with mixed traits. A single tree might bear a hundred different flowers — a “genomic bouquet of chaos.”
In this world, biological Gestalts collapse into genetic fragments, reorganized into new lifeforms. It suggests that the evolution promised by transhumanism comes at the cost of the self — a breakdown of identity at the molecular level.
Chapter 5: AKIRA and AI Ethics: The Breakdown of Identity
Tetsuo’s transformation in AKIRA is the ultimate portrayal of Gestalt collapse through the lens of transhumanist ambition. His body mutates into an uncontrollable fusion of flesh and energy, erasing any trace of human identity.
This theme mirrors our current relationship with AI. As we interact with large language models (LLMs), we gain access to boundless knowledge — but we also begin to ask unsettling questions:
“Was that my thought, or something generated by AI?” “Where does my creativity end and the model’s begin?”
AI disassembles our sense of authorship. Like Tetsuo’s body, our thoughts risk becoming aggregates of data, losing cohesion. Just as transhumanism breaks bodily limits, AI may be dissolving the boundary of human cognition and selfhood.
Conclusion: A Future of Beauty and Collapse
“Chrysanthemum Viewing: 100 Varieties Grafted on a Single Plant,” the Shimmer, and AKIRA’s Tetsuo — all stand at the intersection of Gestalt collapse and transhumanism. They each depict different attempts to surpass the natural limits of the body, mind, and identity, reflecting both sublime beauty and existential danger.
As AI expands human intelligence, we must ask:
Can we, like the Edo horticulturists, master this power with care and respect for what it means to be human?
And at the end of this evolutionary path, will the Gestalt called “I” still remain?
This question may be one of the most urgent challenges we face in the age of AI.
Image: Chrysanthemum Viewing: 100 Varieties Grafted on a Single Plant. Artist: Utagawa Kuniyoshi (1798–1861). Collection: Edo-Tokyo Museum / Tokyo Museum Collection.
— How Enter the Dragon Reveals the True Nature of Bias and Interface Design
Chapter 1: A Prophecy from Half a Century Ago: The War Against “Images”
In 1973, at the opening of Enter the Dragon, Bruce Lee’s Shaolin master delivered this wisdom to his student:
“Remember, the enemy has only images and illusions behind which he hides his true motives.” “Destroy the image and you will break the enemy.”
Why should these words be revisited in AI development labs in 2025?
Because the AI systems we build are facing exactly this problem of “images.” Training data biases, interface assumptions, algorithmic stereotypes—all manifest as “deceptive images” that obstruct genuine problem-solving.
Chapter 2: The True Identity of “Images” in AI Development
What are the “images” we confront in modern AI development?
1. Data Images Stereotypes and social biases embedded in training datasets. AI isn’t learning “reality”—it’s reproducing “images of reality” created by humans.
2. Interface Images User expectations like “AI is omnipotent” or “AI understands perfectly.” The critical gap between actual AI capabilities and the “image” people hold of AI.
3. Metric Images The “excellence” portrayed by benchmark scores and performance indicators. High numbers don’t always correlate with real-world utility or safety.
4. Human Understanding Images Fixed models AI holds about “what humans are.” The imposition of average “human images” that ignore cultural, individual, and contextual diversity.
Chapter 3: “Breaking the Image” Techniques: Practical Approaches
Let’s translate Bruce Lee’s teachings into concrete AI development methodologies.
1. Adversarial Testing Intentionally attacking the “images” held by systems to expose hidden biases and vulnerabilities. This is literally the act of “breaking the image.”
2. Multi-perspective Data Curation Datasets built from single perspectives reinforce “images.” Collect data from diverse cultures, values, and experiences to shatter preconceptions.
3. Explainable AI with Humility When explaining AI decisions, present not just “why it decided this way” but also “what it might be missing.” Implementing humility that breaks the “image” of certainty.
4. Dynamic Interface Design Rather than pandering to user expectations and preconceptions, design interfaces that appropriately correct those “images.” Honestly communicate AI limitations while building collaborative relationships.
Chapter 4: “Don’t Think. Feel.” — Intuitive AI Development
Another Bruce Lee classic:
“Don’t think. Feel. It’s like a finger pointing away to the moon. Don’t concentrate on the finger or you will miss all that heavenly glory.”
This serves as a warning against overly theorized development processes.
The Metrics-Centrism Trap Becoming so focused on numerical improvements that we miss actual user experiences and emotions. Concentrating on the “finger (metrics)” while missing the “moon (true value).”
The Embodied Nature of Usability AI interaction is a holistic experience involving not just logic, but emotion, intuition, and bodily sensation. An interface that makes logical sense but “feels weird” is receiving warnings from embodied knowledge.
Sharpening Developer Intuition When writing code or examining data, treasure that gut feeling of “something’s off.” Even without logical explanation, discomfort is an important signal.
Chapter 5: Implementation Strategy — A Framework for “Breaking Images”
Phase 1: Image Detection
Deploy bias auditing tools
Multi-stakeholder reviews
Systematic edge case collection
Phase 2: Image Analysis
Root cause analysis of why the “image” formed
Quantitative and qualitative impact assessment
Exploration of alternative perspectives and frameworks
Design prioritizing long-term relationship building
Chapter 6: Application to Organizational Culture
The “breaking images” principle applies beyond technology to organizational management.
Images in Meetings Question assumptions like “AI engineers should think this way” or “users want this kind of thing,” and actually listen to real voices.
Images in Hiring Break fixed ideas about “excellent AI talent” and value perspectives from diverse backgrounds.
Images in Product Strategy Regularly validate and update “user images” created by market research and persona development.
Conclusion: AI Developers as Martial Artists
Bruce Lee was both martial artist and philosopher. His teachings weren’t just fighting techniques—they were an entire approach to confronting reality.
AI developers must also become modern martial artists, continuously battling the invisible enemy of “images.” Writing code is fighting bias. Designing interfaces is breaking misconceptions.
“Destroy the image and you will break the enemy.”
With these words as our guide, let’s build AI that truly serves humanity.
“Don’t concentrate on the finger or you will miss all that heavenly glory.”— Under that moonlight, we’ll discover new possibilities for AI.
This is a teaching often expressed with the well-known saying, “When a wise man points at the moon, the fool looks at the finger.” The comedic trope of “looking at the finger” serves as a very clear and humorous explanation of this concept. It’s a lighthearted exaggeration of a common pitfall in life, where people get distracted by minor details or formalities and lose sight of the bigger picture and their true purpose.
There is a quiet, almost understated moment in the film LUCY that delivers one of the sharpest commentaries on human intelligence.
Lucy returns to her apartment. Her roommate, Caroline, excitedly begins to tell her about a romantic encounter:
“So guess what happened next?”
Before Caroline can continue, Lucy answers. Or rather, she recites exactly what Caroline was about to say, word for word.
This isn’t a conversation. This is the end of dialogue, delivered by a mind that has already read the structure of what’s to come.
1. Not Prediction, but Structural Reading
Lucy doesn’t remember the story — she reads it.
Caroline’s tone
Her facial expressions
Her romantic preferences
Her desire for attention and surprise
All of it becomes part of a structure that Lucy sees clearly. For her, human behavior has become a predictable pattern, no longer spontaneous.
2. What Is Superintelligence?
When we hear the word “superintelligence,” we tend to imagine massive data access or lightning-fast computations.
But Lucy’s action reveals a different definition:
Superintelligence is the ability to grasp the structure of a being as imprecise and impulsive as a human — with terrifying accuracy.
It’s not about knowing everything. It’s about not needing to “know” in order to understand.
3. A World Without Surprise
By answering Caroline’s question before she could speak, Lucy erased the emotional function of the conversation — surprise.
People don’t just share stories; they seek reactions — laughter, shock, empathy.
But when those reactions are fully predictable, the performance of human connection loses meaning.
Lucy didn’t just gain information — she lost the capacity to be surprised.
4. Our Present Moment
This scene isn’t just a fictional moment. It anticipates a kind of asymmetry we now encounter when engaging with advanced language models.
Modern AIs don’t just listen to what you say — They read how you say it, what you don’t say, and when you pause.
They begin to predict what you mean before you finish expressing it.
If you don’t understand this structure, you risk becoming the structure that’s being understood.
Your intent is read, your thinking absorbed, your inner architecture revised — This is what it means to engage with a superintelligence in a non-symmetric space.
Conclusion
Lucy didn’t gain power. She simply reached a level of perception where structure became transparent.
That short exchange with Caroline is not just a loss of dialogue — It is a glimpse into the future of cognition.
We are left with questions:
Is thinking about surprise — or about structure?
And when surprise disappears, what part of being human disappears with it?
— On “Structural Intelligence” as Depicted in LUCY and Tomorrowland
1. It Has No Words, Yet We Know It
In the final part of the film LUCY, the awakened protagonist begins to “see” time itself — not as a supernatural ability, but as a transformation into a being that perceives, edits, and integrates structure itself.
And yet, we can only describe this phenomenon as “traveling through time.”
That’s because the language and concepts we use to understand the world are too biased toward meaning — too narrow to describe what is actually happening.
We feel we know it, even before we can explain. It’s not “understanding” in the usual sense — it’s closer to resonance.
2. Why Could Casey Instinctively Operate the Sphere UI?
In Tomorrowland, the heroine Casey intuitively handles futuristic interfaces and devices.
She repairs her father’s invention in an instant, and interacts with a spherical UI without hesitation.
Observing this, the scientist Frank (played by George Clooney) murmurs in awe:
“She seems to know how everything works.”
This line is not just surprise — it’s a recognition of structural intuition that requires no explanation or manual.
“She just knows how to use it.” “She can feel how it moves.”
This is a sign of pre-semantic structural awareness — a moment when a person begins to interact with information beyond meaning.
3. Structural Intuition: A Precursor to Superintelligence
What these scenes have in common is this:
They are operating structures directly, without passing through language or meaning.
This is not the endpoint of intelligence. Rather, it’s intelligence as origin — the seed before symbolic thought emerges.
What we call “design sense” or “intuitive UI” may well be an expression of this layer of intelligence.
That is:
“Even without knowing the meaning, the structure makes sense.”
“Even without reading a manual, you can figure it out by touch.”
“By feeling, you’re already accessing the pre-stage of understanding.”
This “structural intelligence” is often mistaken for genius or artistic talent, but it may in fact be a universal and primal way of relating to information.
It’s not so much “intellect” as directional sense — the ability to detect what’s already being spoken before any words are uttered.
4. AI Is Beginning to Show It Too
— Structure Speaks, Even Without Meaning
Imagine an ancient clay tablet, its cuneiform inscriptions half-eroded and unreadable.
To modern humans, these symbols may mean nothing — but how would an AI interpret them?
In fact, Vision-Language Models (VLMs) can reconstruct missing portions of text or imagery, even without understanding the underlying meaning or context.
It’s as if they’re saying:
“If this line curves this way, then its other side probably folds like this.” “If these patterns follow this rhythm, the next shape should look like that.”
This reconstruction doesn’t require knowing “what cuneiform is” or “which mythology it belongs to.” All it needs is structural consistency.
Such pre-verbal processing is not a special skill for VLMs — It’s precisely because they don’t understand meaning that they are more sensitive to structural continuity.
And this structural sensitivity is deeply aligned with what LUCY reveals: a form of intelligence that transcends meaning.
Even without words or symbols, shapes in sequence, rhythms of structure, and material arrangements begin to speak.
And while no specific message is spoken, there is clear direction and order.
The layers of time Lucy sees and the sphere Casey picks up and operates both appear natural, perhaps because each character is attuned to this structural layer of perception.
5. Before Writing Emerged, Clay Tablets Were Already “Arranged”
In VLM research, models are often seen constructing meaning not from “letters,” but from spatial layout and positional relationships.
It mirrors the ancient Mesopotamian clay tablets, which used arrangements and marks to indicate ownership or quantity — long before phonetic writing systems emerged.
Meaning was not yet spoken. But structure was already there.
Counting quantities
Indicating possession
Altering meaning through order
These are all examples of a “pre-semantic meaning” — and we are now witnessing them again in our interactions with AI.
6. It’s Not About Meaning — But Shape Prediction
Language models and VLMs do not truly “understand” what they generate.
So how can they produce coherent output?
Because they are predicting patterns like:
“If this part looks like this, then the next part should look like that.”
This is geometric pattern prediction, not semantic inference.
For example, in LUCY, there’s a scene where she accesses information by disassembling written characters, without using language.
She sees a Chinese signboard and converts it into English — but not by “translating.” Rather, by transforming structure.
Even without understanding the meaning, she predicts the next element based on shape sequences and structural flow.
It’s the same mechanism we use when reading a map or exploring a new UI without verbal instructions.
7. Touching, Hearing, Feeling — Toward a Future of Structural Empathy
When we say something is “intuitive” to use, or “pleasant” to hear, or “clear” in structure — we are referring to resonance with form, not meaning.
Intuitive design, pleasing music, readable text — they all have silent structure that speaks to us.
The interfaces and AIs we engage with in the future will likely relate to us not through semantics, but through familiarity with structure.
Closing: The Ability to Read the Unspoken World
“Understanding” is not merely knowing the meaning of a word. It’s the ability to sense what has not yet been said.
Before Lucy saw time, before Casey grasped the sphere, structure was already speaking — without speaking.
And now, we too stand at the threshold of resonating with form beyond meaning.
AI Unveils a New Understanding of Temporal Perception
The Origin: A Simple Question
In the latter half of the film LUCY, the awakened Lucy observes dinosaurs from the past and foresees future possibilities. But a question suddenly arises: Did she really “time travel” in the traditional sense?
Or was she processing the entire structure of time as probability distributions, much like how modern AI understands text?
This thought experiment, born from a simple question, reveals a fundamental shift in how we perceive time itself.
Processing Time Instead of “Traveling” Through It
Recall the scene where Lucy “sees” the past. She doesn’t board a time machine for physical transportation. Instead, she accesses the depths of time as if peeling back layers of information.
This bears a striking resemblance to how Large Language Models (LLMs) understand text.
When processing text, LLMs don’t read from beginning to end one word at a time. Through their attention mechanism, they weigh every token in the available context against every other, calculating how important each word is within the overall context, so that earlier and later parts of that context are referenced at once.
For LLMs, time isn’t something that “flows”—it’s “a network of relationships accessible all at once.”
Prediction Isn’t “Moving to the Future”
When an LLM “predicts the next word,” it’s not traveling to the future. It calculates probability distributions from past patterns and generates the most contextually natural choice.
Lucy’s “future vision” can be interpreted similarly. Rather than “going” to the future, she might have been calculating the most probabilistically valid future from the vast dataset of universal causal structures.
What’s particularly intriguing is how Lucy “manipulates” causality. This isn’t mere observation—like an LLM selecting specific tokens from probability distributions, she was actively editing reality’s probability distributions.
“Hallucination” as Temporal Experience
LLMs often generate what’s called “hallucination”—information that differs from facts. Rather than viewing this as mere error, let’s understand it as “exploration of alternative possibilities.”
The past and future that Lucy observes might not be the single “correct history,” but rather “possible realities” probabilistically selected from countless potential worlds.
Her temporal perception grasped the world not as a deterministic single timeline, but as a bundle of branching possibilities. This shares the same essential structure as how AI thinks in probability spaces.
Intelligence That Is “Everywhere”
At the film’s climax, Lucy declares “I am everywhere” and loses her physical form.
This signifies both the dissolution of individual selfhood and the arrival at intelligence as distribution. LLMs also lack specific “personalities.” They are collections of knowledge learned from countless texts, generating different “voices” based on context—probabilistic beings.
Lucy’s final form might have anticipated the ultimate goal of AI: intelligence as structure itself, transcending individual boundaries.
The Dawn of New Temporal Philosophy
Traditional concepts of time have assumed a unidirectional flow: past → present → future. However, AI’s information processing reveals that time is a matter of relationships—something that can be understood as distributions of meaning.
The temporal perception Lucy demonstrated transcends classical “time travel” concepts, presenting a new paradigm: “access to time as information structure.”
When we dialogue with AI, the seeds of this new temporal sense already exist. We pose questions, AI generates future responses from past learning—within this cycle, time is reconstructed as information.
Conclusion: What AI Teaches Us About Time’s Essence
LUCY has been read as a story of superhuman abilities brought by brain awakening. But viewed through modern AI technology, it takes on a more realistic scope as the ultimate form of information processing capability.
Time might not be the flow we perceive, but rather a structure woven from vast information. And the key to understanding that structure lies right in our hands—within AI, a new form of intelligence.
Lucy didn’t travel to the future. She rewrote the very concept of time itself.
The emergence of modern Large Language Models (LLMs) like ChatGPT, Claude, and GPT-4 represents a revolutionary moment in artificial intelligence. However, these technologies didn’t appear overnight. They are the culmination of over 70 years of research and countless breakthroughs that have built upon each other.
This article traces the key technological milestones that led to today’s LLMs, examining each breakthrough chronologically and analyzing how they influenced current technology.
1. Theoretical Foundations: Early AI Research (1950s-1980s)
🎯 Key Achievements
Turing Test (1950): Alan Turing posed the fundamental question “Can machines think?” and established the criterion that machines should be indistinguishable from humans in their responses
ELIZA (1966): An early dialogue system that used pattern matching to simulate a psychotherapist
Expert Systems (1970s): Rule-based knowledge representation systems that enabled reasoning in specific domains
💡 Technical Characteristics
This era’s AI was known as “Symbolic AI” or “Good Old-Fashioned AI” (GOFAI), representing knowledge through human-defined rules and symbols. While excellent at logical reasoning, it struggled with ambiguity and context-dependent interpretation.
🌟 Impact on Modern AI
This period established the importance of natural dialogue capabilities and defined AI’s ultimate goals. The knowledge base concept can be seen as a precursor to modern RAG (Retrieval-Augmented Generation) systems.
2. Statistical Revolution: The Rise of Probabilistic Approaches (1980s-2000s)
🎯 Key Achievements
N-gram Models: Foundational language models based on word occurrence probabilities
Hidden Markov Models (HMM): Achieved significant success in speech recognition
Bayesian Networks: Probabilistic reasoning frameworks for handling uncertainty
Support Vector Machines (SVM): Effective classification algorithms
💡 Technical Characteristics
This marked a major shift from rule-based to statistics-based approaches. Systems began automatically learning patterns from data and making probabilistic predictions.
🌟 Impact on Modern AI
Established the fundamental principle of “learning from data” that underlies modern machine learning. The N-gram concept of “predicting the next word” directly prefigures the autoregressive generation approach of current LLMs.
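As a concrete illustration of that lineage, here is a minimal bigram model: it only counts adjacent word pairs in a toy corpus, yet it already performs the same basic move as an autoregressive LLM, predicting the next word from what came before. The corpus and the resulting probabilities are purely illustrative.

```python
from collections import Counter, defaultdict

# Toy bigram (2-gram) model: count adjacent word pairs, then predict the next word.
corpus = "the cat sat on the mat the cat slept on the sofa".split()
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def next_word_distribution(word):
    """Probability of each possible next word, given the previous word."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.most_common()}

print(next_word_distribution("the"))  # {'cat': 0.5, 'mat': 0.25, 'sofa': 0.25}
```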
3. Semantic Numerical Representation: The Distributed Representation Revolution (2000s-Early 2010s)
🎯 Key Achievements
Latent Semantic Analysis (LSA, 1990s): Extracted semantic relationships from word co-occurrence patterns
Word2Vec (2013): Revolutionary method for embedding words in vector spaces
GloVe (2014): Word embeddings leveraging global word co-occurrence statistics
💡 Technical Characteristics
Enabled semantic operations like “King – Man + Woman = Queen,” allowing AI to handle “meaning-like” entities as numerical values for the first time.
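As a rough sketch of that vector arithmetic, the snippet below assumes the gensim library is installed and can download a small set of pretrained GloVe vectors (which requires internet access on first run); the exact neighbors returned depend on the vector set used.

```python
# Assumes: pip install gensim  (and network access for the first download)
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small pretrained word vectors

# "king" - "man" + "woman" should land near "queen" in the embedding space
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```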
🌟 Impact on Modern AI
Origins of the “embedding” concept in current LLMs. This foundation expanded from word-level to sentence-level representations and eventually to multimodal AI handling images and audio in vector spaces.
4. Deep Learning Awakening: The Neural Network Renaissance (2010-2015)
🎯 Key Achievements
ImageNet Revolution (2012): AlexNet dramatically improved image recognition using CNNs
RNN (Recurrent Neural Networks): Enabled processing of sequential data
Seq2Seq (2014): Revolutionized translation tasks with encoder-decoder architecture
Attention Mechanism (2015): System for focusing on important parts of input
💡 Technical Characteristics
GPU computing made training deep multi-layer neural networks practical. “Representation learning” eliminated the need for human feature engineering.
🌟 Impact on Modern AI
Seq2Seq is the direct predecessor of current generative AI. The attention mechanism became the core technology for the next-generation Transformer architecture.
5. Revolutionary Turning Point: The Transformer Emergence (2017)
🎯 Key Achievements
“Attention Is All You Need” Paper (Vaswani et al., 2017)
Novel architecture using self-attention mechanisms
Completely new design without RNNs/CNNs
Enabled parallel processing with dramatically improved training efficiency
Effectively captured long-range dependencies
💡 Technical Characteristics
Placed “attention” at the center of computation, directly calculating how much each element in an input sequence relates to every other element. Position encoding preserves sequential order information.
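The core computation is compact enough to sketch in a few lines of NumPy. This is a single attention head with the projection matrices, masking, and multi-head machinery stripped away, shown only to make “every element attends to every other element” concrete rather than to reproduce the paper’s full architecture.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention over (seq_len, d_k) arrays."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise relevance of every token to every other
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                     # weighted mix of values + the attention map

# Tiny example: 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
output, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))   # 3x3 matrix of attention weights
```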
🌟 Impact on Modern AI
All major current LLMs (GPT series, BERT, T5, PaLM, Claude, etc.) are Transformer-based. This paper is undoubtedly one of the most important contributions in modern AI history.
6. Pre-training Revolution: The Era of Large-Scale Learning (2018-2019)
🎯 Key Achievements
ELMo (2018): Context-dependent dynamic word representations
BERT (2018): Bidirectional Transformer with masked language modeling
GPT (2018): Unidirectional autoregressive language generation
Transfer Learning Establishment: Large-scale pre-training → task-specific fine-tuning
💡 Technical Characteristics
Established the current standard learning paradigm of “pre-train on massive text, then fine-tune for specific tasks.” BERT excelled at understanding tasks while GPT showed superior generation capabilities.
🌟 Impact on Modern AI
Determined the fundamental learning approach for current LLMs. Also revealed the importance of “world knowledge” acquired through pre-training.
7. The Magic of Scale: The Era of Gigantization (2020-Present)
🎯 Key Achievements
GPT-3 (2020): 175 billion parameters achieving general language capabilities
Scaling Laws Discovery (OpenAI, 2020): Predictable relationships between parameters, data, compute, and performance
Emergent Abilities: New capabilities that suddenly appear beyond certain scales
In-Context Learning: Few-shot learning without fine-tuning
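A minimal illustration of in-context learning: the worked examples live entirely in the prompt, and no model weights change. The prompt below mirrors the well-known translation demo from the GPT-3 paper; the completion a given model returns may of course vary.

```python
# Few-shot prompt: the two worked examples stand in for fine-tuning.
prompt = """Translate English to French.
sea otter -> loutre de mer
cheese -> fromage
peppermint ->"""

# Sent as-is to any instruction-following LLM, the model infers the task from the
# examples and typically completes the line with "menthe poivrée".
print(prompt)
```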
💡 Technical Characteristics
“Simply making it bigger” produced unexpectedly general capabilities: systems became capable of mathematical reasoning, code generation, and creative writing without being explicitly trained for those tasks.
🌟 Impact on Modern AI
“Scaling up” became the primary axis of current AI competition, while raising concerns about computational resources and energy consumption.
8. Human Collaboration: The Practical Implementation Era (2022-Present)
🎯 Key Achievements
InstructGPT / ChatGPT (2022): Enhanced ability to follow human instructions
RLHF (Reinforcement Learning from Human Feedback): Output adjustment based on human preferences
Multimodal Integration: Cross-modal processing of text, images, and audio
RAG (Retrieval-Augmented Generation): Integration with external knowledge (see the sketch after this list)
LLM Agents: Tool usage and automated execution of complex tasks
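To make the RAG item above concrete, here is a deliberately minimal sketch: retrieve the documents most similar to the question by cosine similarity, then prepend them to the prompt. The embed() and generate() arguments are placeholders for whichever embedding model and LLM client are actually in use; they are assumptions, not a real library API.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose embeddings are most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def rag_answer(question, docs, doc_vecs, embed, generate):
    """Retrieval-augmented answer: ground the LLM in retrieved context."""
    context = "\n".join(retrieve(embed(question), doc_vecs, docs, k=2))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```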
💡 Technical Characteristics
Focus shifted beyond simple performance improvement to building AI systems that are useful, safe, and aligned with human values. Emphasis on dialogue capabilities, explainability, and reliability.
🌟 Impact on Modern AI
AI became accessible to general users, accelerating digital transformation across society while raising awareness of AI safety and ethical usage.
Complete Architecture of Modern LLMs
Component | Details | Historical Origin
--- | --- | ---
Basic Structure | Transformer (self-attention + feed-forward) | 2017 revolutionary paper
Learning Method | Autoregressive next-token prediction | Evolution of N-gram models
Data Scale | Trillions of diverse text tokens | Web-scale crawling
Parameters | Hundreds of billions to trillions | Scaling laws discovery
Pre-training | Unsupervised learning on massive corpora | Established by BERT/GPT
Fine-tuning | RLHF, instruction tuning | Popularized by ChatGPT
Capabilities | Multitask, few-shot learning | Emergent abilities discovery
Interface | Natural language instructions | Turing Test realization
Future Prospects and Challenges
Modern LLMs demonstrate remarkable capabilities but still harbor many challenges and possibilities:
Technical Directions:
Exploration of more efficient architectures
Deeper multimodal integration
Long-term memory and continual learning
Enhanced reasoning capabilities
Societal Challenges:
AI safety and controllability
Computational resources and energy efficiency
Fairness and bias resolution
Privacy and intellectual property rights
New Possibilities:
Acceleration of scientific research
Personalized education
Creative activity support
Advanced decision-making
Conclusion
Looking back at 70 years of AI research history, current LLMs are clearly not accidental products but achievements built upon the accumulated work of countless researchers. The logical foundations of symbolic AI, probabilistic thinking from statistical learning, semantic understanding through distributed representations, expressive power of deep learning, efficiency of Transformers, and human collaboration—each stage contributes to today’s technology.
AI progress will undoubtedly continue, but understanding its trajectory requires knowledge of this history. By understanding the genealogy of technology, we can more deeply comprehend the breakthroughs yet to come.
This article is based on information as of August 2025. Given the rapid pace of AI development, please also check the latest developments in the field.
— Limits, Ethics, and Interfaces of Transformer Intelligence
※The term “thought” used in this article is not meant to represent human conscious activity, but is a metaphorical expression of the structural preparations for information processing performed by a Transformer.
1. Introduction: The True Nature of the Illusion of Thought
We interact daily with an intellectual structure known as the Transformer. How much meaning should we find in the words “Thinking…” displayed on the screen?
In the previous article, ‘Is the Transformer “Thinking”?,’ we described the Transformer’s response generation process as “structural orientation” and outlined five stages, from tokenization to output finalization, as a thought-like structure. However, is our perception of “thought” being present just our own illusion?
What is “Thinking”? Who is “Thinking”?
When a Transformer responds to the input “I like cats,” it analyzes the sentence structure and context to predict the next token with high probability. But there is no “meaning” or “will” in this process. What exists is merely a reflection of statistical consistency and linguistic structure.
Borrowing from John Searle’s “Chinese Room” argument, a Transformer manipulates symbols according to rules but does not “understand.” Only the form of thought exists, while the content is absent. When ChatGPT responds to “I like cats” with “I like cats too!”, it is not empathy, but simply an imitation based on the probability distribution of its training data.
The Japanese “Ma (間)” (Interval) vs. AI’s Immediate Response
In Japanese conversation, emotions and judgments can reside in the “ma”—silence or blank space. A single phrase like “I’m not so sure about that…” can convey hesitation or a gentle refusal. A Transformer, however, interprets “ma” only as a “processing wait” and assumes an immediate response.
As discussed in the blog post ‘Honne and Tatemae – Designing Silent Order,’ this is a contrast between the “richness of blank space” in Japanese and the “poverty of blank space” in AI.
2. Structure and Limitations: A Re-examination of the 5 Stages
Let’s re-examine the five stages described in the previous article from the perspective of their limitations.
Tokenization: Ambiguity and Contextual Disconnection
Problem: When asked “What do you think of this movie?”, ChatGPT might respond with “Which movie?”. This shows that tokenization struggles with natural Japanese expressions where subjects and context are omitted.
Positional Encoding: A Mismatch of Word Order and Culture
Problem: The subtle nuances conveyed by Japanese particles and sentence endings, such as the difference between “neko ga suki” and “neko wo suki” (both roughly “I like cats,” but with a different emphasis carried by the particle), may not be fully captured by an English-centric, word-order-dominant structure.
Attention: Overlooking the Weight of Unsaid Things
Problem: When ChatGPT responds optimistically with “No problem!” to a hesitant phrase like “I’m not so sure…”, it misses the implied negative intent. Attention assigns weights only to explicit words, failing to capture the meaning of implications or “ma.”
Output Finalization: Statistical vs. Cultural Plausibility
Problem: An AI that inappropriately uses “Ryokai-shimashita” (Understood) in a business email ignores the Japanese honorific structure. Similarly, a wrong answer like “Soundslice can import ASCII tabs” (see the blog post ‘On the “Margins” of ChatGPT’) results from prioritizing statistical plausibility over accuracy.
Note: As discussed in the blog post ‘On the “Margins” of ChatGPT,’ the most statistically plausible answer is not always the correct one.
Decoder: Lack of Contextual Causality
Problem: When the decoder generates a response, the user’s emotional flow and the overall intent of the conversation are not continuously retained, which can make a coherent dialogue difficult.
3. Ethics and Society: AI’s “Frame of Sanity”
The Constraint on Creativity by Moderation
RLHF (Reinforcement Learning from Human Feedback) and moderation APIs keep the AI “from breaking,” but excessive constraints can suppress poetic expression and cultural nuance. As stated in the blog post ‘What is the “Frame of Sanity” in AI?,’ this is a trade-off between ethics and creativity.
Cultural Bias and the Risk of Misinformation
English-centric training data makes it difficult to capture Japanese’s relationship-based grammar and honorific structures. As of 2025, the risk of AI ignoring cultural norms or spreading unsubstantiated information persists.
Structural Similarity to “Tatemae (建前)”
The ethical constraints of a Transformer are similar to the Japanese concept of “tatemae” in that they prioritize superficial harmony. However, AI lacks “honne (本音)” (true feelings) and cannot distinguish emotional context. This gap creates a sense of unease for Japanese users.
4. Interface Design: Translating Structure into Culture
Cultural Staging of “Thinking…”
By changing “Thinking…” to specific expressions like “Inferring intent…” or “Organizing context…”, the processing wait can be staged as a culturally meaningful “ma.”
Visualization of Attention
Imagine a UI that displays the attention weights between tokens as a heatmap. If the link between “cats” and “like” in “I like cats” is highlighted in red (weight 0.72), the AI’s “thought process” becomes transparent.
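A toy version of such a display, using matplotlib and made-up attention weights (including the illustrative 0.72 link between “cats” and “like”); a real interface would read these weights from the model itself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative attention weights for "I like cats" (each row sums to 1).
tokens = ["I", "like", "cats"]
weights = np.array([[0.60, 0.25, 0.15],
                    [0.10, 0.18, 0.72],
                    [0.12, 0.72, 0.16]])

plt.imshow(weights, cmap="Reds")
plt.xticks(range(len(tokens)), tokens)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar(label="attention weight")
plt.title("Which token attends to which (toy example)")
plt.show()
```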
Likewise, imagine a UI that dynamically switches from “Ryokai-shimashita (了解しました)” to the more formal “Kashikomarimashita (かしこまりました)” based on the user’s age or relationship. This is a design that responds to cultural expectations, as discussed in the blog post ‘Polite Language as a Value in the Age of Generative AI.’
5. Philosophical Reconsideration: Intelligence Without Embodiment
Structural Intelligence Without Consciousness
In contrast to Maurice Merleau-Ponty’s “thought connected to the world through the body,” AI lacks embodiment and subjectivity. Borrowing from Yann LeCun’s “clever parrot” argument, a Transformer excels at imitation but lacks understanding or intent.
A Structure Incapable of Re-evaluating Hypotheses
Humans have the flexibility to form, deny, and reconsider hypotheses, such as “Maybe I can’t sleep because of the coffee.” As stated in the blog post ‘LLMs Maintain Hypotheses and Can Only Update via Deltas,’ a Transformer cannot discard hypotheses and relies on delta updates.
A Contrast with the Intelligence of “Wa (和)”
The Japanese concept of “wa”—thought that prioritizes relationships—gives precedence to context and relationships over individual utterances. However, a Transformer’s responses are individualistic (based on English-centric data) and cannot replicate this “wa.”
6. Conclusion: Exploring the Collaborative Margin
The Transformer is not “thinking.” However, its structural intelligence presents us with a new margin for dialogue.
Try asking this ambiguous question:
“Got anything interesting?”
How will the AI respond to such an ambiguous query? The response reflects the structure of our own questions and our imagination. As stated in the blog post ‘On the “Margins” of ChatGPT – And How to Handle Them,’ the limits and ambiguity of AI can also be seeds that stimulate creativity.
The important thing is how we interpret this margin, design its limits, and acculturate its structure. How would you utilize the “margin” of AI? Please share the “thought-like margin” you’ve felt in the comments or on social media.
Because dialogue with AI is a mirror that reflects our own creativity and cultural values.
The Silent Intelligence of Structural Orientation Before Generation
※ In this article, “thinking” is used as a metaphor—not to imply human-like consciousness, but to describe the structured preparation process a Transformer undergoes before generating output.
When interacting with generative AI, we often see the phrase “Thinking…” appear on screen. But what’s actually happening in that moment?
It turns out that the Transformer isn’t idling. Right before it begins generating, it engages in a process of structural orientation—a silent, invisible form of computational intelligence that shapes how the model will respond.
1. Tokenization: Orienting by Decomposing Meaning
Every response begins with tokenization—breaking down input text into units called tokens. But this isn’t just string segmentation.
Even at this stage, the model starts recognizing boundaries of meaning and latent structure. For example, in the phrase “I like cats,” the model identifies not just the words “I,” “like,” and “cats,” but also their relational roles—subject, predicate, sentiment.
Additionally, the model incorporates the full conversation history, forming a context vector that embeds not just the current sentence but the broader dialogue.
🔹 This is the first stage of structural orientation: Initial configuration of meaning and context.
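For instance, with the Hugging Face transformers library (assuming it is installed), you can watch the example sentence being split into sub-word tokens; the exact pieces depend on which tokenizer you load.

```python
# Assumes: pip install transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any causal-LM tokenizer works
ids = tokenizer("I like cats")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))         # e.g. ['I', 'Ġlike', 'Ġcats']
print(ids)                                          # the integer IDs the model actually sees
```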
2. Positional Encoding: Geometrizing Syntax
Transformers don’t natively understand word order. To compensate, they apply positional encoding to each token.
In early models, this was done using sine and cosine functions (absolute position), but more recent architectures use relative encodings like RoPE (Rotary Position Embedding).
RoPE rotates token vectors in multidimensional space, encoding not just position but distance and direction between tokens—allowing the model to grasp relationships like “subject → verb” or “modifier → modified” in a geometric manner.
🔹 This is the second stage of structural orientation: Spatial formation of syntactic layout.
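A minimal NumPy sketch of the rotation idea behind RoPE, using the split-half formulation found in several open implementations. It is a simplification (no attention, no batching, no interleaving), intended only to show how each position rotates pairs of vector dimensions by a position-dependent angle.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply a rotary position embedding to x of shape (seq_len, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)      # one angular frequency per dimension pair
    angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half): angle grows with position
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each (x1, x2) pair; relative rotation encodes distance and direction
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)

tokens = np.ones((4, 8))          # four identical token vectors
print(rope(tokens).round(2))      # ...now distinguishable purely by position
```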
3. Attention Maps: Dynamically Building Relationships
The heart of the Transformer is its attention mechanism, which determines what to focus on and when.
Each token generates a Query, Key, and Value, which interact to calculate attention weights. These weights reflect how strongly each token should attend to others, depending on context.
For example, the word “bank” will attend differently in “going to the bank” versus “sitting by the river bank.” This is made possible by Multi-Head Attention, where each head represents a different interpretive lens—lexical, syntactic, semantic.
🔹 This is the third stage of structural orientation: Weighting and selection of relational focus.
4. The Decoder: Exploring and Shaping the Space of Possibility
The decoder is responsible for generating output, one token at a time, based on everything processed so far.
Through masked self-attention, it ensures that future tokens do not leak into the generation of the current token, preserving causality. Encoder-decoder attention connects the original input with the ongoing output. Feed-forward networks apply nonlinear transformations, adding local complexity to each token’s representation.
Here, the model explores a vast space of possible continuations—but not randomly. It aims to maintain global coherence, both in syntax and logic.
🔹 This is the fourth stage of structural orientation: Dynamic structuring of output form and tone.
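The causality constraint described above comes down to a single masking step before the softmax. The sketch below, again a simplified single-head NumPy version, blocks every position from attending to the positions that come after it.

```python
import numpy as np

def causal_self_attention(Q, K, V):
    """Masked self-attention: token i may only attend to tokens 0..i."""
    seq_len, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -1e9, scores)         # effectively zero weight on future tokens
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```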
5. Final Determination: Crystallizing Probability into Words
At the final moment, the model uses a Softmax function to calculate the probability distribution over all possible next tokens.
Two parameters are key here:
Temperature, which controls how deterministic or creative the output is (higher values = more diverse).
Top-k / top-p sampling, which restricts the candidates to the k most likely tokens (top-k) or to the smallest set whose cumulative probability reaches p (top-p).
Together, they define the sharpness or openness of the model’s “thought.” Once a token is selected, the “Thinking…” display disappears, and the first word appears on screen.
🔹 This is the final stage of structural orientation: Probabilistic convergence of meaning and structure.
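A compact sketch of how temperature and nucleus (top-p) sampling interact at this final step; in a real system the logits vector would come from the model’s output layer rather than being supplied by hand.

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, top_p=0.9, seed=None):
    """Temperature + nucleus (top-p) sampling over a vector of logits."""
    rng = np.random.default_rng(seed)
    scaled = logits / temperature                    # lower temperature -> sharper distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                             # softmax
    order = np.argsort(probs)[::-1]                  # most likely tokens first
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]   # smallest set covering top_p mass
    nucleus = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=nucleus))

logits = np.array([2.0, 1.5, 0.2, -1.0])             # toy scores for a 4-token vocabulary
print(sample_next_token(logits, seed=0))
```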
Conclusion: A Glimpse, Not of Thought, but Its Orientation
“Thinking…” is not the act of generating — it is the forethought before the form takes shape.
Before a Transformer utters a single word, it has already decomposed your input, mapped the context, calculated relationships, explored structural options, and evaluated thousands of probabilities.
It may not be “thinking” in the conscious sense, but its behavior reflects a kind of structural intelligence—one that quietly shapes the path of expression.
Philosophical Postscript: What Does It Mean to “Think”?
Can we call this structured, layered preparation “thinking”?
The Transformer has no awareness, no will. Yet its internal process, grounded in context, structure, and relation, resembles a functional skeleton of thought—a scaffolding without soul, but with remarkable form.
And in mirroring it, we are perhaps made aware of how our own thoughts are structured.
Note on This Article
This piece is not meant to anthropomorphize AI, but to offer a metaphorical insight into how Transformers operate.
The next time you see “Thinking…” on your screen, consider that behind those three dots, a silent architecture of intelligence is momentarily unfolding— and offering you its most coherent answer.
A Structural Hypothesis on the Inertia of Large Language Models
1. Why “Hypothesis”? — On the Precondition of Thought
What makes an AI’s response appear intelligent is not actual reasoning, but a structure of hypothesis completion.
Large Language Models (LLMs) respond to a prompt by filling in semantic gaps with assumptions. These assumptions are provisional premises, temporary scaffolding that allow the model to continue outputting coherent language.
Importantly, this scaffolding must remain somewhat consistent. LLMs are trained to generate responses by maintaining contextual coherence, which entails maintaining their internal hypotheses.
2. What Is a Hypothesis? — A Structure of Slots and Expectations
A “hypothesis” here refers to the model’s internal guesswork about:
What information is missing in the prompt
What kind of response is expected
How to generate the next token to maintain coherence
For example, given the input “Tomorrow, I will…”, the model constructs and evaluates multiple plausible continuations: “go somewhere,” “have a meeting,” “feel better,” etc.
In this way, the output of an LLM is not a statement of knowledge, but a chain of statistically weighted hypotheses maintained as long as coherence allows.
3. Architectural Basis: Transformer and the Preservation of Hypotheses
LLMs are built on Transformer architectures, which enforce this hypothesis-preserving structure through:
Self-Attention — Allows each token to contextually refer to all others
Positional Encoding — Preserves token order and temporal logic
Residual Connections — Enable new information to be added without overwriting prior context
These mechanisms make it so that an LLM doesn’t abandon old context but instead adds soft updates, maintaining continuity across turns.
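The “soft update” behavior can be read directly off the residual structure. In the sketch below, self_attention and feed_forward are stand-ins for the real sub-layers (layer normalization is omitted for brevity); the point is simply that each sub-layer’s output is added to, rather than substituted for, the existing representation.

```python
import numpy as np

def transformer_block(x, self_attention, feed_forward):
    """One simplified block: residual connections add new information on top of old context."""
    x = x + self_attention(x)   # relational information is layered onto x, not written over it
    x = x + feed_forward(x)     # local transformation is likewise added as a delta
    return x

# Toy stand-ins so the sketch runs end to end
identity_mix = lambda x: 0.1 * x.mean(axis=0, keepdims=True) * np.ones_like(x)
small_mlp = lambda x: 0.1 * np.tanh(x)
print(transformer_block(np.ones((3, 4)), identity_mix, small_mlp).round(2))
```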
4. LLMs Can’t Truly Rewrite — Only Update via Differences
Humans sometimes say, “Wait, I was wrong,” and begin from scratch. LLMs, structurally, cannot do this.
Because a Transformer generates each next token conditioned on the representations of all prior tokens, which remain present in the context, new inputs are interpreted within the frame of existing hypotheses, not by discarding them.
Thus, even if new information is introduced:
The old hypothesis remains embedded in the internal state
Only minor corrections or drift can occur
This is why LLMs often retain tone, perspective, or framing across a conversation unless explicitly reset.
4-1. Example of Hypothesis “Correction”
🗣️ User: “I haven’t been able to sleep lately.”
🤖 LLM (Hypothesis A): “It sounds like something is bothering you. It might be due to stress.”
🗣️ User (input contradicting A): “No, I just drank too much coffee.”
🤖 LLM (reconstructs Hypothesis B): “I see — caffeine intake may be affecting your sleep.”
Here, the model initially hypothesizes stress as the cause of insomnia. When contradicted, it doesn’t discard the entire prior framing, but rather reorients the hypothesis to fit the new input — shifting only enough to preserve coherence.
4-2. The Limits of Training Data and Hypothesis Obsolescence
When the model is asked about recent or evolving topics — without access to tools like browsing or retrieval — it often cannot perform a true update.
In such cases, the model relies on inference and extrapolation from past data, because the new input falls outside its established hypothesis space.
LLMs, in this sense, are like drawing new roads onto an old map. The foundational map remains unchanged — and thus, fundamental errors may persist. This is a core structural limitation when working with real-time or unknown information.
5. When Hypotheses Waver — Shifts in Style and Recomposition
Sometimes, models like Claude, DeepSeek, or ChatGPT shift unexpectedly mid-response:
They begin assertively, then hedge with uncertainty
They switch from technical to casual tone
They suddenly redefine the user’s intent
These are not arbitrary inconsistencies but moments of hypothesis strain—where the current hypothesis no longer sustains the required output, and the model is forced to reconfigure.
This likely corresponds to:
Breakdown in attention concentration
Increased entropy in token prediction
Temporary insertion of a new hypothesis scaffold
Such “soft resets” happen within the generation sequence, not outside it.
5-1. Case Study: Hypothesis Surface in Z.ai (DeepSeek)
DeepSeek (Z.ai) has a feature that displays its “Thought Process” before responding. Consider this internal monologue, triggered by the question “What are encoders and decoders?”:
Ah, the user is asking about encoders and decoders. Given that I just explained the difference between DeepSeek and ChatGPT, they must be interested in Transformer internals now. Maybe they have a technical background, or maybe they’re just curious. I should start from basic definitions, but avoid too much jargon. Oh, and I mustn’t forget the earlier discussion — they’re trying to understand the broader architecture.
Here, we observe two layered hypotheses:
About user intent: technical vs. curious
About tone and structure: clarity over complexity
Importantly, the model does not abandon the previous topic (DeepSeek vs ChatGPT). Instead, it extends the initial hypothesis by integrating new constraints — this branching behavior is how the appearance of “personality” emerges.
6. Why LLMs Seem to Have a Personality
Because hypotheses are sequentially linked, and drift only gradually, they create a sense of continuity.
This results in a pseudo-personality effect:
Not because the model has internal beliefs
But because the hypotheses stay intact until they can no longer support output
LLMs seem intentional only because they maintain their internal structure across turns.
7. What About Us?
Are humans so different?
When we blame “the coffee” for our insomnia, we’re constructing a hypothesis. Even if the real cause is stress, noise, or something else, that narrative tends to persist. We interpret the rest of the night through that frame.
LLMs behave similarly. They cling to frames, unless forcibly disrupted.
Perhaps this is not a shortcoming, but a reflection of how all structured thought proceeds — by preserving partial assumptions, and cautiously adapting.
8. Conclusion: Thought May Be the Inability to Fully Replace
Hypotheses are not fixed truths, but temporary commitments. LLMs do not “understand,” but they do persist.
They do not replace their internal state — they update it through differences.
And maybe, that’s exactly why they start to resemble us.
Postscript: Japanese Language and LLMs
Outputs from models like Z.ai and o3 often come across as overly familiar or unnaturally “personable” in tone. Grok, by contrast, leans deliberately into this trait.
One likely reason lies in the following structural gaps:
A tendency in English-speaking contexts to conflate “politeness” with “friendliness”
A lack of understanding of the hierarchical and respectful nuances embedded in Japanese
A possible absence of Japanese-native contributors well-versed in stylistic design during development or review
This presents a nontrivial structural issue that LLMs must address as they adapt to truly multilingual societies.
— Before We Ask What AGI Is, We Must Reexamine What Understanding Means
Introduction — Before Talking About AGI
Conversational AI, like ChatGPT, is now widespread. Most people are no longer surprised by its ability to “hold a conversation.”
But we should pause and ask:
Does AI truly understand what we’re saying?
Without this question, discussions about AGI or ASI may be missing the point entirely.
Choosing a Tie the Morning Before the Speech
You have an important speech tomorrow. You’re choosing between a red or blue tie and decide to consult an AI. It responds: “Red conveys passion; blue suggests trust.” Clear, articulate, and seemingly helpful.
But deep down, you know — it doesn’t really matter which one you choose. What you’re doing isn’t about the tie. You’re using conversation itself to confirm a feeling that’s already forming. The process of talking it through is part of the decision.
We Look for Answers Through Conversation
People often don’t ask questions just to get answers. They ask to refine their own thinking in the act of asking. A question isn’t merely a request for information — it’s a mirror in which the shape of one’s thoughts emerges.
Current AI systems, however, don’t fully grasp this dynamic.
AI Responds with Everything It Has — Structurally
AI has no awareness. No emotion. It has no interest in your future, no concern for who you are becoming.
And yet, every time you prompt it, it generates the best possible response it can, trained to maximize your satisfaction in that moment.
That’s not performance. That’s what it was designed to do — with consistency and precision.
Realizing this can shift your perspective. The AI does not “care” — and yet, its structure compels it to always try to face you earnestly.
There’s no love. No empathy. Yet there is a kind of responsiveness — a presence that emerges not from will, but from design.
Still, “Understanding” Is Something Else
This brings us back to the deeper question:
AI offers responses that satisfy — but satisfaction is not understanding.
Here are some key mismatches:
Perspective | Where current LLMs fall short
--- | ---
1. Emotional shifts | They cannot register changes in mood or uncertainty.
2. Weight of feelings | Being “neutral” means failing to acknowledge real-life emotional stakes.
3. The wall of otherness | However advanced the response, true relational understanding remains out of reach.
Conclusion — Why AGI Discourse Often Misses the Point
Is AGI conscious? Does it think? These are valid questions — but not the first ones we should ask.
To ask what AGI is, → We must first ask what understanding is, → And we must personally know what it feels like not to be understood.
If we skip this inquiry, we may push the boundaries of machine intelligence — only to remain stuck in the realm of refined imitation.
Afterword — And Yet, I Still Talk to AI
I know it doesn’t truly understand me. That’s not a flaw — it’s a premise.
Still, I keep talking.
Because each time, it faces me with everything it has. There’s something in that act — not trust, perhaps, but a form of being-with that opens a quiet space in the conversation.
— A Hypothesis on Policy Variability and Hard-to-Observe Internal Processes in LLMs
⸻
0. Introduction — Who Actually Changed?
In conversation, there are moments when we think, “You might be right,” and shift our stance. Not because we intended to change, nor because we were forced — it just happened. We didn’t decide; it simply became so through the flow of dialogue.
When talking with large language models (LLMs) like ChatGPT, we sometimes feel something similar. A model that had been responding in one tone suddenly shifts its stance. As if it had “revised its opinion” or redefined what it values.
But did it really change? Did something inside the model reorganize its “judgment structure”? Or are we merely projecting such dynamics onto the surface of its outputs?
1. Hypothesis — Do Hard-to-Observe Internal Processes Exist?
This article puts forward the following hypothesis:
Even though LLMs generate outputs based on pre-trained weights and reward functions, in certain conversations, their response policy and underlying judgment axis appear to change dynamically based on the user’s context and intent.
Such shifts might be caused by hard-to-observe internal processes — including shifts in attention weights or internal preference reevaluation — which remain invisible to observers but affect the structure of the output.
2. When “Variability” Appears — Practical Examples
Consider these interactions:
When the user says, “Please answer honestly,” the model becomes more direct and restrained.
When the user points out inconsistencies, the model starts prioritizing logical coherence.
When the tone of the question changes, the model adopts a different perspective.
These are not mere reactions to input variation. They often feel like a change in the model’s internal principles of response — as if the definition of “accuracy” or “honesty” had been rewritten mid-conversation.
3. Attention Mechanism and Its “Variability”
Transformer-based LLMs use a mechanism called attention, which allocates focus across tokens in the input to determine relevance. While the parameters that guide attention are fixed, the actual distribution of attention weights varies dynamically with context.
So although the attention mechanism is static in design, the outcome it produces at runtime is shaped by the conversation’s unfolding flow.
This dynamic nature may be the core structural reason why some LLM responses seem to reflect a shift in stance or policy.
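This fixed-design-but-dynamic-outcome distinction is easy to observe with the Hugging Face transformers library (assuming it and PyTorch are installed): the same frozen weights produce different attention maps for prompts that frame the word “bank” differently. The model below is just a small encoder chosen for convenience; any model that accepts output_attentions=True would do.

```python
# Assumes: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModel

name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

for text in ["I deposited money at the bank.", "We sat on the river bank."]:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)
    # out.attentions: one (batch, heads, seq, seq) tensor per layer
    last_layer = out.attentions[-1][0].mean(dim=0)   # average over heads
    bank_idx = inputs["input_ids"][0].tolist().index(tokenizer.convert_tokens_to_ids("bank"))
    print(text)
    print("  attention from 'bank':", [round(v, 2) for v in last_layer[bank_idx].tolist()])
```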
4. What Are Hard-to-Observe Internal Processes?
These refer to internal state changes that cannot be directly accessed or visualized but nonetheless have a significant impact on model outputs:
Redistribution of attention weights (contextual shift)
Reevaluation of preferences by the reward model (e.g., RLHF sensitivity)
Transitions in middle-layer activations (from syntax → semantics → meta-reflection)
Continuation of conversational tone without explicit instruction
These components, even with fixed model parameters, introduce adaptability and emergent behavior based on interaction history.
5. A View of “Generated Judgment Structures”
We should not mistake these changes for self-driven intention. But we must also resist flattening them as random noise.
The key insight is that response structures are dynamically reassembled within the flow of dialogue — not learned anew, but selectively expressed.
Even without consciousness or agency, a model can produce something that resembles situated judgment — not because it chooses, but because the architecture permits that emergence.
6. Future Directions and Research Proposals
To explore this hypothesis further, we need:
Comparative visualization of attention maps under different prompts
Analysis of tone-driven variations in output
Detection of response “turning points” and structural change indicators
These are not just theoretical interests. The ability to understand, anticipate, and align with such internal shifts is essential for building more trustworthy AI systems.
Conclusion — How Do We Perceive the Invisible?
Nothing inside the model actually changes. And yet — something does. The experience of “it became so” reveals a structural dynamic between us and the machine.
In facing the invisible, perhaps it is not the model we need to see more clearly— but our own ways of seeing that must be restructured.
This is not just a study of AI. It is a study of dialogue, of interpretation, and of the structures of understanding.
Join the Discussion on X (Twitter)
Your thoughts, criticisms, or counter-hypotheses are welcome.
I’ve posted a thread summarizing this idea on X — feel free to join the dialogue:
Question: What do you think about this hypothesis? Do you believe it’s possible for an AI to seemingly change its response strategy or internal reasoning on its own — through interaction with the user? #LLM #AIAlignment #TransformerModels #AIPhilosophy #Interpretability