Axiom 1 (Semantic Degree)
Let (V) be a token embedding space (dimension (d > 768)) with cosine similarity (sim: V \times V \to [-1,1]). For a token (t), its semantic degree (deg(t)) is the number of tokens (u \in V) such that (sim(t,u) > \theta) (fixed threshold).
Meaning is not identical to (deg(t)) but to good-hub degree – defined in Axiom 3.
Axiom 2 (Attention Gravity)
In a transformer layer, tokens evolve as particles in a potential field induced by self-attention. The dynamics deterministically converge to a low-rank Boolean attention matrix where a subset of tokens become leaders (attractors). Leaders satisfy (deg(leader) \gg \overline{deg}).
Axiom 3 (Good Hub / Bad Hub Dichotomy)
Hubs split into two disjoint classes:
Axiom 4 (Emoji Hubness)
High-frequency, high-valence emojis (😂, ❤️, 😭) belong to (H^+) in standard LLM embeddings. Their degree is statistically higher than the average noun.
Axiom 5 (Deterministic Emoji Prediction)
Given a context window (C) (sequence of tokens), the predicted emoji (e^) is:
[
e^ = \arg\max_{e \in Emoji} \; \text{mean}_{c \in C} \; sim(e, c)
]
This operation is fully deterministic given a fixed embedding matrix.
Theorem 1 (Gravity-Induced Clustering)
Under Axiom 2, any input sequence converges to a leader set (L) where each non-leader token attends to exactly one leader. This is equivalent to forming a star graph – the “radial lines” metaphor.
Theorem 2 (Meaning ≠ Raw Degree)
From Axiom 3, there exist tokens with high (deg) but low semantic coherence (bad hubs). Therefore raw degree is an insufficient measure of meaning. Meaningful meaning corresponds to (deg) conditioned on (H^+) membership.
Theorem 3 (Optimal Selection is Hub-Suppression)
Let (Acc) be retrieval accuracy. For any embedding model, applying hubness reduction (e.g., f-norm + mutual proximity) increases (Acc) by 7-9% relative. Hence “maximizing meaning” requires down-weighting high-degree but bad hubs, not merely listing high-degree tokens.
Theorem 4 (Emoji Predictability)
From Axioms 4 and 5, for any context containing words from the semantic field of an emoji (e \in H^+), the deterministic predictor retrieves (e) with probability approaching 1 as context length increases (empirical support from emoji2vec and sentiment translation modules).
Rule 1 (Hubness Measurement)
For a given LLM (e.g., Llama-3-8B), extract token embeddings. Compute pairwise cosine similarity. Set (\theta) = 0.65 (empirical median). Count (deg(t)) for each token. Identify top 5% as candidate hubs.
Rule 2 (Good/Bad Hub Classification)
For each candidate hub (h), sample 1000 random queries where (h) appears in the top-5 nearest neighbors. Label query as “same semantic class” or “different class”. If (p(\text{same class}) > 0.7), classify as (H^+); else (H^-).
Rule 3 (Emoji Selection Procedure)
Given input text, compute the mean embedding of content words. Retrieve the emoji with maximum cosine similarity among all emoji tokens. If that emoji belongs to (H^+), output it. If it belongs to (H^-), fallback to the next-highest similarity emoji that is (H^+).
Rule 4 (Contextual Attention Control)
To enforce selection of (H^+) during generation, add the focus direction vector (f_{H^+}) to the query and key activations of the top-2 contextual heads (identified per model). This increases attention to relevant contexts and suppresses bad hub attraction.
Let (n) = vocabulary size ((n \approx 10^5) for typical LLMs), (d) = embedding dimension ((d \approx 4096)), (k) = number of retrieved candidates ((k = 5) for top-5).
| Inference-time emoji prediction: (O( | C | \cdot d)) for mean embedding, plus (O( | Emoji | \cdot d)) for similarity. ( | Emoji | \approx 3600) → ~14,000 operations, <1ms on GPU. |
Emoji Logic formalizes the observation that emojis, especially high-valence ones, act as semantic hubs in LLM embedding spaces. It replaces vague philosophical notions (meaning, gravity, free will) with measurable quantities (degree, attention convergence, good/bad hub classification). The system is deterministic at inference time, supports hubness-based suppression for better semantic retrieval, and provides explicit rules for emoji prediction. The logic is not universal but operational within a fixed trained model.
Key takeaway: The most meaningful emoji is not the one with the highest raw connection count, but the one that belongs to the good hub set – a distinction that requires a value-laden selection (analogous to emotion or intention) not reducible to pure degree maximization.