Comparison
Humalike vs Inworld.
Inworld goes deep on realtime voice. Humalike sits above the voice stack and handles the social behavior: turn-taking, persona, theory of mind, and memory.
Them
Inworld
Realtime voice AI platform
Offers Realtime TTS-2 (May 2026), Realtime STT with prosody profiling, a Router across 200+ LLMs, and an Agent Runtime. Originally built for game NPCs; as of 2026, a voice-first developer platform.
Us
Humalike
Behavioral infrastructure
Behavioral primitives — turn-taking, norms, persona, memory, theory of mind, social signals — composed above any voice stack and any LLM. Inworld is a clean voice layer to plug in underneath.
The core differences.
No feature-checklist arms race. Just the dimensions that decide which one fits — with the honest call on each.
Inworld
Humalike
Realtime voice depth (TTS / STT)
TTS-2 + STT with speaker profiling.
Plug Inworld in as the voice layer.
Built-in LLM router
Router across 200+ models.
Bring your own model + router.
Composable behavioral primitives
Voice/router primitives, no behavior APIs.
7 behavioral APIs you compose.
Multi-human rooms
Voice agents are 1:1.
Multi-party as a first-class case.
Cross-session relational memory
No memory primitive exposed.
Memory per person, across sessions.
Theory of mind + social signals
Speaker profiling on voice only.
Beliefs, intent, typing pauses, edits.
Text-channel agents
Voice-first stack.
Voice and text.