Tag: technology
-

Controlling AI Agent Participation in Group Conversations (Koala)
Last Friday, we had the opportunity to hear from Justin Weisz, Stephanie Houde, Steven Ross, and the IBM team about their research on controlling AI agent participation in group conversations. They ran a set of studies with a Slack bot called Koala to understand how an agent should behave in live multiparty brainstorming sessions. Read…
-

Inner Thoughts – Notes
We had the opportunity to host Bruce Liu, one of the authors of the Inner Thoughts paper, in our team’s AI learning session today. Sharing my key takeaways. Key Takeaways
-

MUCA – Notes
In today’s AI Learning session, we had the opportunity to meet Manqing Mao and Jianzhe Lin who co-authored MUCA. Capturing my notes here. There are several interesting ideas in the paper that are applicable to multi-human <-> agent collaboration. Multi-User Chat Assistant (MUCA): Framework for LLM-Mediated Group Conversations MUCA targets multi-user, single-agent interactions — a…
-

Embeddings & Similarity Metrics
When asked what embedding model and similarity metric they’ve used, most people answer something like: “OpenAI embeddings with cosine similarity.” That’s a perfectly valid answer. But it leads to deeper questions: These were some of the questions we dug into in our team learning session last Friday. Let’s walk through the key takeaways. First: the…
-

Context Rot
Last Friday, our learning session covered Context Rot, a paper from the Chroma vector database team on how longer inputs affect LLM performance. They ran experiments with 18 leading LLMs, like o3, GPT-4.1, Claude, Gemini, and Qwen, on needle-in-a-haystack style questions, then measured how often the models gave the right answer. The best way to…
-

Defining “AGI”
This week, one of the papers we discussed in my team was, the spicely titled, “What The F*ck Is Artificial General Intelligence?” by Michael Timothy Bennett, which I found after hearing him on MLST (still one of my favorite podcasts, has super high signal/noise). Several interesting points came up:
-

MCP Universe
Salesforce AI’s new MCP-Universe benchmark puts frontier models through 200+ real-world tool-use tasks. The results: GPT-5 lands at 43.7%, Grok-4 at 33.3%, and Claude-Sonnet at 29.4%. The rest of this post breaks down why these numbers are so much lower than BFCL, what domains drag models down most, and what the findings mean for teams…
-

AI Adoption Maturity Model
I was reflecting on some of the transformations we’ve seen internally at Microsoft, especially in teams adopting AI, and I think it’s helpful to frame progress in levels. It cuts through vanity metrics like “does the team have copilot access” and asks a better question: how much has this team actually changed because of AI?…
-

Yes, “To MCP, or not” is the real question
Everyone’s excited about MCP and I love it. But, just because you can expose every API endpoint as a tool doesn’t mean you should. This is something Tom Laird-McConnell and I discussed when OpenAPI based plugins were all the rage, and it stuck with me: APIs and user actions are rarely a clean 1:1 mapping.…
-

“First they ignore you, then they laugh at you, then they fight you, then you win.”
This famous saying, often attributed to Gandhi, captures the predictable phases of resistance bold new ideas face. But here’s a thought: when you’re having a critical reaction to an idea, have you ever stopped to wonder if you’re the “they” in this quote? Let’s be honest: most of us have been “they” at some point.…