When Combinations of Humans and A.I. are Useful
∞ Nov 10, 2024
This study from MIT researchers raises some challenging questions about collaborative AI interfaces, “human in the loop” supervision, and the value of explaining AI logic and confidence.
Their meta-study looked at over 100 experiments of humans and AI working both separately and together to accomplish tasks. They found that some tasks benefited a ton from human-AI teamwork, while others got worse from the pairing.
Poor Performers Make Poor Supervisors
For tasks where humans working solo do worse than AI, the study found that putting humans in the loop to make final decisions actually delivers worse results. For example, in a task to detect fake reviews, AI working alone achieved 73% accuracy, while humans hit 55%—but the combined human-AI system landed at 69%, watering down what AI could do alone.
In these scenarios, people oscillate between over-reliance (“using suggestions as strong guidelines without seeking and processing more information”) and under-reliance (“ignoring suggestions because of adverse attitudes towards automation”).
Since the people were less accurate, in general, than the AI algorithms, they were also not good at deciding when to trust the algorithms and when to trust their own judgement, so their participation resulted in lower overall performance than for the AI algorithm alone.
Takeaway: “Human in the loop” may be an anti-pattern for certain tasks where AI is more high-performing. Measure results; don’t assume that human judgment always makes things better.
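That measurement can be as simple as scoring each configuration against the same labeled evaluation set. Here’s a minimal sketch of what that looks like, with hypothetical labels and predictions standing in for real evaluation data:

```python
# Sketch: compare human-alone, AI-alone, and combined accuracy on the same
# labeled evaluation set before committing to a human-in-the-loop design.
# All labels and predictions below are hypothetical placeholders.

def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == l for p, l in zip(predictions, labels)) / len(labels)

labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]         # ground truth (e.g., fake review or not)
human_only = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]     # human decisions alone
ai_only = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]        # model decisions alone
human_plus_ai = [1, 0, 1, 1, 0, 0, 0, 1, 1, 0]  # human final call with AI suggestions

for name, preds in [("human", human_only), ("AI", ai_only), ("human + AI", human_plus_ai)]:
    print(f"{name}: {accuracy(preds, labels):.0%}")
```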
Explanations Didn’t Help
The study found that common design patterns like AI explanations and confidence scores showed no significant impact on performance for human-AI collaborative systems. “These factors have received much attention in recent years [but] do not impact the effectiveness of human-AI collaboration,” the study found.
Given our result that, on average across our 300+ effect sizes, they do not impact the effectiveness of human-AI collaboration, we think researchers may wish to de-emphasize this line of inquiry and instead shift focus to the significant and less researched moderators we identified: the baseline performance of the human and AI alone, the type of task they perform, and the division of labour between them.
Takeaway: Transparency doesn’t always engage the best human judgment; explanations and confidence scores need refinement—or an entirely new alternative. I suspect that changing the form, manner, or tone of these explanations could improve outcomes, but also: Are there different ways to better engage critical thinking and productive skepticism?
Creative Tasks FTW
The study found that human-AI collaboration was most effective for open-ended creative and generative tasks—but worse for decision-making tasks that choose among defined options. For those decision-making tasks, either humans or AI did better working alone.
We hypothesize that this advantage for creation tasks occurs because even when creation tasks require the use of creativity, knowledge or insight for which humans perform better, they often also involve substantial amounts of somewhat routine generation of additional content that AI can perform as well as or better than humans.
This is a great example of “let humans do what they do best, and let machines do what they do best.” They’re rarely the same thing. And creative/generative tasks tend to have elements of each, where humans excel at creative judgment, and the machines excel at production/execution.
Takeaway: Focus human-machine collaboration on creative and generative tasks; humans and AI may handle decision-making tasks better solo.
Divide and Conquer
A very small number of experiments in the study split tasks between human and machine intelligence based on their respective strengths. While only three of the 100+ experiments explored this approach, the researchers hypothesized that “better results might have been obtained if the experimenters had designed processes in which the AI systems did only the parts of the task for which they were clearly better than humans.” This suggests an opportunity for designers to explore more intentional division of labor in human-AI interfaces. Break out your journey maps, friends.
Takeaway: Divvy up and define tasks narrowly around the demonstrated strengths of both humans and machines, and make responsibilities clear for each.
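Here’s one way a team might sketch that division of labor in code: the AI decides only where it’s confident, and everything else routes to a person. This is my own illustration, not the study’s method, and the confidence threshold is an assumption you’d tune against measured accuracy:

```python
# Sketch: split the task so the AI handles only the parts where it's
# demonstrably stronger, and a human handles the rest. The 0.9 threshold
# and the review-queue stub are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class Decision:
    item_id: str
    label: str
    decided_by: str  # "ai" or "human"

def send_to_review_queue(item_id: str) -> str:
    """Stub for the human side of the split; a real system would enqueue
    the item for a reviewer and return their decision."""
    return "needs_review"

def route(item_id: str, ai_label: str, ai_confidence: float,
          threshold: float = 0.9) -> Decision:
    """Auto-accept confident AI calls; defer ambiguous ones to a human."""
    if ai_confidence >= threshold:
        return Decision(item_id, ai_label, "ai")
    return Decision(item_id, send_to_review_queue(item_id), "human")

print(route("review-123", "fake", 0.97))  # handled by AI
print(route("review-456", "fake", 0.62))  # deferred to human
```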
This AI Pioneer Thinks AI Is Dumber Than a Cat
∞ Oct 14, 2024
Christopher Mims of the Wall Street Journal profiles Yann LeCun, AI pioneer and senior researcher at Meta. As you’d expect, LeCun is a big believer in machine intelligence—but has no illusions about the limitations of the current crop of generative AI models. Their talent for language distracts us from their shortcomings:
Today’s models are really just predicting the next word in a text, he says. But they’re so good at this that they fool us. And because of their enormous memory capacity, they can seem to be reasoning, when in fact they’re merely regurgitating information they’ve already been trained on.
“We are used to the idea that people or entities that can express themselves, or manipulate language, are smart—but that’s not true,” says LeCun. “You can manipulate language and not be smart, and that’s basically what LLMs are demonstrating.”
As I’m fond of saying, these are not answer machines, they’re dream machines: “When you ask generative AI for an answer, it’s not giving you the answer; it knows only how to give you something that looks like an answer.”
LLMs are fact-challenged and reasoning-incapable. But they are fantastic at language and communication. Instead of relying on them to give answers, the best bet is to rely on them to drive interfaces and interactions. Treat machine-generated results as signals, not facts. Communicate with them as interpreters, not truth-tellers.
Beware of Botshit
∞ Oct 13, 2024
botshit noun: hallucinated chatbot content that is uncritically used by a human for communication and decision-making tasks. “The company withdrew the whitepaper due to excessive botshit, after the authors relied on unverified machine-generated research summaries.”
From this academic paper on managing the risks of using generated content to perform tasks:
Generative chatbots do this work by ‘predicting’ responses rather than ‘knowing’ the meaning of their responses. This means chatbots can produce coherent sounding but inaccurate or fabricated content, referred to as ‘hallucinations’. When humans use this untruthful content for tasks, it becomes what we call ‘botshit’.
See also: slop.
A Radically Adaptive World Model
∞ Oct 13, 2024
Ethan Mollick posted this nifty little demo of a research project that generates a world based on Counter-Strike, frame by frame in response to your actions. What’s around that corner at the end of the street? Nothing, that portion of the world hasn’t been created yet—until you turn in that direction, and the world is created just for you in that moment.
This is not a post proposing that this is the future of gaming, or that the tech will replace well-crafted game worlds and the people who make them. This proof of concept is nowhere near ready or good enough for that, except perhaps as a tool to assist/support game authors.
Instead, it’s interesting as a remarkable example of a radically adaptive interface, a core aspect of Sentient Design experiences. The demo and the research paper behind it show a whole world being conceived, compiled, and delivered in real time. What happens when you apply this thinking to a web experience? To a data dashboard? To a chat interface? To a calculator app that lets you turn a blank canvas into a one-of-a-kind on-demand interface?
The risk of radically adaptive interfaces is that they turn into robot fever dreams without shape or destination. That’s where design comes in: to conceive and apply thoughtful constraints and guardrails. It’s weird and hairy and different from what’s come before.
Far from replacing designers (or game creators), these experiences require designers more than ever. But we have to learn some new skills and point them in new directions.
Exploring the AI Solution Space
∞ Oct 13, 2024
Jorge Arango explores what it means for machine intelligence to be “used well” and, in particular, questions the current fascination with general-purpose, open-ended chat interfaces.
There are obvious challenges here. For one, this is the first time we’ve interacted with systems that match our linguistic abilities while lacking other attributes of intelligence: consciousness, theory of mind, pride, shame, common sense, etc. AIs’ eloquence tricks us into accepting their output when we have no competence to do so.
The AI-written contract may be better than a human-written one. But can you trust it? After all, if you’re not a lawyer, you don’t know what you don’t know. And the fact that the AI contract looks so similar to a human one makes it easy for you to take its provenance for granted. That is, the better the outcome looks to your non-specialist eyes, the more likely you are to give up your agency.
Another challenge is that ChatGPT’s success has driven many people to equate AIs with chatbots. As a result, the current default approach to adding AI to products entails awkwardly grafting chat onto existing experiences, either for augmenting search (possibly good) or replacing human service agents (generally bad).
But these “chatbot” scenarios only cover a portion of the possibility space, and not even the most interesting one.
I’m grateful for the call to action to think beyond chat and general-purpose, open-ended interfaces. Those have their place, but there’s so much more to explore here.
The popular imagination has equated intelligence with convincing conversation since Alan Turing proposed his “imitation game” in 1950. The concept is simple: if a system can fool you into thinking you’re talking to a human, it can be considered intelligent. For the better part of a century, the Turing Test has shaped popular expectations of machine intelligence from science fiction to Silicon Valley. Chat is an interaction cliché for AI that we have to escape (or at least question), but it has a powerful gravitational force. “Speaks well = thinks well” is a hard perception to break. We fall for it with people, too.
Given the outsized trust we have in systems that speak so confidently, designers have a big challenge when crafting intelligent interfaces: how can you engage the user’s agency and judgment when the answer is not actually as confident as the LLM delivers it? Communicating the accuracy/confidence of results is a design job. The “AI can make mistakes” labels don’t cut it.
This isn’t a new challenge. I’ve been writing about systems smart enough to know they’re not smart enough for years. But the problem gets steeper as the systems appear outwardly smarter and lull us into false confidence.
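As one illustration of what a design could draw on: the OpenAI API exposes token logprobs, which can be averaged into a rough confidence hint. It’s a crude, uncalibrated heuristic, and the label thresholds below are my own arbitrary assumptions—but it gestures at richer signals than a boilerplate warning:

```python
# Sketch: derive a rough confidence hint from token logprobs, one signal a
# UI could surface instead of a generic "AI can make mistakes" label.
# Average token probability is a crude, uncalibrated heuristic; treat the
# thresholds below as placeholder assumptions, not vendor guidance.

import math
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Is this review fake? ..."}],
    logprobs=True,
)

token_logprobs = [t.logprob for t in response.choices[0].logprobs.content]
avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))

# Map the score to a qualitative label the interface could display.
hint = "low" if avg_prob < 0.6 else "medium" if avg_prob < 0.85 else "high"
print(f"confidence hint: {hint} ({avg_prob:.2f})")
```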
Jorge’s 2x2 matrix of AI control vs AI accuracy is a helpful tool to at least consider the risks as you explore solutions.
This is a tricky time. It’s natural to seek grounding in times of change, which can cause us to cling too tightly to assumptions or established patterns. Loyalty to the long-held idea that conflates conversation with intelligence does us a disservice. Conversation between human and machine doesn’t have to mean literal dialogue. Let’s be far more expansive in what we consider “chat” and unpack the broad forms these interactions can take.
Introducing Generative Canvas
∞ Oct 8, 2024
On-demand UI! Salesforce announced its pilot of “generative canvas,” a radically adaptive interface for CRM users. It’s a dynamically generated dashboard that uses AI to assemble the right content and UI elements based on your specific context or request. Look out, enterprise, here comes Sentient Design.
I love to see big players doing this. Here at Big Medium, we’re building on similar foundations to help our clients build their own AI-powered interfaces. It’s exciting stuff! Sentient Design is about creating AI-mediated experiences that are aware of context/intent so that they can adapt in real time to specific needs. Veronika Kindred and I call these radically adaptive interfaces; they show that machine-intelligent experiences can be so much more than chat. This new Salesforce experience offers a good example.
For Salesforce, generative canvas is an intelligent interface that animates traditional UI in new and effective ways. It’s a perfect example of a first-stage radically adaptive interface—and one that’s well suited to the sturdy reliability of enterprise software. Generative canvas uses all of the same familiar data sources as a traditional Salesforce experience might, but it assembles and presents that data on the fly. Instead of relying on static templates built through a painstaking manual process, generative canvas is conceived and compiled in real time. That presentation is tailored to context: it pulls data from the user’s calendar to give suggested prompts and relevant information tailored to their needs. Every new prompt or new context gives you a new layout. (In Sentient Design’s triangle framework, we call this the Bespoke UI experience posture.)
So the benefits are: 1) highly tailored content and presentation to deliver the most relevant content in the most relevant format (better experience), and 2) elimination or reduction of manual configuration processes (efficiency).
Never fear: you’re not turning your dashboard into a hallucinating robot fever dream. The UI stays on the rails by selecting from a collection of vetted components from Salesforce’s Lightning design system: tables, charts, trends, etc. AI provides radical adaptivity; the design system provides grounded consistency. The concept promises a stable set of data sources and design patterns—remixed into an experience that matches your needs in the moment.
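For a feel of how that guardrail can work mechanically, here’s a minimal sketch of the pattern—my own illustration, not Salesforce’s implementation—assuming a Pydantic schema as the component vocabulary. Pair it with constrained generation like OpenAI’s Structured Outputs (covered later on this page) and the model literally cannot emit anything outside the design system:

```python
# Sketch of the "vetted components" pattern (not Salesforce's actual code):
# the model may only assemble layouts from a closed set of design-system
# components, enforced by a schema. The component names are hypothetical.

from typing import Literal, Union
from pydantic import BaseModel

class DataTable(BaseModel):
    type: Literal["data_table"]
    title: str
    source: str  # ID of a vetted data source, not free-form content

class TrendChart(BaseModel):
    type: Literal["trend_chart"]
    title: str
    metric: str

class Layout(BaseModel):
    components: list[Union[DataTable, TrendChart]]

# Validation rejects anything outside the vocabulary, so a generated layout
# can be radically adaptive in arrangement but never off the rails in content.
layout = Layout.model_validate({
    "components": [
        {"type": "trend_chart", "title": "Pipeline this quarter", "metric": "open_deals"},
    ]
})
print(layout.components[0].title)
```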
This is a tidy example of what happens when you sprinkle machine intelligence onto a familiar traditional UI. It starts to dance and move. And this is just the beginning. Adding AI to the UX/UI layer lets you generate experiences, not just artifacts (images, text, etc.). And that can go beyond traditional UI to yield entirely new UX and interaction paradigms. That’s a big focus of Big Medium’s product work with clients these days—and of course of the Sentient Design book. Stay tuned, lots more to come.
Change Blindness
∞ Aug 13, 2024
A great reminder from Ethan Mollick of how quickly things have changed in AI generation quality in the last 18 months. AI right now is the worst that it will ever be; only getting better from here. Good inspiration to keep cranking!
When I started this blog there were no AI chatbot assistants. Now, all indications are that they are likely the fastest-adopted technology in recent history.
Plus, super cute otters.
Introducing Structured Outputs in the API
∞ Aug 7, 2024
OpenAI introduced a bit of discipline to ensure that its GPT models are precise in the data format of their responses. Specifically, the new feature makes sure that, when asked, the model responds exactly to JSON schemas provided by developers.
Generating structured data from unstructured inputs is one of the core use cases for AI in today’s applications. Developers use the OpenAI API to build powerful assistants that have the ability to fetch data and answer questions via function calling, extract structured data for data entry, and build multi-step agentic workflows that allow LLMs to take actions. Developers have long been working around the limitations of LLMs in this area via open source tooling, prompting, and retrying requests repeatedly to ensure that model outputs match the formats needed to interoperate with their systems. Structured Outputs solves this problem by constraining OpenAI models to match developer-supplied schemas and by training our models to better understand complicated schemas.
Most of us experience OpenAI’s GPT models as a chat interface, and that’s certainly the interaction of the moment. But LLMs are fluent in lots of languages—not just English or Chinese or Spanish, but JSON, SVG, Python, etc. One of their underappreciated talents is to move fluidly between different representations of ideas and concepts. Here specifically, they can translate messy English into structured JSON. This is what allows these systems to be interoperable with other systems, one of the three core attributes that define the form of AI-mediated experiences, as I describe in The Shape of Sentient Design.
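Here’s roughly what that translation looks like in practice: a minimal sketch using the Python SDK’s parse() helper, with the CalendarEvent schema drawn from OpenAI’s own announcement:

```python
# Sketch: turn messy English into schema-conforming JSON with Structured
# Outputs, using the Python SDK's parse() helper as documented at launch.
# The CalendarEvent schema is the example from OpenAI's announcement.

from openai import OpenAI
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event details."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,  # the model's reply must match this schema
)

event = completion.choices[0].message.parsed
print(event.name, event.date, event.participants)
```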
What this means for product designers: As I shared in my Sentient Design talk, moving nimbly between structured and unstructured data is what enables LLMs to help drive radically adaptive interfaces. (This part of the talk offers an example.) This is the stuff that will animate the next generation of interaction design.
Alas, as in all things LLM, the models sometimes drift a bit from the specific ask—the JSON they come back with isn’t always what we asked for. This latest update is a promising direction for helping us get disciplined responses when we need it—so that Sentient Design experiences can reliably communicate with other systems.
Why I Finally Quit Spotify
∞ Aug 3, 2024
In The New Yorker, Kyle Chayka bemoans the creeping blandness that settled into his Spotify listening experience as the company leaned into algorithmic personalization and playlists.
Issues with the listening technology create issues with the music itself; bombarded by generic suggestions and repeats of recent listening, listeners are being conditioned to rely on what Spotify feeds them rather than on what they seek out for themselves. “You’re giving them everything they think they love and it’s all homogenized,” Ford said, pointing to the algorithmic playlists that reorder tracklists, automatically play on shuffle, and add in new, similar songs. Listeners become alienated from their own tastes; when you never encounter things you don’t like, it’s harder to know what you really do.
This observation that the automation of your tastes can alienate you from them feels powerful. There’s obviously a useful and meaningful role for “more like this” recommendation and prediction engines. Still, there’s a risk when we overfit those models and eliminate personal agency and/or discovery in the experience. Surely there’s an opportunity to add more texture—a push and pull between lean-back personalization and more effortful exploration.
Let’s dial up the temperature on these models, or at least some of them. Instead of always presenting “more like this” recommendations, we could benefit from “more not like this,” too.
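To make the idea concrete, here’s a toy sketch—my own illustration, not anything Spotify does—that reserves a slice of each recommendation batch for items deliberately distant from the listener’s taste vector. The random embeddings and the 20% exploration rate are placeholder assumptions:

```python
# Sketch: blend "more like this" with deliberate exploration. Most slots go
# to nearest neighbors of the taste vector; a few go to items deliberately
# far from it. The 20% exploration rate is an arbitrary value to tune.

import numpy as np

rng = np.random.default_rng(0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(taste: np.ndarray, catalog: np.ndarray, k: int = 10,
              explore_fraction: float = 0.2) -> list[int]:
    sims = np.array([cosine(taste, item) for item in catalog])
    ranked = np.argsort(sims)  # ascending: least similar first
    n_explore = int(k * explore_fraction)
    familiar = list(ranked[::-1][: k - n_explore])       # "more like this"
    unfamiliar = list(rng.choice(ranked[: len(ranked) // 2],
                                 size=n_explore, replace=False))  # "more not like this"
    return familiar + unfamiliar

# Hypothetical catalog: 100 items with 16-dim embeddings, plus a taste vector.
catalog = rng.normal(size=(100, 16))
taste = rng.normal(size=16)
print(recommend(taste, catalog))
```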
AI Is Confusing — Here’s Your Cheat Sheet
∞ Jul 28, 2024
Scratching your head about diffusion models versus frontier models versus foundation models? Don’t know a token from a transformer? Jay Peters assembled a helpful glossary of AI terms for The Verge:
To help you better understand what’s going on, we’ve put together a list of some of the most common AI terms. We’ll do our best to explain what they mean and why they’re important.
Great, accessible resource for literacy in fundamental AI lingo.
Turning the Tables on AI
∞ Jul 28, 2024
Oliver Reichenstein shares strategies for using AI to elevate your own writing instead of handing the job entirely to the robots. (This rhymes nicely with the core principle of Sentient Design: amplify judgment and agency instead of replacing it.)
Let’s turn the tables and have ChatGPT prompt us. Tell AI to ask you questions about what you’re writing. Push yourself to express in clear terms what you really want to say. Like this, for example:
I want to write [format] about [topic]. Ask me questions one at a time that force me to explain my idea.
Keep asking until your idea is clear to you.
Reichenstein is CEO and founder of iA, the maker of iA Writer. One of its features helps writers track facts and quotes from external sources. Reichenstein suggests using it to track AI-generated contributions:
What if the ChatGPT generated something useful that I want to keep? Paste it as a note Marked as AI. Use quotes, use markup, and note its origin. … iA Writer greys out text that you marked as AI so you can always discern what is yours and what isn’t.
It’s a good reminder that you can design personal workflows to use technology in ways that serve you best. What do you want AI to do for you? And as a product designer, how might you build this philosophy into your AI-mediated features?
Doctors Use A.I. Chatbots to Help Fight Insurance Denials
∞ Jul 28, 2024
In The New York Times, Teddy Rosenbluth reports on doctors using AI tools to automate their fight with insurance companies’ (automated) efforts to refuse or delay payment:
Doctors and their staff spend an average of 12 hours a week submitting prior-authorization requests, a process widely considered burdensome and detrimental to patient health among physicians surveyed by the American Medical Association.
With the help of ChatGPT, Dr. Tward now types in a couple of sentences, describing the purpose of the letter and the types of scientific studies he wants referenced, and a draft is produced in seconds.
Then, he can tell the chatbot to make it four times longer. “If you’re going to put all kinds of barriers up for my patients, then when I fire back, I’m going to make it very time consuming,” he said.
I admire the dash of spite in this effort! But is this an example of tech leveling the playing field, or part of an AI-weaponized red-tape arms race that no one can win?