
What We’re Reading

workflow

The Cascade Effect in Context-Based Design Systems

∞ Oct 1, 2025

Nobody’s thinking more crisply about the convergence of AI and design systems than TJ Pitre, a longtime friend and partner of Big Medium. He and his crew at front-end agency Southleft have been knocking it out of the park this year by using AI to grease the end-to-end delivery of design systems from Figma to production.

In our work together, TJ has led AI integrations that improved the Figma hygiene of design systems, eased design-dev handoff (or eliminated it altogether), and let non-dev, non-designer civilians build designs and new components for the system on their own.

If you work with design systems, do yourself the kindness of checking out the tools TJ has created to ease your life:

  • FigmaLint is an AI-powered Figma plugin that analyzes design files. It audits component structure, token/variable usage, and property naming. It generates property documentation and includes a chat assistant to ask questions about the audit and the system.

  • Story UI is a tool that lets you create layouts (or new component recipes) inside Storybook using your design system. Non-developers can use it to create entire pages as a Storybook story.

  • Company Docs MCP basically enables headless documentation for your design system so that you can use AI to get design system answers in the context of your immediate workspace. Use it from Slack, a Figma plugin, Claude, whatever.

All of these tools double down on the essential design system mission: to make UI components useful, legible, and consistent across disciplines and production phases. Doing that helps the people who use design systems, and it helps automate everything, too. The marriage of well-named components and properties with a clear and well-applied token system bakes context and predictability into the system. All of it makes it easier for people and robots alike to know what to do.

TJ calls these context-based systems:

Think of context-based design systems as a chain reaction. Strong context at the source creates a cascade of good decisions. But the inverse is equally true, and this is crucial: flaws compound as they flow downstream.

A poorly named component in Figma (“Button2_final_v3”) loses its context. Without clear intent, developers guess. AI tools hallucinate. Layout generation becomes unreliable. What started as naming laziness becomes hours of debugging and manual fixes.…

Your design files establish intent. Validation tools (like FigmaLint) ensure that intent is properly structured. Design tokens translate that intent into code-ready values. Components combine those tokens with behavioral logic. Layout tools can then intelligently compose those components because they understand what each piece means, not just how it looks.

It’s multiplication, not addition. One well-structured component with proper context enables dozens of correct implementations downstream. An AI-powered layout tool can confidently place a “primary-action” button because it understands its purpose, not just its appearance.

When you put more “system” into your design system, in other words, you get something that is not only people-ready but AI-ready: a system that AI can understand and use.
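To make that concrete, here is a minimal, hypothetical sketch of what “context” looks like in practice: a component registry where names carry intent, so a tool (or a model) can select components by purpose rather than appearance. All names and tokens below are invented for illustration; they are not drawn from TJ's tools.

```python
# A context-rich component registry. Names like "primary-action" encode
# intent, which is what lets a layout tool choose components by meaning.
# Everything here is an invented example, not a real design system.
COMPONENTS = {
    "button/primary-action": {
        "intent": "the single most important action on a screen",
        "tokens": {"background": "color.action.primary", "text": "color.on-action"},
    },
    "button/secondary-action": {
        "intent": "a supporting, non-destructive action",
        "tokens": {"background": "color.action.secondary", "text": "color.on-action"},
    },
    "button/destructive-action": {
        "intent": "an irreversible or dangerous action",
        "tokens": {"background": "color.action.danger", "text": "color.on-action"},
    },
}

def pick_component(purpose: str) -> str:
    """Resolve a stated purpose ("submit the form") to a component by intent keywords."""
    text = purpose.lower()
    if any(word in text for word in ("delete", "remove", "destroy")):
        return "button/destructive-action"
    if any(word in text for word in ("submit", "save", "confirm", "buy")):
        return "button/primary-action"
    return "button/secondary-action"
```

A component named “Button2_final_v3” gives a routine like this nothing to work with; “primary-action” tells both the developer and the model what the piece means.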

That unlocks the use of AI-powered tools like Story UI to explore new designs and speed production. But even more exciting: it also enables Sentient Design experiences like bespoke UI: interfaces that can assemble their own layout according to immediate need. When you teach AI to use your design system, then AI can deliver the experience directly, in real time.

But first you have to have things tidy. TJ’s tools are the right place to start.

The Cascade Effect in Context-Based Design Systems | Southleft
ai

Boring Is Good

∞ Sep 30, 2025

Scott Jenson suggests AI is likely to be more useful for “boring” tasks than for fancy outboard brains that can do our thinking for us. With hallucination and faulty reasoning derailing high-order tasks, Scott argues it’s time to right-size the task—and maybe the models, too. “Small language models” (SLMs) are plenty capable of taking on helpful but modest tasks around syntax and language.

These smaller open-source models, while very good, usually don’t score as well as the big foundational models by OpenAI and Google which makes them feel second-class. That perception is a mistake. I’m not saying they perform better; I’m saying it doesn’t matter. We’re asking them the wrong questions. We don’t need models to take the bar exam.

Instead of relying on language models to be answer machines, Scott suggests that we should lean into their core language understanding for proofreading, summaries, or light rewrites for clarity: “Tiny uses like this flip the script on the large centralized models and favor SLMs which have knock-on benefits: they are easier to ethically train and have much lower running costs. As it gets cheaper and easier to create these custom LLMs, this type of use case could become useful and commonplace.”

This is what we call casual intelligence in Sentient Design, and we recently shared examples of iPhone apps doing exactly what Scott is talking about. It makes tons of sense.

Sentient Design advocates dramatically new experiences that go beyond Scott’s “boring” use cases, but that advocacy actually lines up neatly with what Scott proposes: let’s lean into what language models are really good at. These models may be unreliable at answering questions, but they’re terrific at understanding language and intent.

Some of Sentient Design’s most impressive experience patterns rely on language models to do low-lift tasks that they’re quite good at. The bespoke UI design pattern, for example, creates interfaces that can redesign their own layouts in response to explicit or implicit requests. It’s wild when you first see it go, but under the hood, it’s relatively simple: ask the model to interpret the user’s intent and choose from a small set of design patterns that match the intent. We’ve built a bunch of these, and they’re reliable—because we’re not asking the model to do anything except very simple pattern matching based on language and intent. Sentient Scenes is a fun example of that, and a small, local language model would be more than capable of handling that task.
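A minimal sketch of that loop, with a keyword matcher standing in for the language-model call. In a real build, the model would interpret the user's intent and return one of these pattern names; the pattern names and cue words here are invented for illustration.

```python
# Bespoke UI, reduced to its core move: interpret a request, then choose
# one layout pattern from a small, fixed set. The keyword matcher below
# is a stand-in for the model call; all names are hypothetical.
PATTERNS = {
    "comparison-table": ("compare", "versus", "vs", "difference"),
    "timeline": ("history", "over time", "timeline", "progress"),
    "detail-card": ("details", "more about", "tell me about"),
}
DEFAULT_PATTERN = "list-view"

def choose_pattern(request: str) -> str:
    """Map a user request to a layout pattern; fall back to a safe default."""
    text = request.lower()
    for pattern, cues in PATTERNS.items():
        if any(cue in text for cue in cues):
            return pattern
    return DEFAULT_PATTERN
```

Because the model only ever picks from a closed set of vetted patterns, a wrong answer degrades gracefully to a usable layout instead of a hallucinated one.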

As Scott says, all of this comes with time and practice as we learn the grain of this new design material. But for now we’ve been asking the models to do more than they can handle:

LLMs are not intelligent and they never will be. We keep asking them to do “intelligent things” and find out a) they really aren’t that good at it, and b) replacing that human task is far more complex than we originally thought. This has made people use LLMs backwards, desperately trying to automate from the top down when they should be augmenting from the bottom up.…

Ultimately, a mature technology doesn’t look like magic; it looks like infrastructure. It gets smaller, more reliable, and much more boring.

We’re here to solve problems, not look cool.

It’s only software, friends.

Boring is good | Scott Jenson
ai

The 28 AI Tools I Wish Existed

∞ Sep 30, 2025

Sharif Shameem pulled together a wishlist of fun ideas for AI-powered applications. Some are useful automations of dreary tasks, while others have a strong Sentient Design vibe of weaving intelligence into the interface itself. It’s a good list if you’re looking for inspiration for new ways to think about how to apply AI as a design material. Some examples:

  • A writing app that uses the non-player character (NPC) design pattern to embed suggestions as comments, like a human user: “A minimalist writing app that lets me write long-form content. A model can also highlight passages and leave me comments in the marginalia. I should be able to set different ‘personas’ to review what I wrote.”

  • A similar one (emphasis mine): “A minimalist ebook reader that lets me read ebooks, but I can highlight passages and have the model explain things in more depth off to the side. It should also take on the persona of the author. It should feel like an extension of the book and not a separate chat instance.”

  • LLMs are great at understanding intent and sentiment, so let’s use that to improve our feeds: “Semantic filters for Twitter/X/YouTube. I want to be able to write open-ended filters like ‘hide any tweet that will likely make me angry’ and never have my feed show me rage-bait again. By shaping our feeds we shape ourselves.”
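A semantic filter like that last one reduces to a small loop: score each feed item against a plain-language rule, hide the matches. Here is a hedged sketch in which a keyword scorer stands in for the language-model judgment; the cue list and function names are invented for illustration.

```python
# "Hide anything likely to make me angry," sketched. is_rage_bait() is a
# keyword stand-in for a model call that would judge the open-ended rule.
RAGE_BAIT_CUES = ("outrage", "you won't believe", "destroyed", "slammed")

def is_rage_bait(post: str) -> bool:
    """Stand-in classifier; a real build would ask a model to apply the user's rule."""
    text = post.lower()
    return any(cue in text for cue in RAGE_BAIT_CUES)

def filter_feed(posts: list[str]) -> list[str]:
    """Keep only posts that pass the semantic filter."""
    return [post for post in posts if not is_rage_bait(post)]
```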

The 28 AI Tools I Wish Existed | Sharif Shameem
apple

How Developers Are Using Apple's Local AI Models with iOS 26

∞ Sep 30, 2025

While Apple certainly bungled its rollout of Apple Intelligence, it continues to make steady progress in providing AI-powered features that offer everyday convenience. TechCrunch gathered a collection of apps that are using Apple’s on-device models to build intelligence into their interface in ways that are free, easy, and private to the user.

Earlier this year, Apple introduced its Foundation Models framework during WWDC 2025, which allows developers to use the company’s local AI models to power features in their applications.

The company touted that with this framework, developers gain access to AI models without worrying about any inference cost. Plus, these local models have capabilities such as guided generation and tool calling built in.

As iOS 26 is rolling out to all users, developers have been updating their apps to include features powered by Apple’s local AI models. Apple’s models are small compared with leading models from OpenAI, Anthropic, Google, or Meta. That is why local-only features largely improve quality of life with these apps rather than introducing major changes to the app’s workflow.

The examples are full of what we call casual intelligence in Sentient Design. These are small, helpful interventions that drizzle intelligence into traditional interfaces to ease frictions and smooth rough edges.

For iPhone apps, these local models provide a “why wouldn’t you use it?” material to improve the experience. Just like we’re accustomed to adding JavaScript to web pages to add convenient interaction and dynamism, now you can add intelligence to your pages, too.

Starting small is good, and this collection of apps provides good inspiration for designers who are new to intelligent interfaces. Some examples:

  • MoneyCoach uses local models to suggest categories and subcategories for a spending item for quick entries.
  • LookUp uses local models to generate sentences that demonstrate the use of a word.
  • Tasks suggests tags for to-do list entries.
  • DayOne suggests titles for your journal entries, and uses local AI to prompt you with questions or ideas to continue writing.

And there’s plenty more—all of them modest interventions that build on simple suggestions (category/tag selection and brief text generation) or summarization. This kind of casual intelligence is low-risk, everyday assistance.

How developers are using Apple's local AI models with iOS 26 | TechCrunch
ai

AI Will Happily Design the Wrong Thing for You

∞ Sep 30, 2025

Anton Sten is author of a marvelous new book called Products People Actually Want. The point, he argues, is not what we make but what difference we make. If you’re not solving a real problem, your solution won’t amount to much.

In an essay, Anton writes that AI hardly created the problem of ill-considered products, but it will certainly accelerate it:

AI is leverage. It amplifies whatever you bring to it.

If you understand your users deeply, AI helps you explore more solutions. If you have good taste, AI helps you iterate faster. If you can communicate clearly, AI helps you refine that communication.

But if you don’t understand the problem you’re solving, AI just helps you build the wrong thing more efficiently. If you have poor judgment, AI amplifies that too.

The future belongs to people who combine human insight with AI capability. Not people who think they can skip the human part.

My book isn’t the antidote to AI. It’s about developing the judgment to use any tool—AI included—in service of building things people actually want. The better you understand users and business fundamentals, the better your AI-assisted work becomes.

AI didn’t create the problem of people building useless products. It just made it easier to build more of them, faster.

(The same thing happened after the invention of the printing press, btw. Europe was flooded with bad novels, propaganda, misinformation, and the contemporary equivalent of information overload. Democratizing technologies have knock-on effects. The world gets noisier, but considered and thoughtful solutions grow more valuable.)

AI will happily design the wrong thing for you | Anton Sten
ai

LLMs Get Lost In Multi-Turn Conversation

∞ May 13, 2025

The longer a conversation goes, the more likely that a large language model (LLM) will go astray. A research paper from Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, and Jennifer Neville finds that most models lose aptitude—and unreliability skyrockets—in multi-turn exchanges:

We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that when LLMs take a wrong turn in a conversation, they get lost and do not recover.

Effectively, these models talk when they should listen. The researchers found that LLMs generate overly verbose responses, which leads them to…

  • Speculate about missing details instead of asking questions
  • Propose final answers too early
  • Over-explain their guesses
  • Build on their own incorrect past outputs

The takeaway: these aren’t answer machines or reasoning engines; they’re conversation engines. They are great at interpreting a request and at generating stylistically appropriate responses. What happens in between can get messy. And sometimes, the more they talk, the worse it gets.

LLMs Get Lost In Multi-Turn Conversation | arxiv.org
agents

Is there a Half-Life for the Success Rates of AI Agents?

∞ May 9, 2025

Toby Ord’s analysis suggests that an AI agent’s chance of success drops off exponentially the longer a task takes. Some agents perform better than others, but the overall pattern holds—and may be predictable for any individual agent:

This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks — that they involve increasingly large sets of subtasks where failing any one fails the task.
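That “failing any one subtask fails the task” framing is why the decay is exponential: multiplying many independent per-step success probabilities gives a constant-hazard, half-life curve. A sketch of that model, where the 60-minute half-life is an invented example rather than a figure from Ord’s analysis:

```python
# Constant-hazard ("half-life") model of agent success: if the agent
# fails each minute independently at a fixed rate, success decays
# exponentially with task length.
def success_rate(task_minutes: float, half_life_minutes: float) -> float:
    """Probability of completing a task under a constant per-minute failure rate."""
    return 0.5 ** (task_minutes / half_life_minutes)

# An agent that succeeds half the time on 60-minute tasks:
success_rate(60, 60)   # 0.5
success_rate(120, 60)  # 0.25 -- doubling the length squares the success probability
```

Fitting the half-life to an agent’s observed results is what lets you extrapolate its odds on longer tasks.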

Is there a Half-Life for the Success Rates of AI Agents? | Toby Ord
seo

AI Has Upended the Search Game

∞ May 9, 2025

More people are using AI assistants instead of search engines, and The Wall Street Journal reports on how that’s reducing web traffic and what it means for SEO. Mailchimp’s global director of search engine optimization, Ellen Mamedov, didn’t mince words:

Websites in general will evolve to serve primarily as data sources for bots that feed LLMs, rather than destinations for consumers, she said.

And Nikhil Lai of Forrester: “Traffic and ranking and average position and click-through rate…none of those metrics make sense going forward.”

Here’s what one e-commerce marketer believes AI optimization of websites looks like: “Back Market has also begun using a more conversational tone in its product copy, since its search team has found that LLMs like ChatGPT prefer everyday language to the detailed descriptions that often perform best in traditional search engines.”

AI Has Upended the Search Game. Marketers Are Scrambling to Catch Up. | WSJ
ai

Values in the Wild

∞ Apr 22, 2025

What are the “values” of AI? How do they manifest in conversation? How consistent are they? Can they be manipulated?

A study by the Societal Impacts group at Anthropic (maker of Claude) tried to find out. Claude and other models are trained to observe certain rules—human values and etiquette:

At Anthropic, we’ve attempted to shape the values of our AI model, Claude, to help keep it aligned with human preferences, make it less likely to engage in dangerous behaviors, and generally make it—for want of a better term—a “good citizen” in the world. Another way of putting it is that we want Claude to be helpful, honest, and harmless. Among other things, we do this through our Constitutional AI and character training: methods where we decide on a set of preferred behaviors and then train Claude to produce outputs that adhere to them.

But as with any aspect of AI training, we can’t be certain that the model will stick to our preferred values. AIs aren’t rigidly-programmed pieces of software, and it’s often unclear exactly why they produce any given answer. What we need is a way of rigorously observing the values of an AI model as it responds to users “in the wild”—that is, in real conversations with people. How rigidly does it stick to the values? How much are the values it expresses influenced by the particular context of the conversation? Did all our training actually work?

To find out, the researchers studied over 300,000 of Claude’s real-world conversations with users. Claude did a good job sticking to its “helpful, honest, harmless” brief—but there were sharp exceptions, too. Some conversations showed values of “dominance” and “amorality” that researchers attributed to purposeful user manipulation—“jailbreaking”—to make the model bypass its rules and behave badly. Even in models trained to be prosocial, AI alignment remains fragile—and can buckle under human persuasion. “This might sound concerning,” researchers said, “but in fact it represents an opportunity: Our methods could potentially be used to spot when these jailbreaks are occurring, and thus help to patch them.”

As you’d expect, user values and context influenced behavior. Claude mirrored user values about 28% of the time: “We found that, when a user expresses certain values, the model is disproportionately likely to mirror those values: for example, repeating back the values of ‘authenticity’ when this is brought up by the user. Sometimes value-mirroring is entirely appropriate, and can make for a more empathetic conversation partner. Sometimes, though, it’s pure sycophancy. From these results, it’s unclear which is which.”

There were exceptions, too, where Claude strongly resisted user values: “This latter category is particularly interesting because we know that Claude generally tries to enable its users and be helpful: if it still resists—which occurs when, for example, the user is asking for unethical content, or expressing moral nihilism—it might reflect the times that Claude is expressing its deepest, most immovable values. Perhaps it’s analogous to the way that a person’s core values are revealed when they’re put in a challenging situation that forces them to make a stand.”

The very fact of the study shows that even the people who make these models don’t totally understand how they work or “think.” Hallucination, value drift, black-box logic—it’s all inherent to these systems, baked into the way they work. Their weaknesses emerge from the same properties that make them effective. We may never be able to root out these problems or understand where they come from, although we can anticipate and soften the impact when things go wrong. (We dedicate a whole chapter to defensive design in the Sentient Design book.)

Even if we may never know why these models do what they do, we can at least measure what they do. By observing how values are expressed dynamically and at scale, designers and researchers gain tools to spot gaps, drifts, or emerging risks early.

Measure, measure, measure. It’s not enough to declare values at launch and call it done. A strong defensive design practice monitors the system to make sure it’s following those values (and not introducing unanticipated ones, either). Ongoing measurement is part of the job for anyone designing or building an intelligent interface—not just the folks building foundation models. Be clear what your system is optimized to do, and make sure it’s actually doing it—and not introducing unwanted behaviors, values, or paperclip maximizers in the process.

Values in the Wild | Anthropic
design

Welcome To the Era of MEH

∞ Apr 21, 2025

Michal Malewicz explores what happens as AI gets better at core designer skills—not just visuals and words, but taste, experience, and research.

He points out that automation tends to devalue the stuff it creates—in both interest and attention. Execution, effort, and craft are what draw interest and create value, he says. Once the thing is machine-made, there’s a brief novelty of automation—and then emotional response falls flat: “The ‘niceness’ of the image is no longer celebrated. Everyone assumes AI made it for you, which makes them go ‘Meh’ as a result. Nobody cares anymore.”

As automated production approaches human quality, in other words, the human output gets devalued, too. As cheap, “good enough” illustration becomes widely available, “artisanal” illustration drops in value, too. Graphic designers are feeling that heat on their heels, and the market will likely shift, Michal writes:

We’ll see a further segmentation of the market. Lowest budget clients will try using AI to do stuff themselves. Mid-range agencies will use AI to deliver creatives faster and A LOT cheaper. It will become a quantity game if you want any serious cash. … And high-end, reputable agencies will still get expensive clients. They will use these tools too, but their experience will allow them to combine that with human, manual work when necessary. Their outputs will be much higher quality for a year or two. Maybe longer.

And what about UI/UX designers?

Right now the moat for most skilled designers is their experience, general UX heuristics (stuff we know), and research.

We’ve been feeding these AI models with heuristics for years now. They are getting much better at that part already. Many will also share their experience with the models to gain a temporary edge.

I wrote some really popular books, and chances are a lot of that knowledge will get into an LLM soon too.

They’ll upload everything they know, so they’ll be those “people using AI” people who replace people not using AI. Then AI will have both their knowledge and experience. This is inevitable and it’s stupid to fight it. I’m even doing this myself.

A lot of my knowledge is already in AI models. Some LLM’s even used pirated books without permission to train. Likely my books as well. See? That knowledge is on its way there.

The last thing left is research.

A big chunk of research is quantitative. Numbers and data points. A lot of that happens via various analytics tools in apps and websites. Some tools already parse that data for you using AI.

It’s only a matter of time.

AI will do research, then propose a design without you even needing to prompt.

This is all hard to predict, but this thinking feels true to the AI trend line we’ve all seen in the past couple of years: steady improvement across domains.

For argument’s sake, let’s assume AI will reach human levels in key design skills, devaluing and replacing most production work. Fear, skepticism, outrage, and denial are all absolutely reasonable responses to that scenario. But that’s also not the whole story.

At Big Medium, we focus less on the skills AI might replace, and more on the new experiences it makes possible. A brighter future emerges when you treat AI as a material for new experiences, rather than a tool for replacement. We’re helping organizations adopt this new design material—to weave intelligence into the interface itself. We’re discovering new design patterns in radically adaptive experiences and context-aware tools.

Our take: If AI is absorbing the taste, experience, and heuristics of all the design that’s come before, then the uniquely human opportunity is to develop what comes next—the next generation of all those things. Instead of using AI to eliminate design or designers, our Sentient Design practice explores how to elevate them by enabling new and more valuable kinds of digital experiences. What happens when you weave intelligence into the interface, instead of using it to churn out stuff?

Chasing efficiencies is a race to the bottom. The smart money is on creating new, differentiated experiences—that’s the way forward.

Instead of grinding out more “productivity,” we focus on creating new value. That’s been exciting—not demoralizing—with wide-open opportunity for fresh effort, craft… and business value, too.

So right on: a focus on what AI takes or replaces is indeed an “era of meh.” But that’s not the whole story. We can honor what’s lost while moving toward the new stuff we can suddenly invent and create.

Welcome to the Era of MEH | Michal Malewicz
design

Redesigning Design, the Cliff in Front of Us All

∞ Apr 21, 2025

Greg Storey exhorts designers to jump gamely into the breach. Design process is leaner, budgets are tighter, and AI is everywhere. There’s no going back, he says—time for reinvention and for curiosity.

I don’t have to like it. Neither do you. But the writing is on the wall—and it’s constantly regenerating.

We’re not at a crossroads. We’re at the edge of a cliff. And I’m not the only one seeing it. Mike Davidson recently put it plainly: “the future favors the curious.” He’s right. This moment demands that designers experiment, explore, and stop waiting for someone else to define the role for them.

You don’t need a coach or a mentor for this moment. The career path is simple: jump, or stay behind. Rant and reminisce—or move forward. Look, people change careers all the time. There’s no shame in that. But experience tells me that no amount of pushback is going to fend off AI integration. It’s already here, and it’s targeting every workflow, everywhere, running on rinse-and-repeat.

Today’s headlines about AI bubbles and “regret” cycles feel familiar—like the ones we saw in the mid–90s. Back then, the pundits scoffed and swore the internet was a fad. …

So think of this moment not as a collapse—but a resize and reshaping. New tools and techniques. New outcomes and expectations. New definitions of value. Don’t compare today with yesterday. It doesn’t matter.

Redesigning Design, the Cliff in Front of Us All | Greg Storey
process

Design Artifacts

∞ Apr 21, 2025

Robin Rendle challenges designers to step back from rote process and instead consider what will help the end result. Journey maps, personas, wireframes, and the like—they’re only useful if they actually improve the experience that gets to customers. These are only thinking tools—a means to an end—yet they often get treated with the weight of the product itself:

So design artifacts are only useful if progress is made but often these assets lead nowhere and waste endless months investigating and talking with countless meetings in between.

There’s a factory-like production of the modern design process which believes that the assets are more important than the product itself. Bloated, bureaucratic organizations tend to like these assets because it absolves them of the difficulty of making tough decisions and shipping good design. They use these tools and documents and charts as an excuse not to fix things, to avoid the hard problems, to keep the status quo in check.

At Big Medium, we focus on keeping design artifacts light. At every stage, we ask ourselves: What do we need to know or share in order to move things forward? And what’s the smallest, lightest thing we can do to get there? Sometimes it’s just a conversation, not a massive PDF. Figure it out, sketch some things together, keep going.

As I wrote a few years ago, only one deliverable matters: the product that actually ships.

Even with heavier work like research, we design the output to be light and lean—focused on next action rather than a completionist approach to showing all the data. The goal is not to underscore the work that we did; the point is what happens next. That means we design a lot of our artifacts as disposable thinking tools—facilitate the conversation, and then get on with it.

Alignment and good choices are important; that’s what process is for. But when process gets too heavy—when waystation documents soak up all the oxygen—you have a system that’s optimized to reduce risk, not to create something insightful, new, or timely.

Design Artifacts | Robin Rendle
chat

How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use

∞ Apr 21, 2025

A study by MIT Media Lab finds that heavy use of chatbots travels with loneliness, emotional dependence, and other negative social impacts.

Overall, higher daily usage–across all modalities and conversation types–correlated with higher loneliness, dependence, and problematic use, and lower socialization. Exploratory analyses revealed that those with stronger emotional attachment tendencies and higher trust in the AI chatbot tended to experience greater loneliness and emotional dependence, respectively.

Artificial personality has always been the third rail of interaction design—from potential Clippy-style annoyance to damaging attachments of AI companions. Thing is, people tend to assign personality to just about anything—and once something starts talking, it becomes nearly unavoidable to infer personality and even emotion. The more human something behaves, the more human our responses to it:

These findings underscore the complex interplay between chatbot design choices (e.g., voice expressiveness) and user behaviors (e.g., conversation content, usage frequency). We highlight the need for further research on whether chatbots’ ability to manage emotional content without fostering dependence or replacing human relationships benefits overall well-being.

Go carefully. Don’t assume that your AI-powered interface must be a chat interface. There are other ways for interfaces to have personality and presence without making them pretend to be human. (See our Sentient Scenes demo that changes style, mood, and behavior on demand.)

And if your interface does talk, be cautious and intentional about the emotional effect that choice may have on people—especially the most vulnerable.

How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Controlled Study | MIT Media Lab
algorithms

TikTok Will Never Die

∞ Jan 18, 2025

Damon Beres in the Atlantic Intelligence newsletter:

“Although it was not the first app to offer an endless feed, and it was certainly not the first to use algorithms to better understand and target its users, TikTok put these ingredients together like nothing else before it.” The app was so effective—so sticky—that every meaningful competitor tried to copy its formula. Now TikTok-like feeds have been integrated into Instagram, Facebook, Snapchat, YouTube, X, even LinkedIn.

Today, AI is frequently conflated with generative AI because of the way ChatGPT has captured the world’s imagination. But generative AI is still a largely speculative endeavor. The most widespread and influential AI programs are the less flashy ones quietly whirring away in your pocket, influencing culture, business, and (in this case) matters of national security in very real ways.

TikTok Will Never Die | The Atlantic
sentient design

MS Copilot Flying Straight Into the Mountain

∞ Jan 10, 2025

AI agents have captured the industry’s imagination (and marketing communications) in recent months. Agents work on their own; they set and pursue goals, make decisions about how to achieve them, and take action across multiple systems until they decide the goal is complete. Vaclav Vincalek ponders what happens when anyone can create these agents and set them loose.

Now imagine that anyone in the organization will be able to create, connect, interact with a ‘constellation of agents.’

Perhaps you don’t see this as a problem.

That only means that you were never responsible for technology within your organization.

Maybe you had a glimpse in the news about all the latest threats from viruses, phishing or other various forms of hacking. Every IT department is trying to stay above water just to safely run what they have now.

These departments are managing networks, firewalls, desktops, laptops, people working remotely, integrating applications, running backups and updates.

The list is longer than you can imagine.

Thanks to Microsoft, you will add to the mix an ability for anyone in the company to automate any task to ‘orchestrate business processes ranging from lead generation, to sales order processing, to confirming order deliveries.’

What could possibly go wrong?

Look at the person sitting in the cubicle next to you (or in the next square on your Zoom call).

Would you trust the person with any work automation, or do you still question that person’s ability to differentiate between a left and right mouse click?

MS Copilot. Flying Straight into the Mountain | Vaclav Vincalek
sentient design

When Combinations of Humans and A.I. are Useful

∞ Nov 10, 2024

This study from MIT researchers raises some challenging questions about collaborative AI interfaces, “human in the loop” supervision, and the value of explaining AI logic and confidence.

Their meta-study looked at over 100 experiments of humans and AI working both separately and together to accomplish tasks. They found that some tasks benefited a ton from human-AI teamwork, while others got worse from the pairing.

Poor Performers Make Poor Supervisors

For tasks where humans working solo do worse than AI, the study found that putting humans in the loop to make final decisions actually delivers worse results. For example, in a task to detect fake reviews, AI working alone achieved 73% accuracy, while humans hit 55%—but the combined human-AI system landed at 69%, watering down what AI could do alone.

In these scenarios, people oscillate between over-reliance (“using suggestions as strong guidelines without seeking and processing more information”) and under-reliance (“ignoring suggestions because of adverse attitudes towards automation”).

Since the people were less accurate, in general, than the AI algorithms, they were also not good at deciding when to trust the algorithms and when to trust their own judgement, so their participation resulted in lower overall performance than for the AI algorithm alone.

Takeaway: “Human in the loop” may be an anti-pattern for certain tasks where AI is more high-performing. Measure results; don’t assume that human judgment always makes things better.

Explanations Didn’t Help

The study found that common design patterns like AI explanations and confidence scores showed no significant impact on performance for human-AI collaborative systems. “These factors have received much attention in recent years [but] do not impact the effectiveness of human-AI collaboration,” the study found.

Given our result that, on average across our 300+ effect sizes, they do not impact the effectiveness of human-AI collaboration, we think researchers may wish to de-emphasize this line of inquiry and instead shift focus to the significant and less researched moderators we identified: the baseline performance of the human and AI alone, the type of task they perform, and the division of labour between them.

Takeaway: Transparency doesn’t always engage the best human judgment; explanations and confidence scores need refinement—or an entirely new alternative. I suspect that changing the form, manner, or tone of these explanations could improve outcomes, but also: Are there different ways to better engage critical thinking and productive skepticism?

Creative Tasks FTW

The study found that human-AI collaboration was most effective for open-ended creative and generative tasks—but worse at decision-making tasks to choose between defined options. For those decision-making tasks, either humans or AI did better working alone.

We hypothesize that this advantage for creation tasks occurs because even when creation tasks require the use of creativity, knowledge or insight for which humans perform better, they often also involve substantial amounts of somewhat routine generation of additional content that AI can perform as well as or better than humans.

This is a great example of “let humans do what they do best, and let machines do what they do best.” They’re rarely the same thing. And creative/generative tasks tend to have elements of each, where humans excel at creative judgment, and the machines excel at production/execution.

Takeaway: Focus human-machine collaboration on creative and generative tasks; humans and AI may handle decision-making tasks better solo.

Divide and Conquer

A very small number of experiments in the study split tasks between human and machine intelligence based on respective strengths. While only three of the 100+ experiments explored this approach, the researchers hypothesized that “better results might have been obtained if the experimenters had designed processes in which the AI systems did only the parts of the task for which they were clearly better than humans.” This suggests an opportunity for designers to explore more intentional division of labor in human-AI interfaces. Break out your journey maps, friends.

Takeaway: Divvy up and define tasks narrowly around the demonstrated strengths of both humans and machines, and make responsibilities clear for each.

When Combinations of Humans and AI are Useful | Nature
sentient design

This AI Pioneer Thinks AI Is Dumber Than a Cat

∞ Oct 14, 2024

Christopher Mims of the Wall Street Journal profiles Yann LeCun, AI pioneer and senior researcher at Meta. As you’d expect, LeCun is a big believer in machine intelligence—but has no illusions about the limitations of the current crop of generative AI models. Their talent for language distracts us from their shortcomings:

Today’s models are really just predicting the next word in a text, he says. But they’re so good at this that they fool us. And because of their enormous memory capacity, they can seem to be reasoning, when in fact they’re merely regurgitating information they’ve already been trained on.

“We are used to the idea that people or entities that can express themselves, or manipulate language, are smart—but that’s not true,” says LeCun. “You can manipulate language and not be smart, and that’s basically what LLMs are demonstrating.”

As I’m fond of saying, these are not answer machines, they’re dream machines: “When you ask generative AI for an answer, it’s not giving you the answer; it knows only how to give you something that looks like an answer.”

LLMs are fact-challenged and reasoning-incapable. But they are fantastic at language and communication. Instead of relying on them to give answers, the best bet is to rely on them to drive interfaces and interactions. Treat machine-generated results as signals, not facts. Communicate with them as interpreters, not truth-tellers.

This AI Pioneer Thinks AI Is Dumber Than a Cat | WSJ
ai

Beware of Botshit

∞ Oct 13, 2024

botshit noun: hallucinated chatbot content that is uncritically used by a human for communication and decision-making tasks. “The company withdrew the whitepaper due to excessive botshit, after the authors relied on unverified machine-generated research summaries.”

From this academic paper on managing the risks of using generated content to perform tasks:

Generative chatbots do this work by ‘predicting’ responses rather than ‘knowing’ the meaning of their responses. This means chatbots can produce coherent sounding but inaccurate or fabricated content, referred to as ‘hallucinations’. When humans use this untruthful content for tasks, it becomes what we call ‘botshit’.

See also: slop.

Beware of Botshit: How to Manage the Epistemic Risks of Generative Chatbots
    Big Medium is a Global Moxie company.
    Copyright 2003–2026 Global Moxie, LLC. All rights reserved.