Public webpage for sharing information about Dr. Joyner's CS6750 - Human Computer Interaction course in Spring 2022.
[MUSIC] For this portion of our conversation about human-computer interaction, we’re going to talk about some established principles that we’ve uncovered after decades of designing user interfaces. We want to understand the fundamental building blocks of HCI, and separately we’ll talk about how to build on those foundations to do new research and new development. To get started, though, let’s first define some of the overarching ideas of design principles. In this lesson, we’re going to talk about the way we focus on users and tasks in HCI, not on tools and interfaces on their own. We’re going to talk about the role of the interface and how it mediates between the user and the task. We’re going to discuss different views on the user’s role in the system. And we’re going to talk about user experience more generally and how it exists at several different levels. Along the way, we’ll tackle some design challenges, reflect on our own experiences, and try to apply what we learn to the broader field of HCI.
At the heart of Human Computer Interaction is the idea that users use interfaces to accomplish some task. In general, that interface wouldn’t actually have to be technological. This cycle exists for things like using pencils to write or using a steering wheel to drive a car. But in HCI, we’re going to focus on interfaces that are in some way computational or computerized. What’s most important here, though, is our focus on the interaction between the user and the task through the interface, not just the interaction between the user and the interface itself. We’re designing interfaces, sure, but to design a good interface, we need to understand both the user’s goals and the tasks they’re trying to accomplish. Understanding the task is really important. One of the mistakes many novice designers make is jumping too quickly to the interface without understanding the task. For example, think about designing a new thermostat. If you focus on the interface, the thermostat itself, you’re going to focus on things like the placement of the buttons or the layout of the screen, on whether or not the user can actually read what’s there, and things like that. And those are all important questions. But the task is controlling the temperature in an area. When you think about the task rather than just the interface, you think of things like the Nest, a device that tries to learn from its user and act autonomously. That’s more than just an interface for controlling whether the heat or the air conditioning is on. That’s an interface for controlling the temperature in your house. By focusing on the task instead of just the interface, we can come up with more revolutionary designs like the Nest rather than just iterative improvements to the same thermostats we’ve been using for years.
Let’s try identifying a task real quick. We’re going to watch a short clip of Morgan. Watch what she does, and try to identify what task she is performing. [MUSIC] What was the task in that clip?
If you said she’s swiping her credit card, you’re thinking a little too narrowly. Swiping her credit card is just how she accomplishes her task. We’re interested in something more like she’s completing a purchase. She’s purchasing an item. She’s exchanging goods. Those all put more emphasis on the actual task she’s accomplishing and let us think more generally about how we can make that interface even better.
Here are five quick tips for identifying a user task. Number one: watch real users. Instead of just speculating or brainstorming, get out there and watch real users performing in the area in which you’re interested. Number two: talk to them. You don’t have to just watch them. Recruit some participants to come perform the task and talk their way through it. Find out what they’re thinking, what their goals are, what their motives are. Number three: start small. Start by looking at the individual little interactions. It’s tempting to come in believing you already understand the task, but if you do, you’ll interpret everything you see in terms of what you already believe. Instead, start by looking at the smallest operators a user performs. Number four: abstract up. Working from those small observations, try to abstract up to an understanding of the tasks that they’re trying to complete. Keep asking why they’re performing these actions until you get beyond the scope of your design. For example, what is Morgan doing? Swiping her credit card. Why? To make a purchase. Why? To acquire some goods. Why? To repair her car. Somewhere in that sequence is likely the task for which we want to design. Number five: you are not your user. Even if you yourself perform the task for which you’re designing, you’re not designing for you. You’re designing for everyone who performs that task. So, leave behind your own previous experiences and preconceived notions about it. These five quick tips come up a lot in the methods unit of HCI. HCI research methods are largely about understanding users, their motivations, and their tasks. So we’ll talk much more about this later, but it’s good to keep in mind now.
The ultimate goal of design in HCI is to create interfaces that are both useful and usable. Useful means that the interface allows the user to achieve some task, but usefulness is a pretty low bar. For example, a map is useful in finding your way from one place to another, but it isn’t the most usable thing in the world. You have to keep track of where you are, you have to plot your own route, and you have to do all of this while driving the car. So before GPS navigation, people would often manually write down the turns before they actually started driving somewhere they hadn’t been before. So our big concern is usability. That’s where we get things like navigation apps. Notice how we have to focus on understanding the task when we’re performing design. If we had set out to design a better map, we probably wouldn’t have ended up with a navigation app. It was through understanding the task of navigation itself that we realized we could offload a lot of the cognitive load of navigation onto the interface, closing the loop between the user and the task of navigation.
Throughout this unit, I’ve repeatedly asked you to revisit the area of HCI you chose to keep in mind throughout our conversations. Now, take a second and try to pull all those things together. You’ve thought about how your chosen area applies to each of the models of the human’s role, how it applies to various design guidelines, and how it interacts with society and culture as a whole. How does moving through those different levels change the kinds of designs you have in mind? Are you building up from low-level interactions to high-level effects? Are you starting at the top with the desired outcome and working your way down to the individual operations? There are no right or wrong answers here. The important thing is reflecting on your own reasoning process.
In looking at human-computer interaction, it’s important that we understand the role that we expect the human to play in this overall system. Let’s talk about three different possible roles the human can play: processor, predictor, and participant. First, we might think of the human as being nothing more than a sensory processor. They take input in and they spit output out. They’re kind of like another computer in the system, just one that we can’t see the inside of. If we are designing with this role in mind, then our main concern is that the interface fit within known human limits. These are things like what humans can sense, what they can store in memory, and what they can physically do in the world. In this case, usability means that the interface is physically usable: the user can see all the colors, touch all the buttons, and so on. With this model, we evaluate our interfaces with quantitative experiments. That means we take numeric measurements of how quickly the user can complete some task or how quickly they might react to some incoming stimulus. Now, as you might have guessed, the processor view is not the one we’ll generally take when we talk about good design. Instead, we’ll probably divide our time pretty equally between the other two perspectives.
A second way of viewing the human is to view them as a predictor. Here, we care deeply about the human’s knowledge, experience, expectations, and thought process. That’s why we call them the predictor: we want them to be able to predict what will happen in the world as a result of some action they take. So we want them to be able to map input to output. And that means getting inside their head, understanding what they’re thinking, what they’re seeing, what they’re feeling when they’re interacting with some task. If we’re taking this perspective, then the interface must fit with what humans know. It must fit with their knowledge. It must help the user learn what they don’t already know and efficiently leverage what they do already know. And toward that end, we evaluate these kinds of interfaces with qualitative studies. These are often ex situ studies. We might perform task analyses to see where users are spending their time, or perform cognitive walkthroughs to understand the user’s thought process throughout some task. We can see pretty clearly that this view gives us some advantages over viewing the user simply as a sensory processor, just as another computer in the system. However, here we’re still focusing on one user and one task. And sometimes that’s useful, but many times we want to look even more broadly than that. That’s when we take the third view: the user as a participant.
A third view on the user is to look at the user as a participant in some environment. That means we’re not just interested in what’s going on inside their head. We’re also interested in what’s going on around them at the same time, like what other tasks or interfaces they’re using, or what other people they’re interacting with. We want to understand, for example, what’s competing for their attention, what their available cognitive resources are, and what the importance of the task is relative to everything else that’s going on. So if we take this view, then our interface must fit with the context. It’s not enough that the user is able to physically use the system and knows how to use the system. They must be able to actually interact with the system in the context where they need it. And because context is so important here, we evaluate with in situ studies. We can’t simply look at the user and the interface in a vacuum. We have to actually view and evaluate them in the real world, using the interface in whatever context is most relevant. If we’re evaluating a new GPS application, for example, we need to actually go out and look at it in the context of real drivers driving on real roads. The information we get from them using the app in our lab setting isn’t as useful as understanding how they’re going to actually use it out in the real world. These are in situ studies: studies of the interface and the user within the real, complete context of the task.
So, where did these different views of the user come from? To find out, we actually need to trace HCI back to its roots in psychology. The processor view of the user goes back to the behaviorist school of psychology. Behaviorism was the dominant school of thought in psychology throughout the early 20th century. It aimed to provide a systematic way of investigating behaviors in humans and in other animals. It was largely established by John B. Watson, a psychologist at Johns Hopkins University. He insisted that psychology focus only on observable behavior and not on introspection. Watson himself was responsible for the “Little Albert” experiment that you might be familiar with. In that experiment, a little boy was conditioned to be afraid of a white rat, and other furry animals, as it was repeatedly paired with loud noises. You might be familiar with some of the other big names in behaviorism as well, such as Pavlov with his dogs and Skinner with his rats. Pavlov discovered the idea of classical conditioning, where he could condition a dog to salivate at the sound of a bell by repeatedly pairing those two stimuli together. Skinner went a step further with operant conditioning and conditioned rats to actually demonstrate a learned behavior: they would press a lever to receive food. The important takeaway here is that behaviorism focuses only on observable behaviors and outcomes. It attempts to understand behavior by looking only at behavior, not at the cognition that underlies behavior, and thus it has the name “behaviorism.” In HCI, this means looking at what designs create the right behaviors without paying a whole lot of attention as to why. If we view the user as a processor, then our design process focuses on testing observable behaviors. The predictor view of the user goes back to the next major school of thought in psychology: cognitivism, also called cognitive psychology or cognitive science. While behaviorism was only concerned with what we could observe, cognitivism is concerned with what goes on inside the mind. Now, this covers a lot: things like perception, attention, memory, and creativity all occur inside the user’s mind. Cognitivism started out more as a philosophical endeavor. Philosophers like Rene Descartes and Immanuel Kant touched on questions like whether knowledge is inborn or developed solely by experience, whether people are blank slates when they’re born or are born with some kind of innate knowledge. But it wasn’t until the cognitive revolution of the 1950s that cognitivism really emerged as a foil to behaviorism. At that time, it was such a radical departure that in many places you still see cognitive science classified as a completely different field from psychology. What’s particularly interesting is that this shift came about in large part due to the work of scientists working in artificial intelligence and computer science, as well as neuroscience, linguistics, and some other areas that were applicable to psychology but weren’t subsets of psychology. Some of the big names in cognitivism are the linguist Noam Chomsky, the psychologist Susan Carey, and the computer scientists John McCarthy, Marvin Minsky, Allen Newell, and Herbert Simon. Many of these might be names you’re familiar with if you’ve read about classical AI. These are some of the biggest names in classical AI thinking. Those early efforts in classical AI were specifically trying to create computers that could think like humans.
In order to create a computer that could think like a human, we had to understand how a human thought. Now, for us here in HCI, the key is that we care about what the user is thinking. We call this the predictor model of the user because predicting is a mental process, and we want to know what the user is predicting. We ask questions like, what do they predict the outcome of that action will be? What do they predict is the right action to take? Notice here, the user is doing the predicting, not the interface. We want to get inside the user’s head and understand how they predict the interface will behave. That’s the core of the predictor model. Now, while behaviorism and cognitivism are well-established schools of thought in psychology, the participant view is a little more nascent. It resembles functionalism in psychology, which emphasizes examining mental behaviors in the context of broader environments. It also resembles systems psychology, which emphasizes human behavior within complex systems. But in both places, the coupling between the participant view and these schools of thought is a little less well-defined. In our work, the participant view largely comes out of original research in HCI and human factors engineering. The important thing here is that while the processor and predictor views emphasize only the interaction of some user and some interface, the participant view also looks at the interaction of both within the context of a larger system. It cares about the environment in which the user and the interface are situated. So, this model views the user and the interface as participants within a larger complex cognitive system. One of the major names in this area is Edwin Hutchins, who pioneered the idea of distributed cognition. We’ll talk about him more when we talk about distributed cognition in a few lessons. Lucy Suchman also introduced the idea of situated action models, which argue that we can’t disentangle behavior from the environment in which it takes place. Gavriel Salomon was an educational psychologist who focused in large part on how learning happens in the context of culturally provided tools and implements, and Bonnie Nardi introduced the complementary idea of activity theory into this general idea of analyzing a user and an interface as participants in some larger activity. So, anytime you’re looking beyond just the user and the interface, you’re likely employing the participant view in some way. But like I said, we’ll talk about this a lot more when we reach our lesson on distributed cognition. Now, this has been a super quick recap of these three major schools of thought in both psychology and in HCI. There are actually entire classes available that focus exclusively on each of these, or even on subparts of each of these. So, if you’d like to learn more, you should check out some of the links that we’ll provide below.
To better understand how we might use these three views of the user, let’s see how we might apply them to a design challenge. So, here we have the address entry screen on Tesla’s Model S. At the top, we have the text box that the user is entering text into. Below, we have some results. At the bottom, we have the keyboard that they actually use to enter their text. Let’s imagine we’re trying to redesign it such that the user can enter the address of their target destination more quickly so that they can get on with navigation. With the processor model, we’re strictly looking at the user’s observable behavior. So, we might construct a controlled study where we bring participants in, give them addresses to enter and different interfaces to use, and time them on different versions. Whichever interface has the fastest times would be the interface we might want to go with. There are some benefits to using this model. One big one is that we might actually be able to use existing data for this. If we assume that every time a user brings up the search bar, they’re going to input an address to navigate to, we can look back at how long it usually takes to go from opening that search bar to actually starting navigation. We might be able to do that on an absolutely massive quantity of data. Another benefit is that it enables objective comparisons. That means we can compare this text entry screen to a voice system or some other way of inputting addresses to understand how different interfaces, or different modes of interaction entirely, can have different efficiencies associated with them. Most importantly, those comparisons are objective. There’s no real interpretation involved in saying that it takes an average of 5.2 seconds to go from entering an address to starting navigation. That’s just a descriptive statistic. However, there are some pretty major drawbacks as well. When we’re employing the processor model, we don’t see the reason for the differences that we observe. We have no real basis to understand why one interface performs better than the other. We also can’t differentiate by expertise. Usually, when we’re using the processor model, we’re working with expert users, and we’re not really worried about what they’re thinking about. For novices, though, it’s very difficult for the processor model to capture what a novice finds confusing or misleading. Most generally, the processor model is usually good for optimization but not for redesign. If we’re making small changes like the size of buttons or the responsiveness of the screen, then it can help us compare and understand which interface is performing better. But if we’re going to try a comprehensive new redesign, the processor model isn’t very helpful. It might be good for evaluating that new redesign once we’ve created it, but it doesn’t give us very much input on what we should include in that redesign. If we shift to the predictor model, then we’re going to actually start asking our users for input. We could bring them in for interviews, conduct focus groups, or send out surveys. We can also show them prototypes for new interfaces and have them describe their thought process while trying to interact with them. We might find some simple changes that we wouldn’t have stumbled upon otherwise. For example, we might find that users find a certain icon to be misleading compared to its real meaning. We might also find some information about why users choose different interfaces at different times.
For example, users might prefer the voice interface while driving but this text interface while they’re parked. So, the big advantage here is that we get a more complete picture of the user’s interaction with the interface. We get to ask them why they do certain things, or what they’re thinking about, or why they made certain choices. Additionally, this model lets us target different levels of expertise. We can bring in novices who’ve never seen the interface before and say, “Take a look at this. What do you think you should do next?” But we could also bring in experts and have them reflect on how some new interface might make it more or less efficient to accomplish what they want to accomplish. But again, there are drawbacks. One big drawback is that the analysis can be very expensive. We’re not just looking at numbers and running simple statistical tests. We’re often looking at plain text transcripts of interviews or plain text responses to surveys. Analyzing those requires a lot of human attention and a lot of effort. Even then, that analysis can be subject to biases. If the person analyzing the data has some suspicions about what they think the better interface would be, they’re very likely to imbue their analysis with those biases and only focus on data that confirms what they already think is true. So, we have to make sure that they control for those biases. Additionally, when we’re using the predictor model, we’re still usually ignoring the broader interaction context. We’re usually looking at the person and the interface, but not the real, authentic environment in which they usually use that interface. For this interface, that could be a problem, because we might have an interface that works very well when it has the user’s full attention but is significantly harder to use when the user is, for instance, driving. For example, if we were only testing a new interface in a lab setting, we might include a feature that hides what the user has been searching for if they haven’t entered any new text or made a selection in the past five seconds. In a lab setting, that’s probably fine, because our users are generally only focused on the task that we’re giving them, which is entering some address into that search bar. But if they’re actually out driving, they might start entering an address at one red light, have the light turn green and want to pause, and continue entering it at the next red light. If their entry disappears after five seconds, the design isn’t aware of the way they’re using it in the real environment. So, that’s where the participant model comes in. With the participant model, we view the interface and the user in the context in which they actually interact. We want to look at the user and the interface as participants in some broader activity, the broader activity of driving, not just as a user interacting with one interface. Now, the benefit of this should be pretty obvious: it evaluates the interaction in context. We can understand things like the driver being distracted, or doing things in different batches as they come to different red lights. We also capture an authentic representation of the user’s level of attention. If the user is going to be distracted, for example, we’ll see that when we analyze the activity in the authentic context in which it takes place. But unsurprisingly, there are drawbacks to this approach, too.
Just as the predictor model suggested evaluations that were difficult to analyze, the participant model emphasizes evaluations that can actually be difficult to perform. To evaluate this interface in the authentic contexts in which it’s used, we need to actually go ride along with participants. That’s a lot harder to do than just sending out a survey or bringing people into our controlled lab environment. It also means we need to have real, functional interfaces. We can’t have a person driving a car while we hold up some prototype next to them as they’re actually trying to drive. We need these interfaces to actually be designed and implemented to work in that real context. So, it’s hard to use this model when we’re just getting started with a new design task. Finally, using the participant model means that we’re exposing ourselves to a lot of uncontrollable variables. The more that’s going on in the environment, the harder it is to zoom in and focus only on the impact of our designs. Now, hopefully you’ll notice that the pros of some of these models address the cons of others. The processor model, for example, doesn’t give us much insight into what novices are thinking, but the predictor model is particularly good at targeting novices. The predictor model makes it difficult to conduct truly objective comparisons; that’s exactly what the processor model is good for. Neither the processor nor the predictor model takes context into consideration, but that’s what we get when we use the participant model. The participant model, though, doesn’t isolate variables very well; it’s subject to interference from things that we can’t control. But the processor model is very good at isolating those variables. So the major takeaway here is that we’ll likely use all of these different models at different times and in different contexts. The data we gather from one might inform what we do with another. We might start with the participant model, where we just ride around with users watching what they do. Based on that, we might observe that they spend a lot of time fumbling around to return to the same few locations. So, we might redesign an interface to include some kind of bookmarking system and present it to users in interviews. There, they might tell us they like that design, but further note that they don’t need a long list of bookmarks; they really only need work and home. So based on that, we might then design an interface where a simple swipe takes them to work or to home, depending on where they are right now. In fact, that’s how that Tesla navigation screen works. If you just swipe across the navigate button, it’ll automatically take you to work if you’re at home, or home if you’re at work. Then, finally, we might test it with the processor model to see just how much more efficiently that new interface allows users to put in their destination. The results of each design phase inform the next, and different phases call for different types of evaluation, which echo the different models of the user.
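To make that last step concrete, here is a minimal sketch of the kind of objective, processor-model comparison described above: computing the average time from opening the search bar to starting navigation. The event log format, the interface names, and the numbers are all hypothetical, purely for illustration.

```python
from statistics import mean

# Hypothetical event log: (interface_version, seconds from opening the
# search bar to starting navigation) for each observed session.
sessions = [
    ("keyboard_v1", 6.1), ("keyboard_v1", 5.4), ("keyboard_v1", 4.9),
    ("keyboard_v2", 4.2), ("keyboard_v2", 3.8),
    ("voice", 3.1), ("voice", 3.5),
]

def average_entry_time(log, version):
    """Mean task-completion time for one interface version, in seconds."""
    times = [seconds for v, seconds in log if v == version]
    return mean(times) if times else None

for version in ("keyboard_v1", "keyboard_v2", "voice"):
    print(f"{version}: {average_entry_time(sessions, version):.2f} s on average")
```

A number like this tells us which version is faster, but, as the drawbacks above note, it tells us nothing about why.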
[MUSIC] Good design: a GPS system that warns you 20 seconds before you need to make a turn. » In 1,000 feet, turn left. » Bad design: a GPS system that warns you two seconds before you need to make a turn. » Turn left now, hurry. [SOUND] » It sounds funny, but which view you take on the user can have a huge impact on the success of the interface. If you view the user just as a sensory processor, you might think that we only need to alert them a second before the upcoming turn because, after all, human reaction time is less than a second. If you view the user as a predictor, you understand they need time to slow the car down and actually make the turn, so they might need to be alerted a few more seconds before they need to execute the action of turning. And if you view the user as a participant, you’ll understand this is happening while they’re going 50 miles an hour down the road with a screaming toddler in the backseat, trying to merge between one driver on a cell phone and another one eating a cheeseburger. So it would probably be a good idea to give them a few more reminders before the turn and plenty of time to get in the right position.
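To put rough numbers on why two seconds is so much worse than twenty: the distance a warning buys the driver scales with speed. A small sketch of that arithmetic, assuming a constant speed purely for simplicity:

```python
# Rough arithmetic on how much roadway a turn warning buys the driver.
# Assumes a constant speed; the speed and lead times are illustrative only.
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600

def warning_distance_feet(speed_mph, lead_seconds):
    """Distance traveled between hearing the warning and reaching the turn."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second * lead_seconds

for lead in (2, 20):
    print(f"{lead:>2}-second warning at 50 mph = about "
          f"{warning_distance_feet(50, lead):.0f} feet of roadway")
# A 2-second warning leaves roughly 150 feet; a 20-second warning leaves
# nearly 1,500 feet to check mirrors, change lanes, and slow down.
```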
Let’s take a moment to reflect on when you’ve encountered these different views of the user in your own history of interacting with computers. Try to think of a time when a program, an app or a device clearly treated you as each of these types of users for better or for worse.
For me, we have a system at Udacity that we use to record hours for those of us who work on some contract projects. It asks us to enter the number of hours of the day we spend on each of a number of different types of work. The problem is that that assumes something closely resembling the processor model. A computer can easily track how long different processes take, but for me, tracking the amount of time spent on different tasks can be basically impossible. Checking my e-mails involves switching between five different tasks a minute. How am I supposed to track that? The system doesn’t take into consideration a realistic view of my role in the system. Something more similar to the predictor view would be, well, the classroom you’re viewing this in. Surrounding this video are a visual organization of the lesson’s content, a meter measuring your progress through the video, and representations of the video’s transcript. These are all meant to equip you with the knowledge to predict what’s coming next. This classroom takes a predictor view of the user. It offloads some of the cognitive load onto the interface, allowing you to focus on the material. For the third view, I personally would consider my alarm clock an example. I use an alarm clock app called Sleep. It monitors my sleep cycles, rings at the optimal time, and tracks my sleep patterns to make recommendations. It understands its role as part of a broader system needed to help me sleep. It goes far beyond just the interaction between me and an interface; it integrates into the entire system.
By my definition, user experience design is attempting to create systems that dictate how the user will experience them, preferably ensuring that the user experiences them positively. User experience in general, though, is a phenomenon that emerges out of the interactions between humans and tasks via interfaces. We might attempt to design that experience, but whether we design it or not, there is a user experience. It’s kind of like the weather: there’s never no weather, and there’s never no user experience. It might be a bad experience if we don’t design it very well, but there’s always some user experience going on, and it emerges as a result of the human’s interactions with the task via the interface. But user experience also goes beyond this simple interaction. It touches on the emotional, personal, and more experiential elements of the relationship. We can build this by expanding our understanding of the scope of the user experience. For a particular individual, this is based on things like the individual’s age, sex, race, gender, personal experiences, expectations for the interface, and more. It goes beyond just designing an interface to help with a task. It touches on whether the individual feels like the interface was designed for them. It examines whether they’re frustrated by the interface or joyous about it. Those are all parts of this user experience. We can take this further and talk about user experience at a group level. We can start to think about how interfaces lead to different user experiences among social or work groups. For example, I’ve noticed that school reunions seem to be much less important to people who’ve graduated within the past 15 years, and I hypothesize it’s because Facebook and email have played such significant roles in keeping people in touch. They’ve fundamentally changed the social group’s user experience. Those effects can then scope all the way up to the societal level. Sometimes these are unintended. For example, I doubt that the creators of Twitter foresaw, when they created their tool, how it would play a significant role in big societal changes like the Arab Spring. Other times, these effects might be intentional. For example, it was a significant change when Facebook added new relationship statuses to its profiles to reflect things like civil unions. That simultaneously reflected something that was already changing at the societal level, but it also participated in that change and helped normalize those kinds of relationships. And that then relates back to the individual, by making sure the interface is designed such that each individual feels like it’s actually designed with them in mind. The options are there for them to feel like they’re properly represented within the system. These are all components of the general user experience that we need to think about as we design interfaces.
So, keeping in mind everything we’ve talked about, let’s design something for Morgan. Morgan walks to work, and she likes to listen to audiobooks, mostly non-fiction. But she doesn’t just want to listen; she wants to be able to take notes, leave bookmarks, and do everything else you do when you’re reading. What would designing for her look like from the perspectives of viewing her as a processor, a predictor, and a participant? And how might these different designs affect her user experience as an individual, in her local group of friends, and in society as a whole if the design caught on?
As a processor, we might simply look at what information is communicated to Morgan, when, and how. As a predictor, we might look instead at how the interface meshes with Morgan’s needs with regard to this task: how easy it is to access, how easy the commands are to perform, and so on. As a participant, we might look at the broader interactions between this interface and Morgan’s other tasks and social activities. We might look at how increased access to books changes her life in other ways. But really, this challenge is too big to address this quickly. So instead, let’s return to this challenge throughout our conversations and use it as a running dialogue to explore HCI principles and methods.
In this lesson, we’ve covered some of the basic things you need to understand before we start talking about design principles. We covered the idea that interfaces mediate between users and tasks, and the best interfaces are those that let the user spend as much time thinking about the task as possible. We covered the idea of usability and how you have to keep in mind the efficiency and user satisfaction of the interface. We covered three views of the user and how those different views affect how we define usability and evaluation. We covered how the user experience does not exist just at the user level but also at group and even societal levels.
[NOISE] Feedback cycles are the way in which people interact with the world and then get feedback on the results of those interactions. We’ll talk about the ubiquity of those feedback cycles. Then we’ll talk about the gulf of execution, which is the distance between a user’s goals and the execution of the actions required to realize those goals. Then we’ll talk about the gulf of evaluation, which is the distance between the effects of those actions and the user’s understanding of those results. We’ll discuss seven questions we should ask ourselves when designing feedback cycles for users, and we’ll also look at applications of these in multiple areas of our everyday lives.
Feedback cycles are incredibly ubiquitous, whether or not there’s a computational interface involved. Everything from reading to driving a car to interacting with other people could be an example of a feedback cycle in action. They’re how we learn everything, from how to walk to how to solve a Rubik’s cube to how to take the third-order partial derivative of a function. I assume; I’ve never done that. We do something, we see the result, and we adjust what we do the next time accordingly. You may have even seen other examples of this before, too. If you’ve taken Ashok’s and my knowledge-based AI class, we talk about how agents are constantly interacting with, learning from, and affecting the world around them. That’s a feedback cycle. If you’ve taken the cyber-physical systems course, you’ve seen this without a human involved at all, as a system can autonomously read input and react accordingly. Under some definitions, some people would even call this artificial intelligence, specifically because it mimics what a human actually does: they act in the world and they evaluate the result. In fact, if you look at some of the definitions of intelligence out there, you’ll find that many people actually define feedback cycles as the hallmark of intelligent behavior, or they might define intelligence as abilities that must be gained through feedback cycles. Colvin’s definition, for example, involves adjusting to one’s environment, which means acting in it and then evaluating the results. Dearborn’s definition of learning or profiting by experience is exactly this as well: you do something, experience the results, and learn from it. Adaptive behavior in general can be considered an example of a feedback cycle: behavior means acting in the world, and adapting means processing the results and changing your behavior accordingly. And most generally, Schank’s definition, getting better over time, is clearly something that can happen as a result of participation in a feedback cycle: evaluating the results of one’s actions in the world and improving. We find that nearly all of HCI can be interpreted in some way as an application of feedback cycles, whether between a person and a task, a person and an interface, or systems comprised of multiple people and multiple interfaces.
In our feedback cycle diagram, we have on the left some user and on the right some task or system. The user puts some input into the system through the interface, and the system communicates some output back to the user, again through the interface. Inherent in this are two general challenges: the user’s interaction with the task through the interface, and the task’s return of the output to the user via the interface. The first is called the gulf of execution. The gulf of execution can be summarized as: how do I know what I can do? The user has some goals. How do they figure out how to make those goals a reality? How do they figure out what actions to take to make the state of the system match their goal state? This is the gulf of execution: how hard is it to do in the interface what is necessary to accomplish the user’s goals? Or alternatively, what’s the difference between what the user thinks they should have to do and what they actually have to do? Now, there are a number of components to this. First, the user needs to be able to identify what their goal is in the context of the system. There might be a mismatch between their own understanding and the system’s structure. Think of transitioning from an old-fashioned VCR to a more modern DVR, or from a DVR to watching things on demand. The user needs to think of their goal in terms of their current system. Second, they need to be able to identify the actions necessary to accomplish their goals. Now that they know what their goal is in the context of the system, they need to identify the actions that it will take to make that goal a reality. And third, once they’ve identified those actions, they need to actually execute the actions within the interface. Again, imagine someone who’s learning to use an on-demand video interface when they’re used to using things like VCRs and DVRs. Their goal hasn’t changed: they want to watch some program that’s already aired. But in the context of a VCR or a DVR, their intention might be to record that program. In the context of an on-demand video interface, their intention instead is to call up the existing version of that program. That’s a mismatch between what they think their goal is and what their goal is in the context of this new system. But once they understand what the goal means in their current system, they now need to know how to pull up that program. They need to know how to navigate the menus, find the program that they want to watch, and then start it playing. And then once they know what to do, they need to actually execute that series of button presses. For example, they might know what actions to perform but not know where to find them. That would present a difficulty in executing those actions. So the gulf of execution takes the user from understanding their own goals, to understanding their goals in the context of the system, to understanding the actions necessary to realize those goals, to actually executing those actions. And each of these steps presents some difficulties.
Let’s take a simple example of the gulf of execution. I’m making my lunch; I have my bowl of chili in the microwave. My goal is simple: I want to heat it up. How hard is that? Well, typically when I’ve been cooking in the past, cooking is defined in terms of the amount of time it takes. So, in the context of this system, I specify my intent as microwaving it for one minute. Now, what are the actions necessary to do so? I press Time Cook to enter the time-cooking mode, I enter the time, one minute, and I press Start. I didn’t press Start just now, but I would press Start. I specified my intent, microwave for one minute; I specified my actions, pressing the right sequence of buttons; and I executed those actions. Could we make this better? There were a lot of button presses to microwave for just one minute. If we think that’s a common behavior, we might be able to make it simpler. Instead of pressing Time Cook, one, zero, zero, and Start, I might just press one and wait. Watch. [NOISE] So I’ve narrowed the gulf of execution by shrinking the number of actions required, but I may have enlarged it by making it more difficult to identify the actions required. When I look at the microwave, Time Cook gives me an idea of what that button does. So if I’m a novice at this, I can discover how to accomplish my goal. That’s good for the gulf of execution: it’s easier to look at the button and figure out what to do than to have to go read a manual or anything like that and find out on your own. But once you know that all you have to do is press one, that’s much easier to execute. That’s something nice about this interface: it caters to both novices and experts. There’s a longer, discoverable way and a shorter, less visible way. But let’s rewind all the way back to the goal I set up initially: my goal was to heat up my chili. I specified my intent in terms of the system as microwaving it for one minute. But was that the right thing to do? After one minute, my chili might not be hot enough. This microwave actually has an automatic reheat function that senses the food’s temperature and stops when the time seems right. So the best bridge over the gulf of execution might also involve helping me reframe my intention. Instead of trying to microwave for one minute, it might encourage me to reframe this as simply heating until ready and letting the microwave do the rest.
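As a toy illustration of that trade-off, here is a sketch of a microwave controller that supports both paths: the labeled, discoverable Time Cook sequence and the hidden one-press shortcut. The button names and behavior are simplified assumptions, not a real appliance interface.

```python
# Toy model of the microwave's two input paths. Button names and behavior are
# simplified assumptions, not a real appliance interface.
class Microwave:
    def __init__(self):
        self.mode = "idle"
        self.digits = ""

    def press(self, button):
        if self.mode == "idle" and button == "TIME COOK":
            self.mode = "entering_time"           # labeled, discoverable path
        elif self.mode == "idle" and button.isdigit():
            self._start(int(button) * 60)         # hidden shortcut: N minutes
        elif self.mode == "entering_time" and button.isdigit():
            self.digits += button                 # build up e.g. "100" = 1:00
        elif self.mode == "entering_time" and button == "START":
            minutes, seconds = int(self.digits[:-2] or 0), int(self.digits[-2:] or 0)
            self._start(minutes * 60 + seconds)
        return self

    def _start(self, seconds):
        self.mode = "cooking"
        print(f"Cooking for {seconds} seconds")

# Novice path: five presses, but every step is visible on the panel.
Microwave().press("TIME COOK").press("1").press("0").press("0").press("START")
# Expert path: one press, but nothing on the panel hints that it exists.
Microwave().press("1")
```

Both paths end in the same place; the design question is how much work it takes to identify the actions versus how much work it takes to execute them.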
Here are five quick tips for bridging gulfs of execution. Number one, make functions discoverable. Imagine a user is sitting in front of your interface for the very first time. How would they know what they can do? Do they have to read the documentation or take a class? Ideally, the functions of the interface will be discoverable, meaning that users can find them clearly labeled within the interface. Number two, let the user mess around. You want your user to poke around and discover things, so make them feel safe in doing so. Don’t include any actions that can’t be undone. Avoid any buttons that can irreversibly ruin their document or setup. That way the user will feel safe discovering things in your interface (see the sketch after these tips for one common way to support this). Number three, be consistent with other tools. We all want to try new things and innovate, but we can bridge the gulf of execution nicely by adopting the same standards that many other tools use. Use Ctrl+C for copy and Ctrl+V for paste. Use a diskette icon for save, even though no one actually uses floppy disks anymore. This makes it easy for users to figure out what to do in your interface. Number four, know your user. The gulf of execution has a number of components: identifying intentions, identifying the actions to take, and taking the actions. For novice users, identifying their intentions and actions is most valuable, so making commands discoverable through things like menus is preferable. For experts, though, actually doing the action is more valuable. That’s why many experts prefer the command line: although it lacks many usability principles targeted at novices, it’s very efficient. Number five, feedforward. We’ve talked about feedback, which is a response to something that the user did. Feedforward is more like feedback on what the user might want to do. It helps the user predict what the result of an action will be. For example, when you pull down the Facebook news feed on your phone, it starts to show the little refresh icon. If you don’t finish pulling down, it doesn’t refresh. That’s feedforward: information on what will happen if you keep doing what you’re doing. Many of these tips are derived from some of the fundamental principles of design pioneered by people like Don Norman and Jakob Nielsen, and we’ll cover them more in another lesson.
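For tip number two, one common way to make messing around safe is to put every action on an undo stack, so nothing the user tries is irreversible. A minimal sketch of that idea follows; the editor and its actions are made up purely for illustration.

```python
# Minimal sketch of an undo stack: every action first records how to get back,
# so nothing the user tries is irreversible. The editor here is invented for
# illustration, not any real tool's API.
class Editor:
    def __init__(self):
        self.text = ""
        self.undo_stack = []                # snapshots of prior states

    def do(self, action, *args):
        self.undo_stack.append(self.text)   # remember the state before acting
        action(*args)

    def insert(self, s):
        self.text += s

    def delete_all(self):                   # even a "dangerous" action stays safe
        self.text = ""

    def undo(self):
        if self.undo_stack:
            self.text = self.undo_stack.pop()

editor = Editor()
editor.do(editor.insert, "hello")
editor.do(editor.delete_all)    # the user pokes a scary-looking button...
editor.undo()                   # ...and can immediately back out of it
print(editor.text)              # -> "hello"
```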
The second challenge is for the task to express to the user, through the interface, the output of the actions that the user took. This is called the gulf of evaluation, because the user needs to evaluate the new state of the system in response to the actions they took. Like the gulf of execution, we can think of this in terms of three parts. First, there’s the actual physical form of the output from the interface: what did it actually do in response? There might be something visual, a sound, a vibration, some kind of output. The second is interpretation: can the user interpret the real meaning of that output? You might think of this in terms of a smartphone. If a smartphone vibrates in your pocket, can you interpret what the meaning of that output was, or do you have to pull the phone out and actually look? Then the third phase is evaluation: can the user use that interpretation to evaluate whether or not their goals were accomplished? You can imagine submitting a form online. It might give you output that you interpret to mean that the form was received, but you might not be able to evaluate whether or not the form was actually accepted. So once the user has received and interpreted the output, the final step is to evaluate whether that interpretation means their goals were actually realized within the system. Take our on-demand video service example again. Imagine that the user has gotten all the way to finding the program that they want to watch, and they’ve pressed the play button on the remote. Imagine the interface responds by hiding the menus that they were using to navigate the service. Can they interpret the meaning of that output? Can they evaluate whether or not that interpretation means that their goals were realized? If they’re a novice user, maybe not. An expert might correctly interpret the screen blacking out as the service trying to load the video; they then evaluate that interpretation and determine that their goals have been realized, because the service is trying to play the show they want to watch. But a novice user might interpret that output to mean that the service has stopped working altogether, like when your computer shuts down and the screen goes black. They then incorrectly evaluate that their goals were not actually realized in the system. We might get over this by showing some kind of buffering icon. That’s a different kind of output from the system that helps the user correctly interpret that the system is still working on the actions that they put in. They can then evaluate that maybe their goals were correctly realized after all, because the system is still working to bring up their show. So, as you can see, each of these three stages presents some unique challenges.
Let’s take a thermostat, for example. I have a goal to make the room warmer, so I do something to my thermostat with the intention of making the room warmer. What does the system do as a result? Well, it turns the heat on; that would be the successful result of my action. But how do I know that the heat was turned on? Well, maybe I can hear it; I might hear it click on. But that’s a one-time kind of thing, and it might be quiet. And if I misheard it, I have no way of double-checking. So if I’m not sure I heard it, I have to go find a vent, put my hand on it, and try to feel the heat coming out. And there’s more going on in a heater: my action might have worked, but the heater might not immediately turn on for one reason or another. These are signs of a large gulf of evaluation. Neither the sound nor the vent is an optimal display, because they’re either hard to reach or possible to miss. Feeling the heat might be easy to interpret, but hearing the heater turn on might not. So either way, I have to do a lot to evaluate whether or not my action was successful. And this is all for a very small piece of feedback. Ideally, if I wasn’t successful, we want the system to also tell me why I wasn’t successful, so I can evaluate what I did wrong and respond accordingly. There’s a very large gulf of evaluation if there’s no indicator on the actual thermostat. So how can we resolve that? Well, simple: we just mark on the thermostat that the heat is on. That sounds trivial, but nothing in the fundamental design of this system demanded a note like this. It’s only in thinking about the system from the perspective of the user that we find that need. I can let you know as well, this system still isn’t very ideal. For various reasons, it’ll leave the heater or the air conditioning off even when the room hasn’t reached the temperature I put in, and it gives me no indication of why. I can look at the system and evaluate that the temperature I’ve set is higher than the current temperature in the room, but at the same time, I can see that the heater isn’t on. Under those circumstances, I have no way of knowing if the heater’s malfunctioning, if a switch is set wrong, or what. In this case, it might just be that it’s set to the wrong mode. The mode is visible, but until I remember to check it, the system just appears to be malfunctioning. We can imagine an alternative message on the screen indicating the direction of the relationship, or something similar, that would give some sign that it’s currently set incorrectly.
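As a sketch of what that more helpful status display might say, here is a hypothetical function that turns the thermostat’s state into a message aimed squarely at the gulf of evaluation: it reports not just what the system is doing, but why. The modes, parameters, and wording are assumptions for illustration, not any real thermostat’s behavior.

```python
# Hypothetical status display for the thermostat, aimed at the gulf of
# evaluation: it reports not just what the system is doing, but why.
# The modes and wording here are assumptions for illustration.
def thermostat_status(mode, set_temp, room_temp, heating, cooling):
    if heating:
        return f"Heating to {set_temp} degrees"
    if cooling:
        return f"Cooling to {set_temp} degrees"
    if mode == "heat" and room_temp < set_temp:
        return "Heat requested; waiting for the furnace to start"
    if mode == "cool" and room_temp < set_temp:
        return "Set to COOL; switch to HEAT mode to warm the room"
    return "Idle: the room is at the set temperature"

# The confusing case described above: the set temperature is above the room
# temperature, but nothing is running because the mode switch is wrong.
print(thermostat_status(mode="cool", set_temp=72, room_temp=65,
                        heating=False, cooling=False))
```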
Here are five quick tips for bridging gulfs of evaluation. Number one, give feedback constantly. Don’t wait for whatever the user did to be fully processed by the system before giving feedback. Give them feedback that the input was received. Give them feedback on what input was received. Help the user understand where the system is in executing their action by giving feedback at every step of the process. Number two, give feedback immediately. Let the user know they’ve been heard even when you’re not ready to give them a full response yet. If they tap an icon to open an app, there should be immediate feedback just on that tap. That way, even if the app takes a while to open, the user knows that the phone recognized their input. That’s why icons briefly gray out when you tap them on your phone. Number three, match the feedback to the action. It might seem like this amount of constant, immediate feedback would get annoying, and if executed poorly, it really would. Subtle actions should have subtle feedback. Significant actions should have significant feedback. Number four, vary your feedback. It’s often tempting to view our designs as existing solely on a screen, and so we want to give the feedback on the screen. But the screen is where the interaction is taking place, so visual feedback can actually get in the way. Think about how auditory or haptic feedback can be used instead of relying just on visual feedback. Number five, leverage direct manipulation. We talk about this a lot more elsewhere, but whenever possible, let the user feel like they’re directly manipulating things in the system. Things like dragging stuff around or pulling something to make it larger or smaller are very intuitive actions, because they feel like they’re interacting directly with the content. Use that. Again, we talk far more about this in another lesson, but it’s worth mentioning here as well. By loading these things into your short-term memory several times, we hope to help solidify them in your long-term memory. That relationship is actually something we also talk about elsewhere in this unit.
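Here is a small sketch of tips one and two together: a hypothetical tap handler that acknowledges the input immediately and keeps reporting progress while the slow work happens. The rendering is just print statements standing in for a real UI toolkit, and the steps are invented for illustration.

```python
import time

# Sketch of immediate, constant feedback on a slow action. The "rendering" is
# just print statements standing in for a real UI toolkit; names are invented.
def launch_app(app_name, render):
    render(f"[{app_name}] icon grayed out")      # immediate: the tap was heard
    render(f"Opening {app_name}...")             # constant: what was received
    for step in ("loading code", "restoring state", "drawing the screen"):
        time.sleep(0.1)                          # stand-in for the real work
        render(f"{app_name}: {step}")            # progress at every stage
    render(f"{app_name} is ready")

launch_app("Maps", render=print)
```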
In Don Norman’s The Design of Everyday Things, he provides a different way of looking at the same information. This diagram puts a greater emphasis on what the user is doing at each of these stages. The user is setting a goal, then planning, specifying, and performing, and then perceiving, interpreting, and comparing. This diagram also changes our terminology a bit, and with good reason. We see a bridge of execution and a bridge of evaluation. The reason for that is that these behaviors specifically bridge the gulf between the goal and the world. They’re what the user has to do to make the world match their goal, and then confirm that the world now does match their goal. Norman uses these stages to introduce seven questions that we should ask ourselves when we’re designing interfaces. First, how easily can one determine the function of the device? In other words, how easily can one tell that this device can accomplish that goal? Second, how easily can one tell what actions are possible to do with the device? Third, how easily can the user determine the mapping from their intent to the actual movements or actions they need to take with the device? So, planning corresponds to figuring out what they can do, and specifying corresponds to figuring out what they should do. Then finally, how easily can the user actually perform the physical movements associated with that plan? After they’ve done these three stages, something has happened in the world, and they’re ready to see if they can perceive what has changed. So, then we ask, how easily can the user perceive or tell what state the system is in? This is the raw information that they’re getting out of the system. Then, how easily can they tell if the system is in the desired state, or how easily can they interpret what they perceived? Then finally, how easily can the user determine the mapping from state to interpretation? In other words, how easily can the user compare what they interpreted as happening to what they wanted to happen? You’ll notice that these match our stages from earlier. Norman also makes it even more clear by describing this specifically in terms of what the user is thinking. The user starts by thinking, “What do I want to do?” Then they consider, “What are the alternatives for accomplishing it?” Then they specify, “What can I do to actually perform one of those alternatives?” Then they ask, “How do I do that?” Once they’ve done it, they ask, “What happened? What does that mean? Is that okay?” I like this phrasing because it makes the distinction between interpreting and comparing a little easier. Interpreting is just about understanding what happened. Comparing involves going back to the original goal that the user had. Norman also further articulates this by breaking the process into levels that span both execution and evaluation. Closest to the world, he has the visceral level, which is the actual physical activity or the raw perceptions. Above that, he has the behavioral level, which is where we’re actually specifying what behaviors to do or interpreting the outcome of those behaviors. Then at the top, there’s the reflective level, which is thinking about the problem and planning our solution, or comparing the results to our original goal. If we change his vocabulary a little bit, we might describe the lowest level as raw reaction, the middle level as deliberation, and the top level as metacognition: thinking about the goal, thinking about the problem-solving process, thinking about what we did and what happened.
If we rewrite these phases with this vocabulary, we start to get at a diagram you might recognize if you’ve taken knowledge-based AI. If you haven’t, here’s a little teaser for what you’ll see if you do.
[MUSIC] Good design. A phone that quietly clicks every time a letter is successfully pressed to let you know that the press has been received. Bad design. A phone that loudly shouts every letter you type. » P. I. C. Remember, small actions get small feedback. The only time you might want your device to yell a confirmation at you is if you’d just ordered a nuclear launch or something.
Let’s pause for a second, and reflect on the roles of gulfs of execution and gulfs of evaluation in our own lives. So try to think of a time when you’ve encountered a wide gulf of execution, and a wide gulf of evaluation. This doesn’t have to be a computer, it could be any interface. In other words, what was a time when you were interacting with an interface, but couldn’t think of how to accomplish what you wanted to accomplish? What was a time when you were interacting with an interface and couldn’t tell if you’d accomplished what you wanted to accomplish?
It’s not a coincidence that I’m filming this in my basement. This actually happened to me a few weeks ago. The circuit to our basement was tripped, which is where we keep our modem, so our internet was out. Now this is a brand new house and it was the first time we tripped a breaker, so I pulled out my flashlight and I opened the panel. And none of the labels over here clearly corresponded to the breaker I was looking for over here. I ended up trying every single one of them and still it didn’t work. I shut off everything in the house. Why didn’t it work? In reality, there was a reset button on the outlet itself that had to be pressed. The only reason we noticed it was because my wife noticed something out of the corner of her eye turning on and off as I switched these. That was a terribly large gulf of execution. I knew what I wanted to accomplish, I could translate it into the system’s terms easily, reset a breaker. But figuring out the actions to accomplish that goal was very difficult. That’s a large gulf of execution. How was that? » [SOUND] What? Sorry, I wasn’t paying attention. » You weren’t watching? [LAUGH] So I have no way of knowing if that was good or not? Isn’t that a terrible gulf of evaluation? I joke, but a lack of feedback on your performance at a task, whether it be filming, like I’m doing now or doing a project like you’ll do later in our material, presents the same kind of poor gulf of evaluation.
You might not traditionally think of a car as an example of HCI, but nowadays this is basically a computer on wheels. So let’s talk a little bit about how feedback cycles apply here. Let’s start with the ignition. The button that I use to start my car is right here. Why is it located there?
Before cars had push button starts, this is where you inserted the key to turn on the ignition. Why? I have no idea. But I do know that now, the start button can be placed in any number of different locations. So why do we put it where we’ve always put it? Well, the reason is, that’s where the driver expects it to be placed. We help them across the gulf of execution by designing a system that’s consistent with their expectations about how it should work. It makes it easier for them to translate their intentions into actions. Now, other times we might violate this principle because of some other benefits we hope to gain. But generally speaking, when all else is equal, we want to stay consistent with the way users expect our systems to work.
So we know where the ignition button is. Let’s press it. [NOISE] Do you think the car turned on?
Well, what do we know? We know the car was off. We know this is clearly the power button based on how it’s labeled and where it’s located. And most importantly, when I pressed it we heard kind of a happy confirmation-y sound. So did the car turn on? Actually, it didn’t. To turn this car on, you have to press the brake pedal while pressing the on button. The car doesn’t do a great job of helping us across that gulf of execution. There’s no indicator that you’re doing it wrong until you’ve actually already done it wrong. But the car does give us a short gulf of evaluation. If you do it incorrectly, an alert pops up on the dashboard letting you know you need to press the brake pedal and then press the on button. The output presented is easy to interpret. It’s presented in the context of when you need to know that information, so you kind of understand that it’s a response to what you just did. So here we have some trouble with the gulf of execution, but the gulf of evaluation is still pretty short. So now that I see this message, I press down the brake pedal, press the on button [SOUND] and now the car is on.
So now that we’ve seen the way this feedback cycle currently works, let’s talk about improving it. How might we make this feedback cycle even better? How might we narrow the gulf of execution and the gulf of evaluation?
So here are a few ideas that I had. We know that the screen can show an alert when you try to turn the car on without pressing the brake pedal down. Why not show that alert as soon as the driver gets in the car every time? That doesn’t widen the gulf of execution for an expert user, but it does narrow it for a novice user, because even a novice can see that alert the first time they get in the car. But what still throws me off to this day is the sound the car makes when you try and turn it on. Watch. [SOUND] Did the car turn on? No. It didn’t that time. [SOUND] Now it turned on. So it plays the same sound initially when you press the button and then plays a different follow up sound to confirm that the car actually turned on. I know why they do this, that one sound just confirms that you pressed the button successfully while the other sound confirms that the car turned on. But for me, I would just as soon have two different sounds confirm whether or not you just pressed the button or whether the car turned on. That way, just the presence of a sound confirms the fact that the button was pressed and the nature of the sound confirms the effect of that press.
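As a rough sketch of that two-sound idea, here’s a minimal Python illustration. The functions play_sound and show_alert are just stand-ins for the car’s real chime and dashboard alert; they are not from any real automotive API.

```python
def play_sound(name):
    # Stand-in for the car's audio feedback channel.
    print("[sound] " + name)

def show_alert(message):
    # Stand-in for the dashboard alert.
    print("[dashboard] " + message)

def press_start_button(brake_pressed):
    # The presence of a sound confirms the press; the kind of sound
    # distinguishes "press registered only" from "engine starting."
    if brake_pressed:
        play_sound("engine_start_chime")  # distinct sound: the car is turning on
        return "engine on"
    play_sound("press_only_beep")         # different sound: press registered, nothing else
    show_alert("Press the brake pedal, then press the start button.")
    return "engine off"

if __name__ == "__main__":
    press_start_button(brake_pressed=False)
    press_start_button(brake_pressed=True)
```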
I asked you earlier to pick an area of HCI in which you’re interested and reflect on it as you go through this course. Depending on the area you selected, feedback cycles can play a huge number of different roles. In health care, for example, feedback cycles are critical to helping patients manage their symptoms. That relies on the results of certain tests being easy to interpret and evaluate. Feedback cycles are also present in some of the bigger challenges for gesture-based interactions. It can be difficult to get feedback on how the system interpreted a certain gesture and why it interpreted it that way. Compare that to touch, where it’s generally very easy to understand where you touched the screen. So, think for a moment about how feedback cycles affect the area you chose to keep in mind.
Lately I’ve encountered another interesting example of feedback cycles in action. You may have actually seen this before as well. They’re the new credit card readers. My wife sells arts and crafts at local events, and so she has these Square readers that can scan credit cards on her phone. One version lets you swipe, and the new version lets you insert the card. So let’s check this out real quick. With the swipe version, you just insert the card and pull it through, just like a traditional card reader. The problem is there’s typically no feedback on whether you’re swiping correctly. What’s more, you can be wrong in both directions: you can be either too fast or too slow. So you may have had a time when you were trying to swipe a credit card on some kind of reader, and you kept doing it more and more slowly and deliberately, thinking that the problem was that you had done it too fast originally. And then you discover that you’ve actually been going too slowly all along and your slowing down was actually counterproductive. There’s no feedback here, and the space of acceptable input is bounded on both sides. You have to go above one speed and below another speed. But now credit card readers are moving to this model where you just insert the card. You try, at least. In terms of feedback cycles, in what ways is this actually better?
First, in terms of the gulf of execution, the insertion method is actually physically easier to do. While you can be both too fast and too slow with the sliding method, you can’t push it too far in with the insertion method. So you know if there’s an error, it’s because the card isn’t far enough into the reader. And second, there’s rich feedback with the insertion method. It doesn’t even have to come from the screen telling you that you didn’t do it correctly. You feel the card stop when it’s far enough into the reader. You have immediate physical feedback on whether you’re putting it in the right place, and whether you’ve actually put it far enough in, rather than delayed feedback asking you to try again after some kind of waiting period.
So, using the insertion method is significantly easier. However, the insertion method introduces a new problem. With the sliding method, I never had to actually physically let go of my card, so there was little chance of me walking away without it. With the insertion method, I insert the card and I wait. I’m not used to having to remember to retrieve my card from the card reader. Now this isn’t quite as big a deal with these new portable readers, but for the mounted ones you see in stores it can be far more problematic. So how can we build some feedback into the system to make sure people remember their cards when they walk away?
There are a few things we could do here. We might build some kind of buzzer into the card reader to let the customer know when they can take their card out. That would make sure that they don’t leave without it. ATM machines often do this, actually. They’ll ring a buzzer until the card and the cash are removed. But that’s noisy and potentially irritating. It would mess with the ambiance of a restaurant or something like that. We could do something super complicated, like pair the credit card with a smartphone and ring the phone when it gets too far away from the credit card. But that requires adding some new technology to every single credit card, which could be a pretty big expense. So what about something simpler? Why not force a customer to remove the credit card in order to get the receipt and their goods? Unless they’re going to walk away without what they came to buy, that’ll ensure that they remember their card.
Now notice one last thing about this example. We’ve been discussing how to make the process of sliding or swiping a credit card easier. What’s wrong with that question?
The problem is that we’re not focused on the right task. Our task shouldn’t be to swipe a credit card, or insert a credit card, or anything like that. Our task should be how to most easily pay for purchases. And possibly the easiest way to do that would be to design a system that lets you just tap your phone against the reader (this reader actually does that). That way, we can use the thing that people have on them at all times. Now maybe that isn’t the best option for various other reasons, but the important thing is we need to focus on what we’re really trying to accomplish, not just how we’ve done it in the past. We can make incremental improvements to sliding or swiping or inserting a credit card all we want, but we should always keep our eyes on the underlying task that the user needs to accomplish.
Today, we’ve talked about, arguably, the most fundamental concept of human-computer interaction: feedback cycles. We describe feedback cycles for our purposes as the exchange of input and output between a user and a system to accomplish some goal. We discussed feedback cycles’ incredible ubiquity in other fields and discussions. We talked about the gulf of execution, the distance between the user knowing what they want to accomplish and actually executing the steps necessary to accomplish it. We talked about the gulf of evaluation, the distance between making some change in the system and evaluating whether or not the goal was accomplished. We introduced the seven questions we need to ask ourselves to bridge those gulfs. Now that we understand these gulfs, our next goal is to understand methods for bridging them.
[MUSIC] Today we’ll talk about two applications of good feedback cycles. Direct manipulation and invisible interfaces. Direct manipulation is the principle that the user should feel as much as possible like they’re directly controlling the object of their task. So for example, if you’re trying to enlarge an image on your phone, it might be better to be able to drag it with your fingers rather than tapping a button that says, zoom in. That way you’re really interacting directly with the photo. New technologies like touch screens are making it more and more possible to feel like we’re directly manipulating something, even when there’s an interface in the way. At their best, the interface actually disappears, which is what we mean by an invisible interface. With an invisible interface the user has to spend no time thinking about the interface that they’re using. All their time is dedicated to thinking about the task that they’re performing.
Our goal is to narrow the gulf of execution and the gulf of evaluation as much as possible. And arguably the ultimate form of this is something called direct manipulation. Now today direct manipulation is a very common interaction style, but in the history of HCI it was a revolutionary new approach. Now to understand direct manipulation, let’s talk about the desktop metaphor. The files and folders on your computer are meant to mimic physical files and folders on a desktop. So, here on my physical desktop, I have some files. What do I do if I want to move them? Well, I pick them up and I move them. What do I do if I want to put them in a folder? I pick them up and put them in the folder. I’m physically moving the files from where they are to where I want them to be. If files and folders on a computer are meant to mimic files and folders on a physical desk, then shouldn’t the act of moving them also mimic the real-world action of moving them? Wouldn’t it narrow the gulf of execution to leverage that real-world experience and that real-world expectation?
Files and folders on my computer are meant to mimic files and folders on my physical desk. So we ideally want the action of moving them around on my computer to mimic the action of moving them around on my physical desk. But it wasn’t always like this. Before graphical user interfaces were common, we moved files around using command line interfaces like this. The folder structure is still the same on the operating system. But instead of visualizing it as folders and icons, I’m interacting with a text-based command line. To view the files, I might need to type a command like ls, which I just have to remember. If I don’t remember that command, I don’t have much recourse to go find out what I’m supposed to be doing. To move a file, I need to type something like this. I have to type the command, the file I want to move, and the folder I want to move it to. Again, if I forget the name of that command, or the order of the parameters to provide, there’s not a whole lot I can do to recover from that. I need to run off to Google and find out what the correct order of the commands was, which is actually what I did before filming this video, because I don’t actually use the terminal very often. Then when I execute that command, there’s not a lot of feedback to let me know if it actually executed correctly. I might need to change folders to find out. There I see the files present in that folder, but I had to go and look manually. There’s nothing really natural about this. Now don’t get me wrong, once you know how to interact with this interface, it’s very efficient to use. But when you’re a novice at it, when you’ve never used it before, this is completely unlike the task of managing physical files on your real desk. Then, the computer mouse came along, and with it came the ability to move a cursor around the screen. Equipped with this, my action in moving files and folders becomes much more direct. I can actually just click the file I want to move and drag it into the new folder. I get instant feedback by the fact that the file disappeared as soon as I dragged it over. And there was a sound effect that you may or may not have been able to hear. So now instead of typing in some cryptic command that I just have to be able to remember, I can just click on the file I want to move, and physically drag it to the folder in which I want to have it. That’s a very natural interaction; it mimics what I do on my physical desk. Moving the mouse around is a lot better than having to type in those commands, but the gulfs of execution and evaluation are still present, especially for some novice users. There’s still some interpretation that has to happen to understand that when I move my hand a little left on the mouse, the cursor on screen will move to the left as well. And while clicking feels kind of like grabbing, there’s still some interpretation there. It’s more direct than the command line, but there’s still a gap. Modern touchscreens made direct manipulation more direct than ever. Let’s say I want to put an icon into a folder on my screen. How do I do it? I hold down the icon, and I drag it to the folder. The fact that if I wanted to move something around my desk, I would have to hold it down, means that this is almost entirely direct manipulation. I don’t need any prior knowledge to attempt to do what feels natural for moving that icon into that folder. That gives us a nice, general heuristic to keep in mind. How do we help the user interact most closely with the target of their task?
How do we make it so they’re manipulating it as directly as possible?
The seminal paper on direct manipulation interfaces came out in 1985, co-authored by Edwin Hutchins, James Hollan, and Don Norman. We’ll talk a lot more about Hutchins and Norman later in our conversations. Hutchins co-authored the foundational paper on Distributed Cognition, and Norman created one of the most accepted sets of design principles in his seminal book Design of Everyday Things. But in 1985, direct manipulation was starting to become a more common design strategy. Hutchins, Hollan, and Norman identified two aspects of directness. The first was distance. Distance is the distance between the user’s goals and the system itself. This is the idea of the gulfs of execution and evaluation that we talked about in the context of feedback cycles. They write that, “The feeling of directness is inversely proportional to the amount of cognitive effort it takes to manipulate and evaluate the system.” In other words, the greater the cognitive load required to use the system, the less direct the interaction with the system actually feels. The authors break distance into two components: semantic distance and articulatory distance. Semantic distance refers to the difference between the user’s goals and their expression in the system. In other words, it’s how hard it is to know what to do. Articulatory distance is the distance between that expression and its execution. In other words, it’s how hard it is to actually do what you know to do. You might notice that semantic distance encompasses the identify-intentions and identify-actions parts of our gulf of execution, and articulatory distance comprises the execute-actions phase. This is brought together here in figure six from the paper. Yeah, there we go. The user starts with some goals, and translates those goals into intentions in the context of the interface. They then translate those intentions into the form of the input, the actions, and execute those actions. The system then does something, and gives back some form of output. The user then interprets the form of that output to discern the meaning of that output, and then evaluates whether or not that meaning matches their goals. So to take an example, when I brought up this figure, I needed to rotate the paper to display it correctly; that was my goal. I translated that goal into the context of the application, a rotate option that was probably hidden somewhere. I then identified the action, which was pressing the rotate button, and I executed that action. The system then did something and returned the output to me. The output specifically was the paper turned upside down instead of turning the way I wanted it. That was the form of the output. I then interpreted that form to discern the meaning that it had rotated the wrong way. I evaluated that my goals weren’t accomplished, and now I knew what to do next. I then pressed the button two more times to rotate it twice again, and the system then returned that output to me. I interpreted the form of that output to mean that the figure was now right-side up, and I evaluated that that matched my initial goals. You might be able to see that this cycle is happening constantly whenever you’re interacting with a computational interface. You could think of it in terms of broad tasks, like searching a document for some keyword, or you could think of it in terms of each individual little task, like interacting with the menus and pulling up the right prompts. But distance is only one component of direct manipulation.
It’s possible to have interfaces with very little distance that nonetheless are not examples of this kind of direct interaction. Everything we’ve talked about so far is true of feedback cycles in general, not just of direct manipulation. That’s why the second component, direct engagement, is what sets direct manipulation apart. The authors of the paper write that, “The systems that best exemplify direct manipulation all give the qualitative feeling that one is directly engaged with control of the objects, not with the programs, not with the computer, but with the semantic objects of our goals and intentions.” If we’re moving files, we should be physically moving the representation of the files. If we’re playing a game, we should be directly controlling our characters. If we’re navigating channels, we should be specifically selecting clear representations of the channels that we want. And that’s what takes a generally good feedback cycle and makes it an instance of direct manipulation. We can shorten the gulfs of execution and evaluation in a number of ways without direct manipulation, but direct manipulation is a powerful method for shortening that distance.
Virtual reality right now is making some incredible strides in facilitating direct manipulation in places where it just hasn’t been possible before. Traditionally, when designers are designing stuff in 3D, they are forced to use a 2D interface, and that translation between 2D and 3D really gets in the way of directly manipulating whatever is being designed. Through virtual reality, though, designers are able to view what they’re designing in three dimensions. They can rotate it around with the same hand motions you’d use to rotate a physical object. They can physically move around the object to get different angles on it. So, virtual reality is allowing us to bring the principle of direct manipulation to tasks it hasn’t been able to touch before, but there’s still a lot of work to do. Gesture interfaces like those used in virtual reality struggle with some feedback issues. We aim to make the user feel like they’re physically manipulating the artifact. When you’re working with something with your hands, it pushes back against you. How do we recreate that in virtual reality?
Take a moment real quick and reflect on some of the tasks you perform with computers day-to-day. What are some of the places where you don’t interact through direct manipulation? If you’re having trouble thinking of one, think especially about places where technology is replacing things you used to do manually. Chances are, the physical interface was a bit closer to the task than the new technical one. How can the technical interface better leverage direct manipulation?
When I was writing the script for this exact video, I was interrupted by a text message from a friend of mine. And in the reply I was writing, I wanted to include a smiley face. We know that using emojis and emoticons tends to humanize textual communication. On my phone, the interface for inserting an emoji is to tap an icon to bring up a list of all the emojis and then select the one that you want. When I’m reacting to someone in conversation, I’m not mentally scrolling through a list of all my possible emotions and then choosing the one that corresponds. I’m just reacting naturally. Why can’t my phone capture that? Instead of having to select smiling from a list of emotions, maybe my phone could just have a button to insert the emotion corresponding to my current facial expression. So to wink, I would just wink. To frown, I would just frown. It wouldn’t be possible to capture every possible face, but for some of the most commonly used ones, it might be more efficient.
There may be no better example of the power of direct manipulation than watching a baby use an interface. Let’s watch my daughter, Lucy, try and use her Kindle Fire tablet. My daughter Lucy is 18 months old, yet when I give her an interface that uses direct manipulation, she’s able to use it. She wouldn’t be able to use a keyboard or a mouse yet, but because she’s directly interacting with the things on the screen, she can use it. Actually, there might be an even better example of direct manipulation in action. There are games made for tablet computers for cats. Yes, cats can use tablet computers when they use direct manipulation.
Let’s try a quick exercise on direct manipulation. The Mac touchpad is famous for facilitating a lot of different kinds of interactions. For example, I can press on it to click, or press with two fingers to right-click. I can pull up and down with two fingers to scroll up and down. I can double-tap with two fingers to zoom in and out a little bit. And I can pinch to zoom in and out a lot more. Which of these are good examples of direct manipulation in action?
Now there’s some room for disagreement here, but I think these five seem to be pretty cut and dry. We can think about whether or not these are direct manipulation by considering whether or not what we’re doing to the touchpad is what we’d like to do to the screen itself. For clicking, I would consider that direct manipulation, because just as we press directly on the screen, we’re pressing directly on a touchpad. Right-clicking, though, the two-finger tap, doesn’t really exemplify direct manipulation, because there’s nothing natural about using two fingers to bring up a context menu as opposed to using one to click. We have to kind of learn that behavior. Scrolling makes sense because with scrolling it’s like I’m physically pulling the page up and down to see different portions of it. The two-finger tap for zooming in and out a little bit, though, isn’t really direct manipulation, because there’s no real clear reason why that gesture should zoom in and out. Pinching, on the other hand, makes sense, because it’s as if I’m physically grabbing the page and shrinking and enlarging it. So some of these, I would say, are pretty good examples of direct manipulation, while others are things that we kind of have to learn to do.
The Mac Touchpad has some interesting examples of how you can make indirect manipulation feel more direct. For example, if I swipe from the right to the left on the touchpad with two fingers, it pulls up this notification center over on the right. This feels direct because the notification center popped up in the same place on the screen that I swiped on the touchpad. The touchpad is almost like a miniature version of the screen. But they could have placed a notification center anywhere and used any kind of interaction to pull it up. This isn’t like scrolling where there is something fundamental about the content that demands a certain kind of interaction. They could have designed this however they wanted. But by placing the notification center there and using that interaction to pull it up, it feels more direct. Now, animation can also help us accomplish this. On the Touchpad, I can clear the windows off my desktop by kind of spreading out my fingers on the touchpad, and the animation shows them going off to the side. And while that’s kind of like clearing off your desk, I’d argue it’s not close enough to feel direct except that the animation on screen mimics that action as well. The windows could have faded away or they could have just slid to the bottom and still accomplish the same function of hiding what’s on my desktop. But the animation they chose reinforces that interaction. It makes it feel more direct. The same thing actually applies with Launchpad, which we bring up with the opposite function by pinching our fingers together. The animation looks kind of like we’re pulling back a little bit or zooming out and we see the launch pad come into view, just as the gesture is similar to zooming out on the screen. So direct manipulation isn’t just about designing interactions that feel like you’re directly manipulating the interface. It’s also about designing interfaces that lend themselves to interactions that feel more direct.
Depending on the area of HCI that you chose to explore, direct manipulation might be a big open question. So, for gesture-based interaction, for example, you’re generally not actually touching anything. Direct manipulation is contingent on immediate feedback that maps directly to the interaction. So how do you create that in a gesture-based interface? This is a big challenge for virtual reality as well. Virtual reality thrives on making you feel like you’re somewhere else, both visually and auditorily, but it has a long way to go kinesthetically. How do you create the feeling of direct manipulation based on physical action, where you can only give feedback visually or auditorily? So, take a second, and reflect on how these principles of direct manipulation apply to your chosen area of HCI.
Whether through using direct manipulation, through innovative approaches to shrinking these gulfs or through the user’s patience and learning, our ultimate goal is for the interface between the user and the task to become invisible. What this means is that even though there is an interface in the middle, the user spends no time thinking about it. Instead, they feel like they’re interacting directly with the task rather than with some interface. So for example, I have a stylus and I’m going to write on this tablet computer. I’m interacting with an interface just translating my drawing into data in the system. But for me, this feels just like I’m writing on a normal page. That feels just like writing on paper. This interface between me and the data representation of my drawing underneath is pretty much invisible. I feel like I’m writing on paper. Contrast that with trying to draw with a mouse. That feels extremely unnatural. I’m very well aware of the mouse as the interface between myself and this drawing task. So the direct manipulation facilitated by the stylus gets me much closer to my task and helps the interface disappear between me and what I’m trying to accomplish.
[MUSIC] Good design. Interfaces that are metaphorically invisible. Bad design. Interfaces that are literally invisible. Well, kind of. Gesture-based interfaces are in one sense literally invisible; that’s actually why it’s so important to give great feedback. Because otherwise, it’s tough to gauge the success of a gesture interaction.
We shouldn’t fall into the trap of assuming that just because an interface has become invisible, the design is great. Interfaces become invisible not just through great design, but also because users learn to use them. With enough practice and experience, many users will become sufficiently comfortable with many interfaces for them to feel invisibly integrated into the task. So take driving, for example. Let’s say I’m driving a car and I discover I’m headed right for someone. What’s my reaction? Well, I turn the wheel to the side and I press my brake. It’s instinctive. I do it immediately, but think about that action. If I was just running down the street and suddenly saw someone in front of me, would it be natural for me to go like that? Of course not. The steering wheel was an interface I used to turn to the left. But it’s become invisible during the task of driving because of all my practice with it. But just because the interface has become invisible doesn’t mean it’s a great interface. People spend months learning to drive, they pay hundreds of dollars for classes, and they have to pass a complicated test. Driving is important enough that it can have that kind of learning curve. But for the interfaces that we design, we generally can’t expect users to give us an entire year just to learn to use them. We’ll be lucky if they give us an entire minute to learn to use them. So our goal is to make our interfaces invisible by design.
Our goal is to create interfaces that are invisible from the moment the user starts using them. They should feel immediately as if they’re interacting with the task underlying the interface. Now this is an extremely tall order and one we honestly probably won’t meet very often, but it’s the goal. In fact, in my opinion, this is why people tend to underestimate the complexity of HCI. When you do things right, people won’t be aware that you’ve done anything at all. So how do we create interfaces that are invisible from the very first moment the user starts using them? That’s precisely what we’ll discuss in a lot of our conversations about HCI. We’ll talk about principles for creating interfaces that disappear, like leveraging prior expectations and providing quick feedback. We’ll also talk a lot about how to get inside the user’s head and understand what they’re seeing when they look at an interface, so that we can make sure that their internal mental model matches the system. In fact, if we consider invisibility to be a hallmark of usable design, then this entire course could be retitled Creating Invisible Interfaces.
Here are five tips for designing invisible interfaces. Number one: use affordances. We talk more about affordances when we discuss design principles and heuristics, but affordances are places where the visual design of the interface suggests just how it’s supposed to be used. Buttons are for pressing, dials are for turning, switches are for flicking. Use these expectations to make your interface more usable. Number two: know your user. Invisibility means different things to different people. Invisibility to a novice means the interactions are all natural, but invisibility to an expert means maximizing efficiency. Know for whom you’re trying to design. Number three: differentiate your users. Maybe you’re designing something for both novices and experts. If that’s the case, provide multiple ways of accomplishing tasks. For example, having Copy and Paste under the Edit menu keeps those options discoverable, but providing Ctrl+C and Ctrl+V as shortcuts keeps those actions efficient. Number four: let your interface teach. When we think of teaching users how to use our software, we usually think of tutorials or manuals. But ideally the interface itself will do the teaching. For example, when users select Copy or Paste from the Edit menu, they see the hotkey that corresponds to that function. The goal is to teach them a more efficient way of performing the action without requiring them to already know it in order to do their work. Number five: talk to your user. We’ll say this over and over again, but the best thing you can do is talk to the user. Ask them what they’re thinking while they use an interface. Note especially whether they’re talking about the task or the interface. If they’re talking about the interface, then it’s pretty visible.
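To make tip four a bit more concrete, here is a minimal Python sketch of a menu that displays each command’s shortcut next to its label, so the discoverable path also teaches the efficient one. MenuItem and render_menu are illustrative names, not a real toolkit API.

```python
from dataclasses import dataclass

@dataclass
class MenuItem:
    label: str       # the discoverable name of the command
    shortcut: str    # the more efficient way to invoke the same command

def render_menu(title, items):
    # Printing the shortcut alongside the label is how the interface teaches:
    # novices find the command by browsing, experts learn the faster path.
    print(title)
    for item in items:
        print("  %-10s %s" % (item.label, item.shortcut))

if __name__ == "__main__":
    render_menu("Edit", [MenuItem("Copy", "Ctrl+C"), MenuItem("Paste", "Ctrl+V")])
```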
Reflecting on where we’ve encountered invisible interfaces is difficult, because they were invisible. What makes them so good is the fact that we didn’t have to notice them. But give it a try anyway. Try to think of a time where you picked up a new interface for the very first time and immediately knew exactly how to use it to accomplish the task you had in mind.
One of my favorite examples of an interface that is invisible by design comes from a video game called Portal 2. In lots of video games, you use a control stick to control the camera in game, but different people have different preferences for how the camera should behave. Some feel if you press up, you should look up. Others, like myself, feel if you press down, you should look up, more like you’re controlling an airplane with a joystick. In most games you have to set this manually by going to options, selecting camera controls, and enabling or disabling y-axis inversion, and it’s just a chore. But in Portal 2, watch what happens. » You will hear a buzzer. When you hear the buzzer, look up at the ceiling. [SOUND] Good. You will hear a buzzer. When you hear the buzzer, look down at the floor. [SOUND] Good. This completes the gymnastic portion of your mandatory physical and mental wellness exercise. » Did you see that? It was so subtle you might not have even noticed it. A character in the game asked me to look up. The game assumed whichever direction I pressed was the way I would want to press when I want to look up, and set my preference accordingly. No option screen, no changing settings. The game automatically and invisibly had me complete my goal of correctly setting my camera preference.
For this design challenge, let’s tackle one of the most common problems addressed in undergraduate HCI classes, designing a better remote control. Now, these probably aren’t very good interfaces. And that’s not to say that they’re poorly designed, but the constraints on how many different things they have to do and how limiting the physical structure can be, make these difficult to use. You might have seen humorous images online of people putting tape over certain buttons on the controls to make them easier to use for their parents or their kids. How would we design an invisible interface for universal remote control, one that doesn’t have the learning curves that these have?
Personally I think this is a great candidate for a voice interface. And in fact, Comcast, Amazon and others have already started experimenting with voice interfaces for remote controls. One of the challenges with voice interfaces is that generally the commands aren’t very discoverable. Generally, if you don’t know what you can say, you have no way of finding out. But watching TV and movies is such a normal part of our conversations that we already have a vocabulary of how to say what we want to do. The challenge is for us designers to make a system that can understand that vocabulary. That way when I say, watch Community, it understands that Community is a TV show and it tries to figure out, do I grab it from the DVR, do I grab it from On Demand, do I see if it’s on live? The vocabulary for the user was very natural. So for example, watch Conan. » Well tonight, a fan named David Joiner thinks he caught a mistake. He says it happened when I was telling a joke recently about Rand Paul. » Hey Conan, I was watching your episode on April 17th, and you said that Rand Paul wanted to run for president. » I had to put that in there somewhere.
Today, we’ve talked about two applications of effective feedback cycles: direct manipulation and invisible interfaces. We talked about how interfaces are most effective when the user has a sense that they’re directly manipulating the object of their task. We talked about how modern technologies like touchscreens and virtual reality are making it possible for manipulation to feel more and more direct. We talked about how the most effective interfaces become invisible between the user and their task, since the user spends no time at all thinking about the interface. We talked about how interfaces can become invisible via either learning or design, and that we’re most interested in designing them to become invisible. To a large extent, that’s the definition of usable design: designing interfaces that disappear between the user and their task.
[MUSIC] Human computer interaction starts with human. So it’s important that we understand who the human is, and what they’re capable of doing. In this lesson, we’re going to bring up some psychology of what humans can do. We’ll look at three systems. Input, processing, and output. Input is how stimuli are sent from the world, and perceived inside the mind. Processing is cognition, how the brain stores, and reasons over the input it’s received. Output is how the brain then controls the individual’s actions out in the world. Now, we’re going to cover a lot of material at a very high level. If you’re interested in hearing more, I recommend taking a psychology class, especially one focusing on sensation and perception. We’ll put some recommended courses in the notes.
In discussing human abilities, we’re going to adopt something similar to the processor view of the human. For now we’re interested in what they can do, physically, cognitively, and so on. So we’re going to focus exclusively on what’s going on over here. We’re going to look at how the person makes sense of input, and how they then act in the world. And right now, we’re not going to worry too much about where that input came from, or what their actions in the world actually do. Notice that in this lesson we’re discussing the human, almost the same way we discuss the computer, or the interface, in most lessons. The human is something that produces output and consumes input, just like a computer might be otherwise. But for now, we’re only interested in how the human does this.
Let’s start by talking a bit about what the average person can sense and perceive. So, here we have Morgan again. Morgan has eyes. Morgan’s eyes are useful for a lot of things. The center of Morgan’s eye is most useful for focusing closely on color or tracking movement. So, we can assume that the most important details should be placed in the center of her view. Morgan’s peripheral vision is good for detecting motion, but it isn’t as good for detecting color or detail. So while we might use her periphery for some alerts, we shouldn’t require her to focus closely on anything out there. As a woman, Morgan is unlikely to be colorblind; she has about a 1 in 200 chance. Men have a much greater prevalence of color blindness, at about 1 in 12. Either way, that’s a significant body of people. So we want to avoid relying on color alone to understand the interface. We can use it to emphasize information that’s already present in the system, but using the system shouldn’t rely on perceiving color. Sight is directional. If Morgan’s looking the wrong way or has her eyes closed, she’ll miss visual feedback. As Morgan gets older, her visual acuity will decrease. So if we’re designing something with older audiences in mind, we want to be careful of things like font size. Ideally, these would be adjustable to meet the needs of multiple audiences. Altogether, though, Morgan’s visual system is hugely important to her cognition. The majority of concepts we cover in HCI are likely connected to visual perception. [SOUND]
Morgan also has ears. Morgan can discern noises based on both their pitch and their loudness. Her ears are remarkably good at localizing sound as well. In fact, she can tell the difference between a nearby quiet sound and a far away loud sound, even if their relative pitches and loudnesses are the same when they reach her ear. Unlike vision, hearing isn’t directional. Morgan can’t close her ears or point them the wrong direction, so she can’t as easily filter out auditory information. That might be useful for designing alerts, but it also risks overwhelming her or sharing too much information with the people around her.
Morgan’s skin can feel things. It can’t feel at a distance, but it can feel when things are touching right up against it. It can feel a variety of different types of input, like pressure, vibration, and temperature. Like listening, Morgan can’t easily filter out touch feedback. But unlike listening, touch feedback is generally only available to the person being touched, so it can be used to create more personal feedback. Traditionally, touch feedback, or haptic feedback, has been very natural. Morgan feels the keys go down as she presses them on a keyboard. But with touchscreens, motion controls, and virtual reality, touch increasingly needs to be explicitly designed into the system if we’re to use it.
Let’s design something for Morgan real quick. Let’s tackle the common problem of being alerted when you’ve received a text message. Here are the constraints on our design for Morgan. It must alert her whether the phone is in her pocket or on the table. It cannot disturb the people around her. [SOUND] And yes, vibrating loudly against the table counts as disturbing the people around her. You’re not restricted to just one modality, though, but you are restricted to the sensors that the phone has available.
Here’s one possible design. We know that smartphones have cameras and light sensors on them. We can use that to determine where the phone is, and what kind of alert it should trigger. If the sensor detects light, that means the screen is visible, so it might alert her simply by flashing its flashlight or illuminating the screen. If the sensor does not detect light, it would infer that the phone is in her pocket and thus, would vibrate instead. Now, of course, this isn’t perfect. It could be in her purse, or she could have put it face down. That’s why we iterate on a design like this, to improve it based on the user’s experiences.
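Here is a minimal sketch of that logic in Python, assuming the phone exposes some ambient light reading. The names choose_alert and LIGHT_THRESHOLD are made up for the example, and the threshold value is arbitrary.

```python
LIGHT_THRESHOLD = 10.0  # illustrative cutoff, in arbitrary sensor units

def choose_alert(light_level):
    # If the sensor sees light, the screen is probably visible on a table,
    # so a visual alert works; otherwise assume a pocket or bag and vibrate.
    if light_level > LIGHT_THRESHOLD:
        return "flash screen / flashlight"
    return "vibrate"

if __name__ == "__main__":
    for reading in (0.5, 42.0):  # simulated sensor readings: dark pocket vs. bright table
        print(reading, "->", choose_alert(reading))
```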
After the perception portion of this model comes the cognition portion, starting with memory. There are lots of different models of memory out there. For our purposes we’re going to talk about three different kinds of memory, the perceptual store or working memory, the short-term memory, and the long term memory. Some scientists argue that there are other types of memory as well, like an intermediate sort of back of the mind memory. But the greatest consensus is around the existence of at least these three kinds. So let’s start with the first, the perceptual store or the working memory. The perceptual store is a very short term memory lasting less than a second. One of the most common models of working memory came from Baddeley and Hitch in 1974. They described it as having three parts. First, there’s the visuospatial sketchpad which holds visual information for active manipulation. So for example, picture a pencil. The visuospatial sketchpad is where you’re currently seeing that pencil. A second part is the phonological loop. The phonological loop is similar, but for verbal or auditory information. It stores the sounds or speech you’ve heard recently, such as the sound of me talking to you right now. A third part is the episodic buffer. The episodic buffer takes care of integrating information from the other systems as well as chronological ordering to put things in place. Finally all three of these are coordinated by a central executive. So let’s try an example of this. I’m going to very quickly show you a picture and ask you a question about it. Don’t focus on any particular portion of the picture, try to view it as a whole. What was the score on the scoreboard?
As you can now see, the score was 0 to 0. Now, as you tried to reason over that, what you probably did was picture the image in your mind. That was trying to reason over what was stored in the perceptual buffer, and it decayed very quickly. However, if you’re a fan of baseball, you probably had a better chance of getting that right. That’s because you have some domain expertise. You’re better able to process images about that domain more quickly. You might, for example, have recognized that most of the innings weren’t marked. So that increases the odds that the score of the game was pretty low. This idea is actually the foundation of a fascinating study about chess experts versus novices and recognizing the configuration of chess boards. The study found that experts were far better than novices at remembering realistic chess boards that were only flashed for a short period of time, like the one on the left. But experts were no better than novices at remembering random chessboards. So expertise, or rehearsal, delays the decay of the perceptual buffer.
When we’re designing interfaces, short-term memory is one of our biggest concerns. It’s important we avoid requiring the user to keep too much stored in short-term memory at a time. Current research shows that users can really only store four to five chunks of information at a time. For a long time, there was a popular idea that people could store seven, plus or minus two, items in memory at a time. But more recent research suggests that the number is really four to five chunks. There are two principles we need to keep in mind here, though. The first is the idea of chunking. Chunking is grouping together several bits of information into one chunk to remember. So to illustrate this, let’s try it out. I’m about to show you six combinations of letters. Try to memorize them and then enter them into the exercise that pops up. Are you ready? Now, to keep you from just rehearsing them in your perceptual store until you can reenter them, I’m going to stall and show you some pictures of my cats. There’s one. There’s both of them, and here’s my daughter. Okay, now fill in the words.
So what happened? Well, what likely happened is that you maybe had only a little bit of trouble remembering the two real words that were listed on the right. You might have had some more trouble remembering the two words that were listed in the middle that were fake words, but did nonetheless look like real words. And you probably had a lot of trouble remembering the two series of letters over on the left. Why is all of that? Well, when it came to memorizing these two words, you were just calling up a chunk that you already had. You didn’t see these as arbitrary collections of letters; you just saw them as words. For the ones in the middle, you’ve never seen those combinations of letters before, but you could pronounce them as if they were words. So you likely saw them as fake words rather than just random collections of letters. For the ones over on the left, though, you had to memorize five individual characters. So that means that while these four could be chunked into words or pseudo-words, these likely had to be remembered as five chunks each. That makes them much more difficult to remember than the others.
However, there is a way we can make this easier. So let’s ask a different question. Which of these six words did you see in the exercise previously?
Even if you had trouble just naming these series of letters in the previous exercise, you probably were much more successful at this one. Why is that? It’s because it’s far easier to recognize something you know than to recall it independently. And that’s a useful takeaway for us as we design interfaces. We can minimize the memory load on the user by relying more on their ability to recognize things than on their ability to recall them.
So what are the implications of short term memory for HCI? We don’t want to ask the user to hold too much in memory at a time, four to five chunks is all. Asking the user to hold ten numbers in short term memory, for example, would probably be too much. But we can increase the user’s effective short term memory capacity by helping them chunk things. For example, this is probably by far easier to remember, even though it’s the same content. We’ve shrunk ten items into three. And we’ve used a format for phone numbers with which you’re probably familiar if you’re in the United States. If you’re from outside the U.S., you might be familiar with a different grouping. But the same principle applies. And finally, when possible, we should leverage recognition over recall. For example, if I ask you to recite the number, maybe you could. In fact, go ahead. Try it. Whether or not you could do that, you almost certainly, though, can pick it from this list. This is one of the reasons why menu bars and tool strips are so ubiquitous in software design. The user doesn’t have to remember the icon for a command or the name of an option. They just have to recognize it when they see it.
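As a small illustration of that chunking idea, here is a Python sketch that formats a made-up ten-digit number into the familiar three-group U.S. format, turning ten separate items into three chunks. The number itself is hypothetical, chosen just for the example.

```python
def chunk_us_number(digits):
    # Group ten digits into the (XXX) XXX-XXXX pattern: three chunks instead of ten items.
    assert len(digits) == 10 and digits.isdigit()
    return "(%s) %s-%s" % (digits[:3], digits[3:6], digits[6:])

if __name__ == "__main__":
    raw = "4045551234"            # ten chunks if memorized digit by digit
    print(chunk_us_number(raw))   # (404) 555-1234 -- roughly three chunks
```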
Finally, we have long-term memory. Long-term memory is a seemingly unlimited store of memories, but it’s harder to put something into long-term memory than to put it into short-term memory. In fact, to load something into long-term memory, you generally need to put it into short-term memory several times. To demonstrate this, I’m going to describe something called the Leitner system. The Leitner system is a way of memorizing key-value pairs, or in other words, memorizing flashcards. Those can be words and their definitions, countries and their capitals, laws and their formulas, anything where you’re given a key and asked to return a value. So I have some flashcards here that have the capitals of the world. What I do is go through each one, read the country, and check to see if I remember the capital. If I do, I’ll put it in the pile on the right. If I read the country and don’t know the capital, I’ll put it in the pile on the left. So let me do that real quick. [MUSIC] Now tomorrow, I will just go through the pile on the left. Any that I remember from the pile on the left tomorrow, I’ll move to the pile on the right. Any that I still don’t remember will stay in the pile on the left. So I’m focusing my attention on those that I don’t yet know. In four days, I’ll go through the pile on the right. And any that I don’t remember then, I’ll move back to my pile on the left, to remind me to go through them each day. So the things that I remember least are most often loaded into short-term memory, solidifying them in my long-term memory. Now in practice, you wouldn’t just do this with two piles; you’d do three, or four, or five. And the longest-duration pile you might only go through yearly, just to see if anything has decayed yet.
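Here is a minimal Python sketch of the Leitner system as described above, with three piles reviewed at different intervals. The pile contents, the promotion rule, and the knows_answer callback are all illustrative choices, not a prescribed implementation.

```python
import random

def review(piles, pile_index, knows_answer):
    """Go through one pile; knows_answer(card) says whether the learner recalled it."""
    still_due = []
    for card in piles[pile_index]:
        if knows_answer(card):
            # Promote: the card will be reviewed less often (e.g., every few days).
            new_index = min(pile_index + 1, len(piles) - 1)
        else:
            # Demote: missed cards go back to pile 0, which is reviewed daily.
            new_index = 0
        if new_index == pile_index:
            still_due.append(card)
        else:
            piles[new_index].append(card)
    piles[pile_index] = still_due

if __name__ == "__main__":
    # Pile 0: reviewed daily; pile 1: every few days; pile 2: rarely.
    piles = [[("France", "Paris"), ("Australia", "Canberra")], [], []]
    review(piles, 0, knows_answer=lambda card: random.random() < 0.5)
    print(piles)
```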
Now let’s talk a little bit about cognition. One of the most important cognitive processes to consider is learning. When we design interfaces, we are in some ways hoping the user has to learn as little as possible to find the interface useful. However, our interfaces should also teach the user over time how to use them most efficiently. We can take PowerPoint as an example of this. Let’s pretend this is the first time I’ve ever used PowerPoint and I want to copy something. This application doesn’t assume I know anything yet. If I poke around, I’ll find the Copy button under the Edit menu, which is off-screen above the slide. When I bring up that menu, it also shows the hotkey that will allow me to actually copy something. That’s helping me learn to interact with the application more efficiently, through hotkeys instead of through menus. Yet it’s also not assuming that I already knew that, because I was still able to find the command under that menu. There are two kinds of learning we’re most interested in: procedural and declarative. Procedural learning is how to do something. That could be doing work on the computer, playing sports, or playing a musical instrument. It’s a task in which you’re engaged or an activity you’re performing. It’s something that you do. Declarative learning is knowledge about something. It’s what you know in your head. It’s what you can answer when asked. So, if I ask you, “What’s the hotkey for paste?”, I’m asking for declarative knowledge. If I ask you to paste your clipboard here, I’m asking for procedural knowledge. If you’re an expert computer user, you’d probably find it easier to just actually paste your clipboard than to answer me when I ask what the hotkey for paste is. In all likelihood, when I ask you what the hotkey for paste is, the way you remember it is you mentally simulate doing it and then look at what you simulated yourself doing. What’s interesting is that while declarative knowledge is how we generally communicate with one another, procedural knowledge is generally what we do in HCI. When you have strong procedural knowledge, you may forget how you’re doing what you’re doing because it’s so second nature. You’re unconsciously competent at what you’re doing. When you’re in that state, it can be difficult to explain to someone who lacks that competence, because you aren’t sure what makes you so good at it. It’s difficult to translate your subconscious procedural knowledge into explicit declarative knowledge. But declarative knowledge is how we communicate. That’s how we communicate with novice users. This is important because, as the designers of interfaces, we’re the experts in our domains. That means we’re prone to designing things that are easy for us to use but hard for anyone else.
To talk about cognitive load, let’s think for a moment of the brain like it’s a computer. The community is actually divided on whether or not the brain actually operates this way, but for the purposes of this explanation, it’s a useful metaphor. So your brain has a certain number of resources available to it, the same way your computer has a certain number of processor resources available to it. Each thing that the brain is working on takes up some of those resources. Let’s say you’re at home in a quiet area, working on a calculus problem that requires 60% of your cognitive resources. In that setting, you have plenty of resources to solve that problem. However, then you go to take a calculus test. Now you have some stress in there. Now you’re stressing about the impact this test is going to have on your grade. You’re stressing about how well other people seem to think they are doing on it, whether or not other people seem to be struggling while you struggle. This is taking up a lot of your cognitive resources. Here we see the stress taking up 50% of the cognitive resources you have. Now you don’t have sufficient resources to complete the problem successfully. I hypothesize that’s why test-taking anxiety can have such a negative effect: it takes resources away from actually working on the test. You can apply these same principles to the presence of distractions, anxiety disorders, and more. Cognitive load has two major applications to our work designing interfaces. First, we want to reduce the cognitive load posed by the interface, so that the user can focus on the task. Second, we want to understand the context of what else is going on while users are using our interface. We need to understand what else is competing for the cognitive resources users need to use our interface. If we’re designing a GPS or navigation system, for example, we want to be aware that the user will have relatively few cognitive resources available because they’re focusing on so many things at once.
Let’s take a second and reflect on cognitive load. Try to think of a task where you’ve encountered a high cognitive load. What different things did you have to keep in mind at the same time? And how could an interface have actually helped you with this problem?
Computer programming is one task with an incredibly high cognitive load. At any given time, you’re likely holding in working memory your goals for this line of code, your goals for this function, your goals for this portion of the program as a whole, the variables you’ve created and a lot more. That’s why there’s so many jokes about how bad it is to interrupt a programmer, because they have so much in working memory that they lose when they transition to another task. But there are ways good IDEs can help mitigate those issues. For example, inline automated error checking is one way to reduce the cognitive load on programmers, because it lets them focus more on what they’re trying to accomplish rather than the low level syntax mistakes. In that way, the IDE offloads some of the responsibility from the user to the interface. Now we could phrase that a little bit differently too. We could describe this as distributing the cognitive load more evenly between the different components of the system, myself and the computer. That’s a perspective we discuss when we talk about distributed cognition.
Here are five quick tips for reducing cognitive load in your interfaces. Number one: Use multiple modalities. Most often, that’s going to be both visual and verbal, but when only one system is engaged, it’s natural for it to become overloaded while the other one becomes bored, so describe things verbally and also present them visually. Number two: Let the modalities complement each other. Some people will take that first tip and use it as justification to present different content in the two modalities. That actually increases cognitive load because the user has to try to process two things at once, as you just noticed when Amanda put something irrelevant up while I said that. Instead, focus on letting each modality support, illustrate, or explain the other instead of competing with it. Number three: Give the user control of the pace. That’s more pertinent in educational applications of cognitive load, but oftentimes, interfaces have built-in timers on things like menus disappearing or selections needing to be made. That dictates the pace, induces stress, and raises cognitive load. Instead, let the user control the pace. Number four: Emphasize essential content and minimize clutter. The principle of discoverability says we want the user to be able to find the functions available to them. But that could also raise cognitive load if we just give users a list of 500 different options. To alleviate that, we can design our interfaces in a way that emphasizes the most common actions while still giving access to the full range of possible options. Number five: Offload tasks. Look closely at what the user has to do or remember at every stage of the interface’s operation and ask if you can offload part of that task onto the interface. For example, if a user needs to remember something that they entered on a previous screen, show them what they entered. If there’s a task they need to do manually, see if you can trigger it automatically.
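As a small illustration of that last tip, here’s a hedged TypeScript sketch of offloading memory from the user onto the interface: instead of asking the user to recall details they typed on a previous screen, the confirmation step echoes them back. The checkout fields and function names are invented for this example, not taken from any real product.

```typescript
// Hypothetical state carried between screens of a multi-step form.
interface CheckoutState {
  shippingAddress: string;
  email: string;
}

// Rather than making the user remember what they entered earlier,
// the confirmation screen shows it back to them (offloading the memory task).
function renderConfirmation(state: CheckoutState): string {
  return [
    "Please confirm your order:",
    `  Ship to: ${state.shippingAddress}`,
    `  Receipt sent to: ${state.email}`,
    "Press Confirm to place the order, or Back to edit these details.",
  ].join("\n");
}

console.log(renderConfirmation({
  shippingAddress: "123 Example St, Atlanta, GA",
  email: "user@example.com",
}));
```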
So our user has received some input. It’s entered her memory, and she’s cognitively processed it. Now it’s time to act in the world in response. In designing interfaces, we’re also interested in what is physically possible for users to do. This includes things like how fast they can move, or how precisely they can click or tap on something. For example, here are two versions of the Spotify control widget that appears on Android phones. On the left is the version that’s available in the tray of the phone that you can access at any given time by swiping down on the phone screen. And on the right is the version that appears on the lock screen when you turn on a locked phone while it’s playing music. In each case, the X closes the app, which is consistent with a lot of other applications. The forward, back, and pause buttons are similarly consistent with their usual meanings. I don’t actually know what the plus sign here does. It doesn’t have a clear mapping to some underlying function. Now note on the left, we have the close button in the top right corner. It’s far away from anything else in the widget. On the right, the close button is right beside the skip button. I can speak from considerable personal experience and say that the level of precision required to tap that X, instead of tapping the skip button, is pretty significant. Especially if you’re using this while running or driving, or anything besides just sitting there, interacting directly with your phone. The precision of the user’s ability to tap on a button is significantly reduced in those situations. And in this case, that can lead to the quick error of closing the application when all you’re trying to do is skip forward to the next song. This isn’t an error in the perception of the screen. It’s not an error in their memory of the controls. They’re not thinking that the X button actually is the skip button. This is just an error in what they’re physically able to perform at a given time. The interface relies on more precision than they would have in many circumstances. So this design doesn’t take into consideration the motor system of the user or the full context surrounding usage of this application. This isn’t as significant in the design on the left, because there’s more room around that close button. If I aim for the forward button and miss, the worst that’s going to happen is I might pause it. I’m not going to close it by accident. This is one example of how we need to be aware of the constraints on the user’s motor system. What they can physically do, how precise or accurate they can be, and so on. And we have to be aware of that in the context where the application is going to be used as well. These buttons are no smaller than the keys on a smartphone keyboard, but we can expect more precision when someone is sitting there typing with their thumbs, as opposed to reaching over and interacting real quick with something on the lock screen. Now of course, there might be other constraints around this. There might be a reason why this button’s placed there. There might be some constraint in the Android system that doesn’t let them use more than one row of the lock screen. In that case, we would need to make our interface more tolerant of errors. Maybe require a double tap to close the app, or maybe we mute it when it’s pressed and then give the user five seconds to confirm that that’s actually what they want to do. Those are ways of reducing the penalty for errors.
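To make that last idea concrete, here’s a hedged TypeScript sketch of one way to reduce the penalty for an imprecise tap: the first tap on the close control only arms it, and the app actually closes only if a confirming second tap arrives within a short window. The class, the names, and the five-second window are all assumptions for illustration, not how Spotify or Android actually behaves.

```typescript
// Hypothetical guard around a destructive "close" control on a lock-screen widget.
class CloseGuard {
  private armedUntil = 0;

  constructor(
    private readonly closeApp: () => void, // the destructive action
    private readonly windowMs = 5000       // how long the first tap stays "armed"
  ) {}

  // Returns a message the widget can show; closes only on a confirming second tap.
  onCloseTapped(now: number = Date.now()): string {
    if (now <= this.armedUntil) {
      this.closeApp();
      return "Closed.";
    }
    this.armedUntil = now + this.windowMs;
    return "Tap again to close.";
  }
}

// Usage: a stray tap while running or driving just arms the control instead of closing the app.
const guard = new CloseGuard(() => console.log("app closed"));
console.log(guard.onCloseTapped()); // "Tap again to close."
```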
We’ve talked a lot about different human abilities in this lesson. Depending on the domain you chose, the human abilities in which you’re interested may vary dramatically. If you’re looking at gestural interfaces or wearable devices, then the limitations of the human motor system might be very important. On the other hand, if you’re interested in educational technology, you are likely more interested in some of the cognitive issues surrounding designing technology. For virtual reality, your main concern would likely be perception. Although, there are interesting open questions about how we physically interact with virtual reality as well. So, take a few moments and reflect on what the limitations of human ability are in the domain of HCI that you chose to explore.
Today we’ve gone through a crash course on human abilities and perception. We started off by talking about the main ways people perceive the world around them: through sight, sound, and touch. Then we discussed some of the components of cognition, especially memory and learning. Then we discussed the motor system, how a person then interacts with the world around them. In this single lesson, we’ve only scratched the surface of human abilities. There are entire courses, even entire degree programs, that focus on these principles. We’ll give you some suggestions in the notes. So, don’t think we’ve given you a full view of the field. Instead, we hope we’ve given you just enough to start to keep human abilities in mind, and enough to know what to research as you start to learn to design interfaces.
[MUSIC] Over the many years of HCI development, experts have come up with a wide variety of principles and heuristics for designing good interfaces. None of these are hard and fast rules like the law of gravity or something, but they’re useful guidelines to keep in mind when designing our interfaces. Likely the most popular and influential of these is Don Norman’s six principles of design. Larry Constantine and Lucy Lockwood have a similar set of principles of user interface design, with some overlaps but also some distinctions. Jakob Nielsen has a set of ten heuristics for user interface design that can be used for both design and evaluation. And while those are all interested in general usability, there also exists a set of seven principles called the principles of universal design. These are similarly concerned with usability, but more specifically with usability for the greatest number of people. Putting these four sets together, we’ll talk about 15 unique principles for interaction design.
In this lesson, we’re going to talk about four sets of design principles. These aren’t the only four sets, but they are the ones that I see referenced most often, and we’ll talk about what some of the others might be at the end of the lesson. In his book, “The Design of Everyday Things”, Don Norman outlined his famous six design principles. This is probably the most famous set of design principles out there. The more recent versions of the book actually have a seventh principle, but that seventh principle is actually the subject of one of our entire lessons. Jakob Nielsen outlines ten design heuristics in his book, “Usability Inspection Methods”. Many of Norman’s principles are similar to Nielsen’s, but there are some unique ones as well. What’s interesting is that Norman and Nielsen went into business together and formed the Nielsen Norman Group, which does user experience training, consulting, and HCI research. In their book, “Software for Use”, Larry Constantine and Lucy Lockwood outline an additional six principles. Again, many of them overlap with the other sets, but some of them are unique. Finally, Ronald Mace of North Carolina State University proposed seven principles of universal design. The Center for Excellence in Universal Design, whose mobile site is presented here, has continued research into this area. These are a little bit different from the heuristics and principles presented in the other three sets. While those three are most concerned with usability in general, universal design is specifically concerned with designing interfaces and devices that can be used by everyone regardless of age, disability, and so on. To make this lesson a little easier to follow, I’ve tried to merge these four sets of principles into one larger set, capturing the overlap between many of them. In this lesson, we’ll go through these 15 principles. These principles are intended to distill out some of the overlap between those different sets. This table shows those 15 principles, my names for each of them, and which sets they come from. Note that my 15 principles are just an abstraction or summary of these sets of principles, and you should make sure to understand the original sets themselves as well. There are some subtle differences between the principles I’ve grouped together from different sets, and we’ll talk about those as we go forward. Again, note that these aren’t the only four sets of design principles out there. At the end of the lesson, we’ll chat about a few more, and we’ll also mention when others apply within this lesson as well.
Our first principle is discoverability. Don Norman describes it by asking, is it possible to even figure out what actions are possible and where and how to perform them? Nielsen has a similar principle. He advises us to minimize the user’s memory load by making objects, actions, and options visible. Instructions for use of the system should be visible or easily retrievable whenever appropriate. In other words, when the user doesn’t know what to do, they should be able to easily figure out what to do. Constantine and Lockwood have a similar principle called the visibility principle: the design should make all needed options and materials for a given task visible without distracting the user with extraneous or redundant information. The idea behind all three of these principles is that relevant functions should be made visible, so the user can discover them as opposed to having to read about them in some documentation or learn them through some tutorial. Let’s take an example of this real quick. Here in PowerPoint, there are a number of different menus available at the top, as well as some toolbars. The effect is that I can browse the different functions available to me. I can discover what’s there. For Nielsen, this means that I don’t have to remember all of these; I just have to recognize them when I see them in the toolbars. For example, I don’t have to remember Arrange as some keyboard command I have to type in manually to bring up options for how I might arrange things. All I have to do is recognize Arrange as the right button when I see it. Now, while this might be true at the application level, it’s not often true at the operating system level, because the operating system doesn’t command so much screen real estate all the time, and probably for good reason. So, for example, on a Mac, I can use Command+Shift+4 to take a screenshot of only a certain area of my screen. However, the only way I know of to find that is to Google it or read it in a manual. It isn’t discoverable or visible on its own, and you might never even realize it’s possible. So the principle of discoverability advocates that functions be visible to the user, so that they can discover them, rather than relying on them learning them elsewhere. I actually use a PC more than a Mac, and whenever I come back to my Mac after not using it for a while, I have to Google that again. I know it’s possible, but I never remember the command that actually makes it happen. Constantine and Lockwood’s principle of visibility would add to this that we shouldn’t get too crazy. We want to make functions discoverable, but that doesn’t mean just throwing everything on the screen. We want to walk a line between discoverability and simplicity.
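Here’s a small, hedged sketch of one way an interface can support discoverability while still teaching efficient use: each menu item carries a label and, optionally, an accelerator, and the menu renders the accelerator next to the label the way the Edit menu did in the earlier PowerPoint example. The types and rendering function are invented for illustration.

```typescript
// Hypothetical menu model: every command is browsable, and its hotkey (if any) is shown inline.
interface MenuItem {
  label: string;
  action: () => void;
  accelerator?: string; // e.g., "Ctrl+C"
}

function renderMenu(title: string, items: MenuItem[]): string {
  const rows = items.map(i =>
    i.accelerator ? `  ${i.label.padEnd(12)} ${i.accelerator}` : `  ${i.label}`
  );
  return [title, ...rows].join("\n");
}

const editMenu: MenuItem[] = [
  { label: "Copy",  action: () => {}, accelerator: "Ctrl+C" },
  { label: "Paste", action: () => {}, accelerator: "Ctrl+V" },
  { label: "Arrange", action: () => {} }, // discoverable even without a hotkey
];

console.log(renderMenu("Edit", editMenu));
```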
Discoverability is one of the challenges for designing gesture-based interfaces. To understand this, let’s watch Morgan do some ordinary actions with her phone. [MUSIC] [SOUND] [MUSIC] We just saw Morgan do four things with the phone. Reject a call, take a screenshot, take a selfie, and make a phone call. For each of those, this phone actually has a corresponding gesture that would have made it easier. She could have just turned the phone over to reject the call or said, shoot, to take the selfie. The problem is that these are not discoverable. Having a menu of voice commands kind of defeats the purpose of saving screen real estate and simplicity through gestures and voice commands. So, brainstorm a bit. How would you make these gesture commands more discoverable?
There’s a lot of ways we might do this, from giving her a tutorial in advance, to giving her some tutoring in context. For example, we might use the title bar of the phone to just briefly flash a message letting the user know when something they’ve done could have been triggered by a gesture or a voice command. That way, we’re delivering instruction in the context of the activity. We could also give a log of those so that they can check back at their convenience and see the tasks they could have performed in other ways.
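Here’s a hedged TypeScript sketch of that idea: whenever the user completes an action the slow way, the system briefly flashes a hint about the equivalent gesture or voice command and appends it to a log the user can review later. The action names, hint wording, and API are all hypothetical.

```typescript
// Hypothetical mapping from actions to the shortcuts that could have triggered them.
const SHORTCUTS: Record<string, string> = {
  rejectCall: "Turn the phone face down to reject a call.",
  takeSelfie: 'Say "shoot" to take a selfie.',
  screenshot: "Swipe across the screen with the edge of your hand.",
};

const hintLog: string[] = [];

function onActionCompleted(action: string, flash: (msg: string) => void): void {
  const hint = SHORTCUTS[action];
  if (hint) {
    flash(hint);        // brief, in-context message in the title bar
    hintLog.push(hint); // also saved so the user can review it later
  }
}

// Usage: the user rejects a call through the on-screen button the slow way.
onActionCompleted("rejectCall", msg => console.log(`Tip: ${msg}`));
```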
There often exists a tension between discoverability and simplicity. On the one hand, discoverability means you need to be able to find things. But how can you find them if they’re not accessible or visible? That’s how you get interfaces like this, with way too many things visible. And ironically, as a result, it actually becomes harder to find what you’re looking for, because there are so many different things you have to look at. This is where the principle of simplicity comes in. Simplicity is part of three of our sets of principles: Nielsen’s, Constantine and Lockwood’s, and the universal design principles. Nielsen writes specifically about dialogues. He says that dialogues should not contain information which is irrelevant or rarely needed. Every extra unit of information competes with the relevant units of information and diminishes their relative visibility, which we see with all those toolbars. Constantine and Lockwood establish this as their simplicity principle. They say the design should make simple, common tasks easy, communicating clearly and simply in the user’s own language and providing good shortcuts. Universal design is concerned with simplicity as well. Its principle of simple and intuitive use advocates designs that are easy to understand regardless of the user’s experience, knowledge, language skills, or current concentration level. And in this principle you can see universal design’s special concern with appealing to users of a variety of different levels of expertise, ages, disabilities, and so on. Now in some ways, these principles are about designing interfaces, but they cover other elements as well. One example of this is the infamous Blue Screen of Death from the Windows operating systems. On the left we have the Blue Screen of Death as it appeared in older versions of Windows, and on the right we have how it appears now in Windows 10. There are a lot of changes here. The blue is softer and more appealing. The description of the error is in plain language. But the same information is still provided; it’s just de-emphasized. This is a nice application of Nielsen’s heuristic: the user should only be given as much information as they need. Here, the information that most users would need, which is just that a problem occurred and here’s how close it is to recovering from it, is presented more prominently than the detailed information that might only be useful to an expert. Another interesting application of simplicity in action came when New York tried to create simpler signs to represent its allowed parking schedule. Navigating the sign on the left is pretty much impossible, but it’s pretty easy to interpret the one on the right. The universal design principle of simplicity is particularly interested in whether or not people of different experiences, levels of knowledge, or languages can figure out what to do. Navigating the sign on the left requires a lot of cognitive attention and some language skills, whereas I would hypothesize that even someone who struggles with English might be able to make sense of the sign on the right. These two signs communicate the same information, but while the one on the left requires a lot of cognitive load and language skills, the one on the right can probably be understood with little effort and little experience.
One way to keep a design both simple and usable is to design interfaces that, by their very design, tell you how to use them. Don Norman described these as affordances: the design of the thing affords, or hints at, the way it’s supposed to be used. This is also similar to the familiarity principle from Dix et al. This is extremely common in the physical world because the physical design of objects is connected to the physical function that they serve. Buttons are meant to be pressed, handles are meant to be pulled, knobs are meant to be turned. You can simply look at one and understand how you’re supposed to use it. So our next principle is the principle of affordances. Norman writes, “An affordance is the relationship between the properties of an object and the capabilities of the agent that determine just how the object could possibly be used.” In other words, an object with an affordance basically tells the user, by its very design, how it’s meant to be used. I use a door handle as the icon for this because the door handle is a great example of an affordance. You can look at it and understand that you’re probably supposed to pull it down or push it up. The very design of it tells you how you’re supposed to use it. Norman goes on to say that, “The presence of an affordance is jointly determined by the qualities of the object and the abilities of the agent that is interacting.” In other words, an affordance for one person isn’t an affordance for everyone. If you didn’t grow up around door handles, then maybe that door handle doesn’t have an affordance for you the way it would for someone who did. Our affordances are defined by who the user is. The challenge is that in the virtual computer world, there’s no such inherent connection between the physical design and the function of an interface the way you might often find in the real world. For example, when I mouse over a button in my interface, the style that appears around it makes it look like it’s elevated. It makes it look like it’s popping out of the interface. That affords the action of pushing it down, and I know that I need to click it to push it down. When I click it, it depresses. It gets darker. It looks like it’s sunk into the interface. So, here we’ve manually created an affordance that would exist in the real world. The design of this button hints at how it’s supposed to be used. It hints at the fact that it’s supposed to be pressed. So, we have to create that naturalness manually, and we can do that in a number of different ways. We could, for example, visualize the space of options. This color meter does a good job of this: the horizontal bar actually shows us the range of options available to us, the placement of the dial suggests where we are now, and there’s this kind of implicit notion that I could drag the dial around to change my color. We can also leverage metaphors or analogies of physical devices. You can imagine that if this content were presented like a book, I might scroll through it by flicking to the side, as if it’s a page. You may have seen interfaces that work exactly like that. There’s no computational reason why that gesture should mean go to the next page or go back a page, except that it makes sense in the context of the physical interface it’s meant to mimic. We swipe in a book, so let’s swipe in a book-like interface. Of course, there are also actions in the virtual world that have no real-world analogy, like pulling up a menu on a mobile site.
In that case, we might use signifiers. Signifiers are a principle from Norman’s more recent editions. Signifiers are in-context instructions, like arrows to indicate which way to swipe for a particular action, or in this case, a button labeled Menu to indicate how to pull up a menu. In this way, we can kind of create our own affordances, by creating an intuitive mapping between controls and their effects in the world, and by being consistent with what others have done in the past.
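Here’s a hedged sketch of manufacturing those cues in a web interface: the button’s visual state changes on hover and press to suggest pressability, and an explicit labeled control acts as a signifier for a menu that would otherwise only be reachable by a hidden gesture. The element ID, the class names, and the assumption that CSS restyles those classes are all made up for this example.

```typescript
// Hypothetical DOM wiring; assumes CSS classes "raised" and "pressed" restyle the button,
// and that an element with id "save-button" exists on the page.
const button = document.getElementById("save-button")!;

// Manufactured affordance: the button looks raised on hover and sunken while pressed.
button.addEventListener("mouseenter", () => button.classList.add("raised"));
button.addEventListener("mouseleave", () => button.classList.remove("raised", "pressed"));
button.addEventListener("mousedown", () => button.classList.add("pressed"));
button.addEventListener("mouseup", () => button.classList.remove("pressed"));

// Signifier: a visible, labeled control for an action that has no real-world analogy.
const menuButton = document.createElement("button");
menuButton.textContent = "Menu";
menuButton.addEventListener("click", () => openMenu());
document.body.appendChild(menuButton);

function openMenu(): void {
  // In a real interface this would reveal the navigation menu.
  console.log("menu opened");
}
```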
It’s important to note that the language with which we talk about affordances is famously somewhat imprecise. Norman’s technical definitions of affordances are a little different from what we’ve used here. Affordances, to Norman, are actually inherent properties of the device. For example, a door bar like this one has the inherent property that the handle moves into the crossbar and opens a latch. That’s the affordance; it’s something the door inherently does. A perceived affordance is a property attributed to the object by a human observer. This can be a subtle difference. Here, the perceived affordance would be pushability. Pushing, though, is a human behavior. So, pushability must be a perceived affordance because it relies on someone to do the pushing. What’s important is that a perceived affordance can actually be inaccurate. Norman famously complains when doors that are actually meant to be pushed have a handle like this, which looks like it’s supposed to be pulled. This is a place where a perceived affordance and an actual affordance are in conflict. The user perceives that the door is meant to be pulled, but the actual affordance is for it to be pushed, or to be more precise, the actual affordance is that the door opens inward based on these hinges. A signifier, then, is anything that helps the perceived affordance match the actual affordance. For example, some doors like this will have a bar across the part that’s supposed to be pushed. That signifies to a user that this is a place they should test out the interaction that they perceive to be possible. On the door, we can put a sign that just says, Push. That would be a signifier that tries to alleviate the conflict between the actual affordance and the perceived affordance. Now, based on these definitions, we can’t add affordances. Affordances are inherent in our system. Instead, we can add signifiers that help the perception of affordances match the actual affordances that are there. With these technical definitions of these terms, saying I added an affordance to the interface is like saying I added tastiness to that dish or I added beauty to that painting. Affordances, tastiness, beauty: these are all things that arise as a result of adding signifiers, or oregano, or some pretty shade of blue. But in practice, these distinctions around this vocabulary are often disobeyed. It’s not uncommon to hear people say, I added an affordance here so the user knows what they’re supposed to do. To me, there’s really no harm in that. The distinctions between these terms are very important when developing a theory of HCI, but when you’re doing day-to-day design, we usually know what we’re talking about when we misuse these terms. So, I don’t think there’s any harm in being cavalier about how we use these terms, but it is important to know the distinction in case anyone ever brings up the difference.
Norman and Nielsen both talk about the need for a mapping between interfaces and their effects in the world. Norman notes that mapping is actually a technical term, coming from mathematics, that means a relationship between the elements of two sets of things. In this case, our two sets are the interface and the world. For example, these book icons might help you map these quotes to the books from which they were taken. Nielsen describes mapping by saying the system should speak the users’ language, with words, phrases, and concepts that are familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order. A great example of this is the fact that we call cut, copy, and paste cut, copy, and paste. Surely there could have been more technical terms, like duplicate instead of copy. But using cut, copy, and paste forms a natural mapping between our own vocabulary and what happens in the system. Note that these two principles are subtly different, but they’re actually strongly related: Nielsen’s heuristic describes the general goal, while Norman’s principle describes one way to achieve it. Strong mappings help make information appear in a natural and logical order. A great example of this is setting the arrangement of different monitors. This actually comes from my display settings in my home office. This visualization creates a very natural mapping between the way the system treats the monitors and the way they’re actually laid out in the world. If there’s a mismatch, or if something doesn’t make sense, I can easily look at this, map it to the arrangement of the monitors in front of me, and figure out what’s going on. This could instead be shown as just a list of pixel locations, and that would still present all the exact same information, but in a way that isn’t as easily mapped onto the real world. Now, mappings and affordances are similar principles, but they have a clear and important difference. We can see that difference in our color meter again. Affordances were about creating interfaces whose designs suggest how they’re supposed to be used. The placement of this notch along this horizontal bar kind of affords the idea that it could be dragged around. The horizontal bar visualizes the space, which makes it seem like we could move that notch around to set our color. However, that design on its own wouldn’t necessarily create a good mapping. Imagine if, instead of the bar fading from white to black, it was just white the entire way. It would still be very obvious how you’re supposed to use it, but it wouldn’t be obvious what the effect of using it would actually be. It’s the presence of that fade from white to black that makes it easier to see what will happen if I actually drag that notch around. I can imagine it’s going to make the color fade from white to black. That creates the mapping to the effect of dragging it around on that meter. So mapping refers to creating interfaces where the design makes it clear what the effect will be when using them, not just creating interfaces where it’s clear how you’re supposed to use them. With this color meter, the arrangement of the controls makes clear what to do, and the visualization underneath makes it clear what will happen when I do it.
A good example of the difference between affordances and mappings is a light switch. A light switch very clearly affords how you’re supposed to use it: you’re supposed to flip it. But these switches have no mapping to what will happen when I flip them. I can look at one and clearly see what I’m supposed to do, but I can’t tell what the effect is going to be in the real world. Contrast that with the dials on my stove. There are four dials, and each is augmented with a little icon that tells you which burner is controlled by that dial. So there’s a mapping between the controls and their effects. So how would you redesign these light switches to create not only affordances but also mappings? If it’s relevant, this one turns on the breakfast room light, this one turns on the counter light, and this one turns on the kitchen light.
There are a few things we could do, actually. Maybe we could put a small letter next to each light switch that indicates which light in the room that switch controls: K for kitchen, C for counter top, B for breakfast room. Maybe we actually put icons that demonstrate which kind of light is controlled by each switch. The counter top lights are kind of sconce lights, so maybe we put an icon that looks like the counter top lights. But likely the easiest thing is actually the way the system really was designed. I just didn’t notice it until I started writing this video. The lights from left to right in the room are actually controlled by the light switches from left to right on the wall. This switch controls the light over there. This switch controls the light right here. And this switch controls the light back there. That actually forms a pretty intuitive mapping.
Our next principle is perceptibility. Perceptibility refers to the user’s ability to perceive the state of the system. Nielsen states that the system should always keep users informed about what is going on, through appropriate feedback within reasonable time. That allows the user to perceive what’s going on inside the system. Universal design notes that the design should communicate necessary information effectively to the user, regardless of ambient conditions or the user’s sensory abilities. In other words, everyone using the interface should be able to perceive the current state. Note that this is also similar to Norman’s notion of feedback. He writes that feedback must be immediate, must be informative, and that poor feedback can be worse than no feedback at all. But feedback is so ubiquitous, so general, that really, feedback could be applied to any principle we talk about in this entire lesson. So instead we’re going to reserve this more narrow definition for when we talk about errors. And our lesson on feedback cycles covers the idea of feedback more generally. Things like light switches and oven dials, actually do this very nicely. I can look at a light switch and determine whether the system it controls is on or off, based on whether the switch is up or down. Same with the oven dial. I can immediately see where the dial is set. But there’s a common household control, that flagrantly violates this principle of perceptibility. Here’s our ceiling fan, you might have one just like it. It has two chains. One controls the light, one controls the fan speed. But both only when the switch on the wall is on. Now first, the mapping here is awful. There’s no indication which control is which. But worse, the fan chain, which is this one, doesn’t give any indication of which setting the fan is on currently. I don’t honestly even know how many settings it has. [SOUND] I don’t know if pulling it makes it go up and then down, up and then off, down and then off. Whenever I use it, I just pull it, wait ten seconds and see if I like the speed, and then pull it again. And this is all only if the wall switch is on. Now, of course people have resolved this with dials or other controls, and yet these dang chains still seem to be the most common approach despite this challenge of perceptibility.
Consistency is a principle from Norman, Nielsen, and Constantine and Lockwood that refers to using controls, using visualizations, using layouts, using anything in our interface design consistently, both across the interfaces that we design and across what we design more broadly as a community. Norman writes that consistency in design is virtuous. That’s a powerful word there. It means that lessons learned with one system transfer readily to others. If a new way of doing things is only slightly better than the old, it’s better to be consistent. Of course, there’ll be times when new ways of doing things will be significantly better than the old, and that’s how we actually make progress. It’s how we advance. But if we’re only making tiny little iterative improvements, it might be better to stick with the old way of doing things because users are used to it. They’re able to do it more efficiently. Nielsen writes that users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions. In other words, be consistent with what other people have done on the same platform, in the same domain, and so on. Constantine and Lockwood describe consistency as reuse. They say the design should reuse internal and external components and behaviors, maintaining consistency with purpose rather than merely arbitrary consistency, thus reducing the need for users to rethink and remember. That means that we don’t have to be consistent with things that don’t really impact what the user knows to do. The color of the window, for example, isn’t going to change whether the user understands what the word copy means in the context of an interface. But changing the word copy to duplicate might force users to actually rethink and remember what that term means. In some cases, that might be a good thing: if duplicate actually does something slightly different than copy, then changing the word would force our users to understand that they’re doing something different. But if it does the same thing, it’s important to maintain consistency so the user doesn’t have to think as much and can focus on the task at hand instead of on our interface. The general idea across all of these is that we should be consistent both within and across interfaces to minimize the amount of learning the user needs to do to learn our interface. In this way, we create affordances of our own. Unlike traditional physical affordances, there’s no physical reason for the interface to be designed a certain way. But by convention, we create expectations for users and then fulfill those expectations consistently. One great example of following these conventions is the links we use in text on most websites. For whatever reason, an early convention on the Internet was for links to be blue and underlined. Now when we want to indicate to users that some text is clickable, what do we do? Generally, we might make it blue and underline it. Sometimes we change this, as you can see here. Underlining has actually fallen out of fashion in a lot of places, and now we just use a distinct text color to indicate a link that can be clicked. On some other sites, the color itself might be different. It might be red against the black text instead of blue. But the convention of using a contrasting color to mark links has remained, and the most fundamental convention is still blue underlines. Again, there’s no physical reason why links need to be blue or why they even need to be a different text color at all.
But that convention helps users understand how to use our interfaces. If you’ve used the Internet before and then visit Wikipedia for the first time, you understand that these are links without even thinking about it. Most of the interfaces we design will have a number of functions in common with other interfaces. So, by leveraging the way things have been done in the past, we can help users understand our interfaces more quickly. Other common examples of consistency in interface design would include things like using consistent hotkeys for copy, paste, and select all; ordering the menus File, Edit, View, and so on; and putting options like Save and Open under File. We don’t even begin to think about these things when we’re using an interface until we encounter one that defies our conventions, and yet someone has to consciously decide to be consistent with established norms. This is an example of design becoming invisible. When people do it right, we don’t notice they did it at all. When people put those options in the right places, we don’t even think about it, but when people put them in the wrong places, it’s pretty jarring and startling.
One of my favorite examples of how consistency matters comes from Microsoft’s Visual Studio development environment. To be clear, I adore Visual Studio, so I’m not just piling onto it. As you can see here, in most interfaces, Ctrl+Y is the redo hotkey. If you hit undo one too many times, you can press Ctrl+Y to redo the last undone action. But in Visual Studio, by default it’s Shift+Alt+Backspace. What? And what’s worse is that Ctrl+Y is the delete-line function, which is a function I had never even heard of before Visual Studio. So, if you’re pressing Ctrl+Z a bunch of times to maybe rewind the changes you’ve made lately, and then you press Ctrl+Y out of habit because it’s what every other interface uses for redo, the effect is that you delete the current line instead of redoing anything. And that actually makes a new change, which means you lose that entire tree of redoable actions. Anything you’ve undone can now not be recovered. It’s infuriating, and yet it isn’t without its reasons, and the reason is consistency. Ctrl+Y was the hotkey for the delete-line function in WordStar, one of the very first word processors, before Ctrl+Y was the hotkey for the more general redo function; there wasn’t even a redo function back then. I’ve heard that Y in this context stood for yank, but I don’t know how true that is. But Ctrl+Y had been used to delete a line from WordStar all the way through Visual Basic 6, which was the predecessor to Visual Studio. So, in designing Visual Studio, Microsoft had a choice: be consistent with the convention from WordStar and Visual Basic 6, or be consistent with the convention used in their other interfaces. They chose to be consistent with the predecessors to Visual Studio, and they’ve stayed consistent with that ever since. So, in trying to maintain the consistency principle in one way, they actually violated it in another way. So, if you try to leverage the consistency principle, you’re going to encounter some challenges. There may be multiple conflicting things with which you want to be consistent, and there may be questions about whether a certain change is worth dropping consistency. These are things to test with users, which we talk about in the other unit of this course.
Depending on your expertise with computers, there’s a strong chance you’ve found yourself on one side or the other of the following exchange. Imagine one person is watching another person use a computer. The person using the computer repeatedly right-clicks and selects Cut to cut things, and then right-clicks and selects Paste to paste them back again. The person watching insists that they can just use Ctrl+X and Ctrl+V. The person working doesn’t understand why the person watching cares. The person watching doesn’t understand why the person working won’t use the more efficient method. In reality, they’re both right. These two options are available because of the principle of flexibility, from both Nielsen’s heuristics and the principles of universal design. Nielsen specifically comments on the use of accelerators, which are hotkeys. He says that accelerators may often speed up the interaction for the expert user, such that the system can cater to both inexperienced and experienced users. He advises that we allow users to tailor frequent actions. Universal design says something similar. It notes that the design should accommodate a wide range of individual preferences and abilities. Another set of design principles, from Dix et al., also has a category called flexibility principles, which advocates user customizability and supporting multiple designs for the same task. Here, Nielsen is most interested in catering to both novice and expert users, while the principles of universal design are more interested in accommodating users of various abilities and preferences. But the underlying principle is the same: flexibility. Wherever possible, we should support the different interactions in which people engage naturally, rather than forcing them into one against their expertise or against their preference.
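To illustrate the flexibility principle in code, here’s a hedged TypeScript sketch where a single command is registered once and then exposed both through a menu entry and through an accelerator, so novices can discover it and experts can accelerate it. The registry, types, and names are invented for this example.

```typescript
// Hypothetical command registry: one action, several equivalent ways to invoke it.
interface Command {
  id: string;
  run: () => void;
  menuLabel: string;     // shown in the right-click menu for novices
  accelerator?: string;  // hotkey for experts, e.g., "Ctrl+X"
}

const commands = new Map<string, Command>();

function register(cmd: Command): void {
  commands.set(cmd.id, cmd);
}

function invokeFromMenu(id: string): void {
  commands.get(id)?.run();
}

function invokeFromHotkey(accelerator: string): void {
  for (const cmd of commands.values()) {
    if (cmd.accelerator === accelerator) cmd.run();
  }
}

register({ id: "cut", run: () => console.log("cut"), menuLabel: "Cut", accelerator: "Ctrl+X" });

invokeFromMenu("cut");      // the right-clicker's path
invokeFromHotkey("Ctrl+X"); // the expert's path; both reach the same command
```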
The principle of flexibility in some ways appears to clash with the principle of equity, but both come from the principles of universal design. The principle of flexibility said the design should accommodate a wide range of individual preferences and abilities. But the principle of equity says the design should be useful and marketable to people with diverse abilities, and it goes on to say we should provide the same means for all users, identical whenever possible and equivalent when not, and we should avoid segregating or stigmatizing any users. Now, in some ways, these principles might compete. Equity says we should give every user the same way of using the system, whereas flexibility says that we should allow different, flexible methods of interacting with the system. In reality, though, these are actually complementary to one another. Equity is largely about helping all users have the same user experience, while flexibility might be a means to achieve that. For example, if we want all our users to enjoy using our interface, keeping things discoverable for novice users and efficient for expert users allows us to accommodate a wide range of individual preferences and abilities. The same user experience in this instance means treating every user like they’re within the target audience and extending the same benefits to all users, including things like privacy and security. We might do that in different ways, but the important note is that the experience is the same across all users. That’s what equity is about. One good example of equity in action is the set of requirements around password resets. We want to design a system so that both expert and novice users experience the same level of security. Security is part of the user experience. Now, experts, we would assume, understand the value of a complex password. Novices might not. So if we don’t have requirements around passwords, novices might not experience the same level of security as experts. So password requirements can be seen as a way of making sure the user experience across novices and experts is the same with regard to security. In the process, we might actually frustrate novice users a little bit. You could actually see this as a violation of the flexibility principle, in that we’re not flexibly accommodating the kind of interaction that novices want to have. But the important thing is that we’re extending the same security benefits to everyone, and that’s equitable treatment. And that’s also a good example of how, at times, different design principles will appear to compete, and you have to decide what the best approach is going forward.
Ease and comfort are two similar ideas that come from the principles of universal design, and they also relate to equitable treatment, specifically in terms of physical interaction. The ease principle, which interestingly uses the word comfort, says the design can be used efficiently and comfortably and with a minimum amount of fatigue. The comfort principle notes that appropriate size and space is provided for approach, reach, manipulation, and use regardless of the user’s body size, posture, or mobility. Now, in the past, these principles didn’t have an enormous amount of application to HCI, because we generally assumed that the user was sitting at their desk with a keyboard and a monitor. But as more and more devices are becoming equipped with computers, we’ll find HCI dealing with these issues more and more. For example, the seat control in your car might now actually be run by a computer that remembers your settings and restores them when you get back in the car. That’s an instance of HCI trying to improve user ease and comfort in a physical area. Mobile interfaces are great examples of this as well. When deciding the size of buttons on a mobile interface, we should take into consideration that some users might have tremors that make it more difficult to interact precisely with different buttons. As we get into areas like wearable computing and virtual reality, these issues of ease and comfort are going to become more and more pertinent.
The structure principle is concerned with the overall architecture of a user interface. In many ways, it’s more closely related to the narrower field of user interface design than to HCI more generally. It comes from Constantine and Lockwood, who define it as their structure principle, which says that, “Design should organize a user interface purposefully, in meaningful and useful ways, based on clear, consistent models that are apparent and recognizable to users, putting related things together and separating unrelated things, differentiating dissimilar things and making similar things resemble one another.” It’s a long sentence, but what it really says is that we should organize our user interfaces in ways that help the user’s mental model match the actual content of the task. What’s interesting to me about the structure principle is that it borrows from a form of UI design that predates computers entirely. We find many of the principles we learned in designing newspapers and textbooks apply nicely to user interfaces as well. For example, this is the Wall Street Journal print edition from several years ago, and here’s the Wall Street Journal website. Notice that many of the structural principles present in the print version are present in the website as well. Now, part of that is for brand consistency, but part of it is because the very same ideas we used in developing magazines and newspapers still apply to the development of websites. Lines and spacing still separate different categories of articles, and headlines are still in bold while the article text is smaller. Now, there are, of course, differences, because the website can, for example, link to articles while physical papers cannot, which is why these are all shorter than the articles in the actual paper are. But we see a lot of the same principles at work in the website that were at work in the physical layout. Those are largely parts of structure: organizing things in intuitive ways that group together similar parts, separate dissimilar parts, and help the user navigate what they’re consuming.
In designing user interfaces, our goal is typically to make the interface usable, and a big part of usability is accounting for user error. Many design theorists argue that there’s actually no such thing as user error. If the user commits an error, it was because the system was not structured in a way to prevent or recover from it, and I happen to agree with that. Now, one way we can avoid error is by preventing the user from performing erroneously in the first place. This is the idea of constraints: constraining the user to only perform the correct actions in the first place. On constraints, Norman writes that constraints are powerful clues limiting the set of possible actions. The thoughtful use of constraints in design lets people readily determine the proper course of action, even in a novel situation. Remember, designing so that users are immediately comfortable in novel situations is one of the goals of good user interface design. Nielsen notes that even better than good error messages is a careful design which prevents a problem from occurring in the first place. Either eliminate error-prone conditions or check for them and present users with a confirmation option before they commit to the action. For example, if our application were prone to users accidentally closing it when they don’t mean to, ask them when it’s about to close if that’s actually what they meant to do. Both of these approaches refer to the need to stop faulty user input before it’s even received. This is a principle you’ve probably already encountered a lot. Our password reset screen actually does this pretty well. First, it shows us the constraints under which we’re operating right there, visibly on the screen, so we’re not left guessing as to what we’re supposed to be doing. Then as we start to interact, it tells us if we’re violating any of those constraints. So, if I were to just try to make my password the incredibly common 1234, it immediately tells me that my password isn’t long enough and that it doesn’t represent enough character classes. Now, obviously, it can’t prevent me from entering 1234 in the first place, since maybe that’s along the way to a longer, more valid password. But it’s visualizing those constraints so that instead of submitting and getting frustrated when it didn’t tell me I was doing it wrong until I’d actually done it, it tells me right in the context of doing it that I’m not on the right track. This is kind of a soft constraint. It doesn’t prevent me from doing something, but it tells me while I’m doing it that I’m doing it wrong. A harder constraint goes along with that last bullet: the password can only contain characters printed on the computer’s keyboard. Right now, I’m trying to paste in a character that isn’t on the computer keyboard, and it’s just not showing up at all. It’s a hard constraint against inputting characters that aren’t allowed. So, it’s preventing me from entering an invalid input in the first place. So, in their simplest form, constraints can be described as preventing the user from entering an input that wasn’t going to work anyway.
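Here’s a hedged TypeScript sketch of how those two kinds of constraints might look in code: a hard constraint filters disallowed characters out of the input as it’s typed, and a soft constraint reports, in plain language, which rules the current value still violates so the user gets feedback in context rather than after submitting. The length and character-class thresholds and the function names are assumptions for illustration, not the rules from any real reset screen.

```typescript
// Hard constraint: strip characters that aren't printable ASCII (i.e., not on the keyboard).
function enforceAllowedCharacters(input: string): string {
  return input.replace(/[^\x20-\x7E]/g, "");
}

// Soft constraint: describe, in plain language, which rules the password still violates.
function passwordProblems(password: string): string[] {
  const problems: string[] = [];
  if (password.length < 8) {
    problems.push("Password must be at least 8 characters long.");
  }
  const classes = [/[a-z]/, /[A-Z]/, /[0-9]/, /[^a-zA-Z0-9]/]
    .filter(re => re.test(password)).length;
  if (classes < 3) {
    problems.push("Password must use at least 3 character classes.");
  }
  return problems;
}

// As the user types "1234", the interface can surface both problems immediately.
const typed = enforceAllowedCharacters("1234");
console.log(passwordProblems(typed));
```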
Norman takes us a step further, though, when he breaks down constraints into four sub-categories. These aren’t just about preventing wrong input. They’re also about ensuring correct input. They’re about making sure the user knows what to do next. Physical constraints are those that literally, physically prevent you from performing the wrong action. A three-prong plug, for example, can only physically be inserted one way, which prevents mistakes. USB sticks can only be physically inserted one way as well, but the constraint doesn’t arise until you’ve already tried to do it incorrectly. You can look at a wall outlet and understand if you’re trying to plug something in incorrectly, but it’s harder to look at a USB stick and know whether you’re trying to insert it the right way. A second kind is a cultural constraint. These are the rules that are generally followed by different societies, like facing forward on escalators, or forming a line while waiting. In designing, we might rely on these, but we should be careful of intercultural differences. A third kind of constraint is a semantic constraint. Those are constraints that are inherent to the meaning of a situation. They’re similar to affordances in that regard. For example, the purpose of a rear-view mirror is to see behind you. So, therefore, the mirror must reflect from behind; it’s inherent to the idea of a rear-view mirror that it should reflect in a certain way. In the future that meaning might change: autonomous vehicles might not need mirrors for passengers, so the semantic constraints of today might be gone tomorrow. And finally, the fourth kind of constraint is a logical constraint. Logical constraints are things that are self-evident based on a situation, not just based on the design of something like a semantic constraint, but based on the situation at hand. For example, imagine building some furniture. When you reach the end, there’s only one hole left and only one screw. Logically, the one screw left is constrained to go in the one remaining hole. That’s a logical constraint.
A lot of the principles we talk about are cases where you might never even notice if they’ve been done well. They’re principles of invisible design, where succeeding allows the user to focus on the underlying task. But constraints are different. Constraints actively stand in the user’s way, and that means they become more visible. That’s often a bad thing, but in the case of constraints it serves the greater good. Constraints might prevent users from entering invalid input or force users to adopt certain safeguards. So of all the principles we’ve discussed, this might be the one you’ve noticed most. So take a second and think: can you think of any times you’ve encountered interfaces that had constraints in them?
I have kind of an interesting example of this. I can’t demonstrate it well because the car has to be in motion, but on my Leaf there’s an option screen, and it lets you change the time and the date, and some other options on the car. And you can use that option screen until the car starts moving. But at that point, the menu blocks you from using it, saying you can only use it when the car is at rest. That’s for safety reasons. They don’t want people fiddling with the option screen while driving. What makes it interesting, though, is it’s a constraint that isn’t in the service of usability, it’s in the service of safety. The car is made less usable to make it more safe.
We can’t constrain away all errors all the time, though. So, there are two principles for how we deal with errors that do occur: feedback and tolerance. Tolerance means that users shouldn’t be at risk of causing too much trouble accidentally. For this, Nielsen writes that, “Users often choose system functions by mistake, and will need a clearly marked ‘emergency exit’ to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.” For Constantine and Lockwood, this is the tolerance principle. They write, “The design should be flexible and tolerant, reducing the cost of mistakes and misuse by allowing undoing and redoing, while also preventing errors wherever possible.” It should be becoming clear why that Control-Y issue with Visual Studio was so significant. Undo and redo are fundamental concepts of tolerance, and that Control-Y issue, where Control-Y removes the line in Visual Studio, gets in the way of redo allowing us to recover from mistakes. For Dix et al., this is the principle of recoverability. Universal design simply says, “The design minimizes hazards and the adverse consequences of accidental or unintended actions.” Now, Nielsen’s definition is most interested in supporting user experimentation. The system should tolerate users poking around with things. That actually enhances the principle of discoverability, because if the user feels safe experimenting with things, they’re more likely to discover what’s available to them. The principles from Constantine and Lockwood and the principles of universal design are more about recovering from traditional mistakes. Jef Raskin poses this as a more humorous law of interface design: “A computer shall not harm your work or, through inactivity, allow your work to come to harm.” So, we first have to make sure that the system prevents the user from doing too much damage accidentally, either by constraining them away from making those mistakes, or by allowing an easy way to recover once those mistakes have been made.
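As a small illustration of the tolerance principle, here’s a minimal sketch of the kind of undo/redo mechanism that makes this recoverability possible. It’s my own generic example, not code from the course or from Visual Studio.

```python
class UndoableDocument:
    """A tiny document editor that supports undo and redo via two stacks."""

    def __init__(self):
        self.text = ""
        self.undo_stack = []  # past states we can return to
        self.redo_stack = []  # states we backed out of and can reapply

    def edit(self, new_text):
        self.undo_stack.append(self.text)
        self.redo_stack.clear()  # a new edit invalidates any pending redos
        self.text = new_text

    def undo(self):
        if self.undo_stack:
            self.redo_stack.append(self.text)
            self.text = self.undo_stack.pop()

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.text)
            self.text = self.redo_stack.pop()

doc = UndoableDocument()
doc.edit("Hello")
doc.edit("Hello, world")
doc.undo()          # back to "Hello"
doc.redo()          # forward to "Hello, world" again
print(doc.text)     # -> Hello, world
```

Even this simple version shows why binding something destructive to the redo shortcut is so damaging: the whole point of the redo stack is to make experimentation reversible.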
Second, the system should give plenty of feedback so that the user can understand why the error happened and how to avoid it in the future. Norman writes that feedback must be immediate and it must be informative. Poor feedback can be worse than no feedback at all, because it’s distracting, uninformative, and, in many cases, irritating and anxiety-provoking. If anything has ever described the classic Windows Blue Screen of Death, it’s this. It’s terrifying. It’s bold. It’s cryptic. And it scares you more than it informs you. Nielsen writes that error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution. Note this tight relationship with recoverability. Not only should it be possible to recover from an error, the system should tell you exactly how to recover from an error. That’s feedback in response to errors. For Constantine and Lockwood, this is the feedback principle. The design should keep users informed of actions or interpretations, changes of state or condition, and errors or exceptions… through clear, concise, and unambiguous language familiar to users. Again, the old Windows Blue Screen of Death doesn’t do this very well, because the language is not familiar, it’s not concise, and it doesn’t actually tell you what the state or condition is. The new one does a much better job of this. Notice as well that Norman, Constantine, and Lockwood are interested in feedback more generally, not just in response to errors. That’s so fundamental that we have an entire lesson on feedback cycles that is really more emblematic of the overall principle of feedback. Here, we’re most interested in feedback in response to errors, which is a very important concept on its own.
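To make Nielsen’s criteria concrete, here’s a small, hypothetical sketch contrasting an unhelpful error message with one that uses plain language, names the problem precisely, and suggests a recovery path. The file name, the error code, and the wording are all invented for illustration.

```python
def save_report(path, data):
    """Try to save a report, giving informative feedback if it fails."""
    try:
        with open(path, "w") as f:
            f.write(data)
        return True
    except PermissionError:
        # Unhelpful feedback would look like: print("ERR 0x80070005")
        # Helpful feedback: plain language, a precise problem, a constructive suggestion.
        print(f"Couldn't save your report to {path} because you don't have "
              "permission to write to that folder. Try saving it to your "
              "Documents folder instead, or ask an administrator for access.")
        return False

save_report("report.txt", "quarterly numbers")
```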
Finally, Nielsen has one last heuristic regarding user error: documentation. I put this last for a reason: one goal of usable design is to avoid the need for documentation altogether. We want users to just interact naturally with our interfaces. In modern design, we probably can’t rely on users reading our documentation at all, unless they’re required to use our interface in the first place. And Nielsen generally agrees. He writes that even though it’s better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user’s task, list concrete steps to be carried out, and not be too large. I feel modern design as a whole has made great strides in this direction over the past several years. Nowadays, most often, when you use documentation online or wherever you might find it, it’s framed in terms of tasks. You input what you want to do, and it gives you a concrete list of steps to actually carry it out. That’s a refreshing change compared to older documentation, which was more dedicated to just listing out everything a given interface could do without any consideration of what you were actually trying to do.
We’ve talked about a bunch of different design principles in this lesson. How these design principles apply to your design tasks will differ significantly based on what area you’re working in. In gestural interfaces, for example, constraints present a big challenge: because we can’t physically constrain our users’ movement, we have to give them feedback or feedforward in different ways. If we’re working in particularly complex domains, we have to think hard about what simplicity means. If the underlying task is complex, how simple can and should the interface actually be? We might find ourselves in domains with enormous concerns regarding universal design. If you create something that a person with a disability can’t use, you risk big problems both ethically and legally. So, take a few moments and reflect on how these design principles apply to the area of HCI that you’ve chosen to investigate.
So, I’ve attempted to distill the 29 combined principles from Norman, Nielsen, Constantine, Lockwood, and the Center for Universal Design into just these 15. Here you can see where each of these principles comes from. I do recommend reading the original four lists to pick up on some of the more subtle differences between the principles that I’ve grouped together, especially perceptibility, tolerance, and feedback. Note also that in more recent editions, Norman has one more principle: conceptual models. That’s actually the subject of an entire lesson in this course. These also certainly aren’t the only four sets of design principles. There are several more. For example, Dix, Finlay, Abowd, and Beale proposed three categories of principles: learnability, for how easily a new user can grasp an interface; flexibility, for how many ways an interface can be used; and robustness, for how well an interface gives feedback and recovers from errors. We’ll talk about their learnability principles when we discuss mental models. Jill Gerhardt-Powals has a list of principles for cognitive engineering, aimed especially at reducing cognitive load. Her list is particularly useful in applications for data processing and visualization. In “The Humane Interface”, Jef Raskin outlines some additional, revolutionary design rules. I wouldn’t necessarily advocate following them, but it’s interesting to see a very different approach to things. In “Computer Graphics: Principles and Practice”, Jim Foley and others give some principles that apply specifically to 2D and 3D computer graphics. Finally, Susan Weinschenk and Dean Barker have a set of guidelines that provide an even more holistic view of interface design, including things like linguistic and cultural sensitivity, tempo and pace, and domain clarity. Even these are only some of the additional lists. There are many more that I encourage you to look into. We’ll provide some in the notes.
In this lesson, I’ve tried to take the various different lists of usability guidelines from different sources and distill them down to a list you can really work with. We combined the lists from Don Norman, Jakob Nielsen, Larry Constantine and Lucy Lockwood, and the Center for Universal Design into 15 principles. Now remember, these are just guidelines, principles, and heuristics. None of them are unbreakable rules. You’ll often find yourself wrestling with the tensions between multiple principles. There will be something cool you’ll want to implement, but only the most expert users will be able to understand it; or there might be some new interaction method you want to test, but you aren’t sure how to make it visible or learnable to the user. These principles are things you should think about when designing, but they only get you so far. You still need needfinding, prototyping, and evaluation to find out what actually works in reality.
[MUSIC] Today we’re going to talk about mental models and representations. A mental model is the understanding you hold in your head about the world around you. Simulating a mental model allows you to make predictions and figure out how to achieve your goals out in the real world. A good interface will give the user a good mental model of the system that it presents. In order to develop good mental models, we need to give users good representations of the system with which they’re interacting. In that way, we can help users learn how to use our interfaces as quickly as possible. So that’s what we’ll talk about in this lesson: creating representations that help users develop accurate mental models of our systems. We’ll start by talking about mental models in general and how they apply to the interfaces with which we’re familiar. Then we’ll talk about how representations can make problem solving easier or harder. After that, we’ll talk about how metaphors and analogies can be useful tools to create good representations that lead to accurate mental models. Then we’ll discuss how user error can arise either from inaccuracies or mistakes in the user’s mental model, or just from accidental slips, despite an accurate mental model. Finally, we’ll close by discussing learned helplessness, one of the repercussions of poor interface design, as well as expert blind spot, which is one of the reasons why poor design can occur.
A mental model is a person’s understanding of the way something in the real world works. It’s an understanding of the processes, relationships, and connections in real systems. Using mental models, we generate expectations or predictions about the world, and then we check whether the actual outcomes match our mental model. So I’m holding this basketball because, generally, we all probably have a model of what will happen if I try to bounce this ball. [SOUND] It comes back up. [SOUND] You didn’t have to see it come up to know what would happen. You use your mental model of the world to simulate the event. And then you use that mental simulation to make predictions. When reality doesn’t match our mental model, it makes us uncomfortable. We want to know why our mental model was wrong. Maybe it makes us curious. But when it happens over and over, it can frustrate us. It can make us feel that we just don’t, and never will, understand. As interface designers, this presents us with a lot of challenges. We want to make sure that the user’s mental model of our systems matches the way our systems actually work. We can do that in two primary ways: one, by designing systems that act the way people already expect them to act; and two, by designing systems that, by their very nature, teach people how they’ll act. That way, we can minimize the discomfort that comes from systems acting in ways that users don’t expect.
Mental models are not a uniquely HCI principle. In fact, if you search for mental models online, you’ll probably find just as much about them in the context of education as in the context of HCI. And that’s actually a very useful analogy to keep in mind. When you’re designing an interface, you’re playing, very much, the role of an educator. Your goal is to teach your user how the system works through the design of your interface. But unlike a teacher, you don’t generally have the benefit of being able to stand here and explain things directly to your user. Most users don’t watch tutorials or read documentation, or if they do, they don’t want to. You have to design interfaces that teach users while they’re using them. That’s where representations come in. Good representations show the user exactly how the system actually works. It’s an enormous challenge, but it’s also incredibly satisfying when you do it well.
So let’s talk a little bit about mental models in the context of the climate control systems we see on automobiles. So this is my old car, it’s a 1989 Volvo. It, sadly, does not run anymore. But let’s talk about how the climate control system would work back when it did run. So, it’s a hot day outside. Looking at these controls, how would you make the air temperature colder and the air come out faster? The natural thought to me would be to turn the fan speed up over on the right, and the air temperature to the blue side over on the left. But this doesn’t actually make the temperature any colder, it just disables the heat. This dial over here in the top right has to be turned on to make the air conditioning actually work. So just turning this thing over to the blue side doesn’t actually turn on the air conditioning. So to make it colder, you have to both slide this lever over to the left and turn this dial to the red area. It’s kind of hard to see in this, but this little area over here on the left side of this dial is actually red. The red area on the air conditioning control designates the maximum coldness. What? This also means you can turn on both the heat and the air conditioning at the same time and have neither of them blowing out if your fan is turned off. None of these really match my mental model of how this system works, and the colors used here do nothing to correct my mental model. There’s a control here that, if you turn it to blue, doesn’t make the car any colder. And if you turn this other control to red, it does make the car colder. What? So back in 1989, there was a lot of room for improvement in designing the climate control system for a car like this. Let’s see if we actually did improve that by talking about my new car, which is a 2015 Nissan Leaf. So in the 26 years since my old Volvo came out, have we gotten better at this? Well, yeah, we’ve gotten a good bit better, although there’s still a good bit of room for improvement. So, here’s the climate control system from my Leaf. I have one dial that turns the fan speed up and down, and one dial that turns the temperature of the air coming out up and down. And as far as that’s concerned, it’s pretty simple. But this interface still has some things that are pretty confusing. So for example, it has an automatic mode, where it tries to adjust the air temperature and the fan speed to bring the temperature of the car to the temperature that I want. So I press auto. Now it’s going to change the fan speed, and change the air temperature if I didn’t already have it at the lowest, to try and get the car cooler faster. The problem is that if I want to turn auto off, I don’t actually know how to do it. Pressing auto again doesn’t actually turn it off. If I turn it so that it doesn’t only circulate air inside the car, then it turns auto off. But that might not be what I wanted. Maybe I wanted it to go to a certain air temperature without just circulating the air in the car. If I turn auto back on, it’s going to turn that back on. I don’t know why. So right now, as far as I know, the only way to turn auto off is to turn the circulation mode off. It also lets me turn AC and heat on at the same time, which I don’t understand at all. Why would I ever need that? So there are some things that the system really should do better, or some things that it should constrain so the user doesn’t do things that don’t make sense in the context of wanting to set the temperature.
Matching our interface design to users’ mental models is a valuable way to create interfaces that are easily learnable by users. Here are five tips or, in this case, principles to leverage for creating learnable interfaces. These principles of learnability were proposed by Dix, Finlay, Abowd, and Beale in their book, Human-Computer Interaction. Number one, predictability. Look at an action. Can the user predict what will happen? For example, graying out a button is a good way to help the user predict that clicking that button will do nothing. Number two, synthesizability. Not only should the user be able to predict the effects of an action before they perform it, they should also be able to see the sequence of actions that led to their current state. That can be difficult in graphical user interfaces, but something like the log of actions that they can see in the undo menu can make it easier. Command line interfaces are actually really good at this: they give a log of the commands that have been given, in order. Number three, familiarity. This is similar to Norman’s principle of affordances. The interface should leverage actions with which the user is already familiar from real-world experience. For example, if you’re trying to indicate something is either good or bad, you’d likely want to use red and green instead of blue and yellow. Number four, generalizability. Similar to familiarity and Norman’s principle of consistency, knowledge of one user interface should generalize to others. If your interface has tasks that are similar to other interfaces’ tasks, like saving and copying and pasting, it should perform those tasks in the same way. Number five, consistency. This is slightly different from Norman’s principle of consistency. This means that similar tasks or operations within a single interface should behave the same way. For example, you wouldn’t want Ctrl+X to cut some text if text is selected, but close the application if there is no text selected. The behavior of that action should be consistent across the interface. Using these principles can help the user leverage their existing mental models of other designs, as well as develop a mental model of your interface as quickly as possible.
The most powerful tool in our arsenal to help ensure users have effective mental models of our systems is representation. We get to choose how things are visualized to users, and so we get to choose some of how their mental model develops. Using good representations can make all the difference between effective and ineffective mental models. So to take an example, let’s look at some instructions for assembling things. Here are the instructions for a cat tree that I recently put together. At first, I actually thought this was a pretty good representation. You can kind of see how things fit together from the bottom to the top. The problem is that this representation doesn’t actually map to the physical construction of the cat tree itself. You can even see some bizarre mismatches within this representation. Up here, it looks like this pillar is in the front, but the screw hole that goes into it is actually shown in the back. And it’s not that we’re looking up into the bottom of that piece, because in the middle piece, they actually map up pretty well, at least with the way they’re shown here. Again, that isn’t the way the actual piece works. So anyway, the point is, this is a poor representation of the way this furniture actually worked, because there wasn’t a real mapping between this representation and the real pieces. I also recently put together some office furniture, and that actually had a very good representation. These are two of the steps from a hutch I put together to go over my desk. For the piece on the left, there was a perfect mapping between the way this piece worked and the way the screw holes were actually aligned on the piece. One clever thing they did is they actually showed this little screw hole right here that isn’t used for this step. That helped me understand the mapping between this piece and my piece, and understand that when I saw that screw hole that didn’t have a screw for it, that was okay. It would be natural to think we only need to show what users actually need to do, but including that screw hole helps users understand the mapping between this representation and the actual piece. This more complicated step over on the right actually ended up being pretty intuitive as well. Although it’s small and the details are hard to see, the arrows they put along here made it pretty easy to see the direction you had to move things in order to put these pieces together. That’s especially useful in places like up here, where the screw hole actually isn’t visible in this diagram. But I can see that there is a screw hole here because of the way they represented this screw going into that hole. I can look at the arrangement of these pieces and get a good feel for how the pieces are meant to fit together. So this representation helps me understand the problem in a way that the other representation did not.
A good representation for our problem will make the solution self-evident. Let’s take a classic example of this. A hiker starts climbing a mountain at 7 AM. He arrives at the cabin on top at 7 PM. The next day, he leaves the cabin at 7 AM and arrives at the bottom at 7 PM. The question: was the hiker ever at the same point at the same time on both days?
Let’s watch that animation again. The hiker goes up the hill on one day, stays the night, and then goes back down the hill the next day. And we want to know: was the hiker ever at the same point at the same time on both days? And the answer is yes. Described the way we describe it right here, it might actually seem odd that there is a point where the hiker is in the same place at the same time on both days. That seems like a strange coincidence. But what if we tweak the representation a little bit? Instead of one hiker going up and then coming down the next day, let’s visualize the two days at the same time. If we represent the problem like this, we’ll quickly see the hiker has to pass himself. To show it again: we know the answer is yes, because there’s a time when the hiker would have passed himself if he was going in both directions on the same day. And to pass himself, he has to be in the same point at the same time. That representation took a hard problem and made it very easy.
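If you want to see that overlay argument play out numerically, here’s a minimal sketch. The specific speed profiles are invented for illustration; the point is just that the “going up” position starts below the “coming down” position and ends above it, so somewhere in between the two must cross.

```python
def up(t):      # fraction of the way up the mountain at hour t (0 = 7 AM, 12 = 7 PM)
    return (t / 12) ** 2          # starts slow, finishes fast (made up for illustration)

def down(t):    # position on the descent at the same clock time
    return 1 - (t / 12) ** 0.5    # starts fast, finishes slow (made up for illustration)

# Scan the day in small steps and report roughly where the two positions cross.
step = 0.001
t = 0.0
while t <= 12:
    if up(t) >= down(t):
        print(f"The hiker passes himself about {t:.2f} hours after 7 AM.")
        break
    t += step
```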
For simple problems, identifying a good representation can be easy. But what about more complex problems? For those problems, we might need some examples of what makes a good representation. So let’s try a complex example. We’ll use a problem with which you might be familiar. For now, I’ll call it the circles and squares problem. On one side of a table, I have three circles and three squares. My goal is to move the three circles and three squares to the other side of the table. I can only move two shapes at a time, and the direction of the moves must alternate, starting with a move to the right. The number of squares on either side can never outnumber the number of circles unless there are no circles at all on that side. How many moves does it take to accomplish this? Try it out and enter the number of moves it takes in the box, or just skip if you give up.
If you solved it, well done on figuring it out despite such a terrible representation. Or congratulations on recognizing, by analogy, that it’s the same as a problem you’ve seen in another class. If you skipped it, I don’t blame you. It’s not an easy problem, but it’s even harder when the representation is so poor. There were lots of weaknesses in that representation, so let’s step through how we would improve it. The first thing we could do is simply write the problem out. Audio is a poor representation of complex problems, so here’s a written representation of the problem. If the trouble you were having solving the exercise was just remembering all the rules, having this written down would be a huge help. But we can still do a lot better than this. Instead, we can represent the problem visually. Here we have the shapes, the three circles and the three squares, and we can imagine actually moving them back and forth. That arrow in the center represents the direction of the next move. But we can still do better. Right now we have to work to compare the number of squares and circles, so let’s line them up. This makes it very easy to compare and make sure that the squares never outnumber the circles. And we can still do the same manipulation, moving them back and forth. Now the only remaining problem is that we have to keep in working memory the rule that squares may not outnumber circles. There’s no natural reason why the squares can’t outnumber the circles; it’s just kind of an arbitrary rule. So let’s make it more self-evident. Let’s make the squares wolves and the circles sheep. As long as the sheep outnumber the wolves, the sheep can defend themselves, kind of. But if the wolves ever outnumber the sheep, they’ll eat them. But if there are no sheep, then there’s nothing for the wolves to eat, so that’s okay. So now we have a new representation of the problem, one that will make the problem much easier to solve. The rules are more obvious, and it’s easier to evaluate whether or not they’re being met. Finally, we can make this visualization even a little bit more useful by actually showing the movements between different states. That way we can see that for any state in the problem, there’s a finite number of next legal states. This would also allow us to notice when we’ve accidentally revisited an earlier state, so we can avoid going around in circles. So for example, from this state, we might choose to move the wolf and the sheep back to the left, but we’ll immediately notice that would make the state the same as this one. And it’s not useful to backtrack and revisit an earlier state, so we know not to do that. So these representations have made it much easier to solve this problem than just the verbal representation we started with.
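That last representation, showing the legal moves between states, is essentially a state-space graph, and once the problem is framed that way, a computer can search it directly. Here’s a minimal sketch of that idea, assuming, as in the classic version of the puzzle, that each move carries one or two shapes; the state encoding and function names are my own illustration.

```python
from collections import deque

def is_legal(sheep, wolves):
    # Wolves may never outnumber sheep on a side unless there are no sheep there.
    return sheep == 0 or sheep >= wolves

def successors(state):
    sheep_left, wolves_left, next_move_right = state
    # Each move carries one or two shapes: (sheep, wolves).
    for ds, dw in [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]:
        if next_move_right:
            ns, nw = sheep_left - ds, wolves_left - dw
        else:
            ns, nw = sheep_left + ds, wolves_left + dw
        if 0 <= ns <= 3 and 0 <= nw <= 3:
            if is_legal(ns, nw) and is_legal(3 - ns, 3 - nw):
                yield (ns, nw, not next_move_right)

def min_moves():
    start = (3, 3, True)                      # everyone on the left; first move goes right
    frontier = deque([(start, 0)])
    visited = {start}
    while frontier:
        state, moves = frontier.popleft()
        if state[0] == 0 and state[1] == 0:   # everyone has reached the right side
            return moves
        for nxt in successors(state):
            if nxt not in visited:            # never revisit an earlier state
                visited.add(nxt)
                frontier.append((nxt, moves + 1))

print(min_moves())   # 11 moves for the three-and-three version
```

Notice how the search mirrors the advice above: the visited set keeps us from going around in circles, and the successors function makes the finite set of next legal states explicit.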
What are the characteristics of a good representation? First, good representations make relationships explicit. Laying things out like this makes it easy to tell that there are more sheep than wolves. Second, good representations bring objects and relationships together. Representing these as wolves and sheep makes the relationship, that the sheep must outnumber the wolves, much more salient than using squares and circles. That brings the objects together with the relationships between them. Third, a good representation excludes extraneous details. For example, sometimes this problem is described in the form of having a river and a boat, but those details aren’t actually relevant to solving the problem at all, so we’ve left them out of here. All we need to know is they need to move from the left to the right. Doesn’t matter if it’s a river, doesn’t matter if it’s a boat; this is all the information that we need. So we left out the extraneous information. Fourth, good representations expose natural constraints. We describe these as sheep and wolves because it makes it easier to think about the rule that wolves may never outnumber sheep. Now of course, this isn’t the best rule, because we know that sheep can’t actually defend themselves against wolves. Three sheep and one wolf, the wolf would still win. However, if we visualize these as guards and prisoners instead, it involves holding in working memory the idea that prisoners inexplicably won’t flee if they’re left without any guards. So personally, I think the wolves and sheep metaphor is better. But perhaps the original name of the problem is even better. This was originally described as the cannibals and missionaries problem. It makes more sense that a missionary could defend themselves against a cannibal than a sheep could defend itself against a wolf. But the cannibals and missionaries framing makes it a little bit dark, so let’s stick with sheep and wolves.
So let’s take an example of redesigning a representation to create a better mapping with a task. Here we have my circuit breaker. On the left we have a list of breakers, on the right we have what they actually control. To reset a breaker I need to go down the list on the left, find the one I want, count down on the right to find the right breaker, and switch it. How can we make this representation of what each breaker corresponds to better?
There are a number of things we can do here. The simplest change we could make would simply be to make the breakers themselves writable. Instead of writing a list on the left that we have to then map up to the breakers themselves on the right, we could just write on each breaker what it controls. That way, we just have to look at the breakers themselves to find the breaker that we’re interested in. But then we still have to manually scan through all of them. We could further augment this by having a floor plan over here that actually gives the numbers on the floor plan for the breaker we want. So all I have to do is jump straight to the room that I’m interested in, find the number, go over to the list of breakers, and the label written on it would then confirm that I chose the right one. Now, if we wanted to get really crazy, we could actually lay out the breakers themselves to correspond to the floor plan. We could have a floor plan and actually put the breakers on the floors that they control. Or we could even just put the breakers in the rooms themselves, so if the power goes out in a certain room, I just go find the breaker in that room. But there we’re starting to run into some of the other constraints on the problem. So it’s probably best to stick to what we can control, without requiring that the physical device be manufactured differently in the first place.
Representations are all around us in the real world, but they play a huge role in interfaces. Designing representations of the current state of a system is actually one of the most common tasks you might perform as an interface designer. So let’s take a look at a few. Here’s Google Calendar, which is a representation of my week. Notice how it actually uses space to represent blocks of time. It allows me to quickly feel how long different things are going to take. An alternate visualization might show an entire month instead of a week, but it would lose those indicators of how long the individual appointments are. So it doesn’t really represent the structure and pace of my day the way the weekly calendar does. This representation also allows me to very easily find conflicts in my schedule, so I know when I might need to reschedule something. On Friday, I can see that I have a conflict for one of my meetings. And this interface also makes it easy to reschedule: I can pull up the calendar for the other person I’m meeting with and identify places where we both have free time in our schedules. Another example of this is the PowerPoint animation pane. The numbers here represent when different animations happen concurrently. The middle icon represents what triggers the animation, and the right icon indicates the general nature of the animation, whether it’s a movement, a highlight, or an appearance. The PC version of PowerPoint makes this even better by actually showing you a timeline of the different animations to the right. That lets you very easily visualize when two different things are going to happen at the same time, or when something waits for something else to happen. These are just two of the many, many representations you use whenever you use a computer. Scroll bars, for example, are representations of your relative position in a document. Highlighting markers, like that rectangle, are representations of what you currently have selected. All these representations work together to help your mental model match the real state of the system. Representations, when used correctly, can make many tasks trivial, or even invisible. And we as interface designers have a lot of control over representations in our designs.
Analogies and metaphors are powerful tools for helping users understand your interface. If you can ground your interface in something they already know, you have a solid foundation for teaching them how to use your interface. For example, The Wall Street Journal’s website heavily leverages an analogy to The Wall Street Journal print edition. The headlines, the grids, the text all appear pretty similarly, so someone familiar with the print edition could pretty easily understand the online edition. If you’ve ever tried to explain a tool that you use to someone who’s never seen it, you’ve probably encountered something like this. For example, both at Udacity and in the Georgia Tech OMSCS program, we use Slack for communicating with each other. If you’ve never actually seen Slack, it’s a chat app for organizations to talk in different public and private rooms. But listen to what I just said: it’s a chat app. In my description, I leveraged the analogy to something you’ve already seen. Now, Slack is a pretty easy example because it is a chat app. It’s barely even an analogy to say it’s like a chat app, because it is a chat app. It’s a very full-featured chat app with a lot of integrations and things like that, but it’s fundamentally a chat app. What about something harder? How about Medium? Medium is a writing platform that’s kind of like a blogging service, but also kind of like a publishing service, but also kind of like a news feed. You write articles kind of like WordPress blog posts, but you can publish them through organizations, which is kind of like traditional newspapers or news aggregators like the Huffington Post. My articles, for example, are published through Udacity. So, it’s not just like a blog, because it’s not just my personal blog. There’s a publisher element to it, but the actual content is very similar to a blog-like platform. Articles are then published to interested people, more like a news feed. So if I scroll down, I’ll see the articles that Medium thinks I would be interested in reading, and in that way it’s more like Facebook or Twitter. So, notice that my entire explanation of Medium was based on analogies to other services like WordPress, the Huffington Post, and Facebook. But analogies and metaphors have a downside. When you choose to use them, users don’t know where the analogy ends. When I describe Medium’s news feed as kind of like Facebook or kind of like Twitter, users might wonder where the retweet or share options are. Or when I describe it as a blogging platform like WordPress, people might wonder where the comments are, and it doesn’t really support comments in the way that we’re used to. So, while analogies are powerful ways to help users understand our interfaces, we also need to pay special attention to what misconceptions they might introduce.
One of the challenges encountered by every new technology is helping the user understand how to use it. Smartphones may be pretty ubiquitous by now, but we’re still figuring out some elements of how to best use these things. Typing efficiency on a touch screen, for example, still hasn’t caught up to efficiency with a full keyboard. On the other hand, typing on a phone was a pretty straightforward transition from a regular keyboard, because the on-screen keyboard was designed just as an analogy to the real one. There are probably more efficient ways to enter text into a phone, but they wouldn’t be as easily learnable as this straightforward analogy to a physical keyboard. This illustrates both the positive and negative sides of using analogies in designs. Analogies make the interface more learnable, but they also may restrict the interface to outdated requirements or constraints. So, take a moment and think about how this applies to your chosen area of HCI. If you’re looking at things like gestural interfaces or touch-based interfaces, what analogies can you draw to other interfaces to make your designs more learnable? And at the same time, what do you risk by using those analogies?
In our lesson on design principles, we touched on a number of principles that are relevant to these ideas of mental models, representations, and metaphors. First, the idea that people reason by analogy to past interfaces, or by metaphors to the real world, is one of the reasons that the principle of consistency is so important. We want to be consistent with the analogies and metaphors that people use to make sense of our interfaces. Second, when we say that an interface should teach the user how the system works, we’re echoing the idea of affordances. The way the system looks should tell the user how it’s used. Just by observing the system, the user should be learning how to interact with it. Third, representations are important because they map the interface to the task at hand. A good representation is one that users can use to predict the outcomes of certain actions. In other words, a good representation lets users predict the mapping between their actions in the interface and the outcomes out in the world.
In designing interfaces, we want to leverage analogies to the real world, and principles from past interfaces, whenever possible, to help the user learn the new interface as quickly as they can. But there’s a challenge here. Why are we designing technology if we’re not providing users anything new? It’s one thing to take the technology they’re already using and make it more usable. But generally, we also want to enable people to do things they’ve never done before. That means there are no analogies, no expectations, no prior experiences for them to leverage. How do you tell someone that’s used to controlling their own thermostat that they don’t need to anymore? So, while we need to leverage analogy and prior experience wherever possible, we also need to be aware that eventually we’re going to do something interesting, and those analogies are going to break down. Eventually, we’re going to have to teach the user to use the unique elements of our interface.
Every interface requires the user to do some learning to understand how to use it. Very often, we visualize this as a learning curve. A learning curve plots expertise against experience. Generally, as the user gains more experience, they also gain more expertise. Here, our user starts with no experience at all, and so they also have no expertise at all. Our goal is for them to end with an expertise above this line of proficiency. However, the shape and steepness of this curve can vary. Ideally, we want a learning curve that grows quickly with relatively little experience. This is actually what we call a steep learning curve, although usually when you hear steep learning curve, it means the exact opposite. Technically, steep is good, because steep means we’re increasing very quickly with relatively little experience. People often use steep to mean the opposite, and that’s because steep calls to mind connotations of a high difficulty level, like climbing a steep mountain. So, steep is actually a poor representation of this concept. Instead, let’s call this a rapid learning curve, which means that expertise grows very quickly with relatively little experience. Rapid calls to mind probably the proper connotations: that a rapid learning curve is rapid learning, which is probably something we want. Interfaces that are more difficult to use would have slower learning curves. Here, the user needs a lot more experience to reach the same level of proficiency. So, how do we help our user reach proficiency faster? For one, if we’re consistent with existing conventions and use analogies that users understand, we can actually start them off with, effectively, some initial expertise. For example, when you download a new smartphone app, you know that the three horizontal lines that often appear in the top right likely indicate a menu. That’s a consistent convention used across multiple apps, so using it means that when users open your app, they already have some expertise. From there, we want to make the ascent as rapid as possible. One way we can do that is by using representations and affordances that help the user immediately understand how to use the interface. So, good design is in part about helping users achieve proficiency as quickly as possible, either through starting them off with some initial expertise or helping them grow in their expertise with as little experience as possible.
As we design interfaces, we will no doubt encounter instances where the user makes mistakes. Sometimes this might be because our users are stressed or distracted, but other times it might be because our users fundamentally don’t understand our interfaces, or even don’t understand their own goals. As designers, though, we know there’s really no such thing as user error. Any user error is a failure of the interface to properly guide the user to the right action. In designing interfaces, there are two kinds of user error that we’re interested in avoiding. The first are called slips. Slips occur when the user has the right mental model, but does the wrong thing anyway. Take this box, for example, prompting a user who’s closing a program on whether they’d like to save their work. In all likelihood, the user knows exactly what they want to do, and typically, it’s going to be to actually save their work. If you asked them to explain what they should do, they would say, “Click Yes.” But imagine if the order of these buttons was flipped, so that No was on the left and Yes was on the right. A user might click on the left just because they’re used to seeing Yes on the left, even though they know they really want to click Yes. Or imagine that No is selected by default, so that if a user just presses Enter when this dialog comes up, it automatically says No. In that case also, they knew that they wanted to save their work, but what they did didn’t match the goal they wanted to accomplish. A mistake, on the other hand, happens when the user has the wrong mental model and does the wrong thing as a result. Take this prompt, for example. The user is asked if they want to revert to the original file. That’s really just a backwards way of asking whether they want to save, but this is foreign terminology to many users. Their mental model of what saving is doesn’t necessarily tell them what to do in this instance. What’s more, they don’t actually have a choice here. Without a cancel button, they’re forced to choose, knowing one option could mean losing their changes. Here the problem is a mismatch between their internal model and the way the system is working, or at least the way it describes itself. So, a slip occurs when the user knows the right thing to do but does the wrong thing anyway, while a mistake occurs when the user doesn’t even know the right thing to do.
Don Norman further divides slips into two different categories. He describes action-based slips and memory lapse slips. Action-based slips are places where the user performs the wrong action, or performs a right action on the wrong object, even though they knew the correct action. They might click the wrong button, or right-click when they should left-click. A memory lapse slip occurs when the user forgets something they knew to do. For example, they might forget to start a timer on a microwave. They knew what to do, they just forgot about it. So action-based slips are doing the wrong thing, and memory lapse slips are forgetting to do the right thing. In this dialog, clicking No when you mean to click Yes would be an example of an action-based slip. The very existence of this dialog is meant to prevent a memory lapse slip, where a user would forget to save their work before closing.
Norman also divides mistakes into multiple categories, in this case, three: rule-based mistakes, knowledge-based mistakes, and memory-lapse mistakes. Rule-based mistakes occur when the user correctly assesses the state of the world but makes the wrong decision based on it. Knowledge-based mistakes occur when the user incorrectly assesses the state of the world in the first place. Memory-lapse mistakes are similar to memory-lapse slips, but these focus on forgetting to fully execute a plan, not just forgetting to do something in the first place. If the user clicks the wrong button in this dialog, it could be due to multiple different kinds of mistakes. Maybe they correctly knew they wanted to save their changes, but they didn’t realize that clicking No is actually what would save; that would be a rule-based mistake. They knew they wanted to save, but they made the wrong decision based on that knowledge. Or perhaps they didn’t even realize they wanted to save in the first place. Maybe they didn’t think they made any changes, when in actuality they did. That would be a knowledge-based mistake. They applied the right rule based on their knowledge, but their knowledge was inaccurate. If they were to shut down their computer and never come back to answer this dialog in the first place, that might be considered a memory-lapse mistake. They didn’t fully execute the plan of closing down the application. So in our designs, we want to do everything we can to prevent all these different kinds of errors. We want to help prevent routine errors by leveraging consistent practices, like designing dialogs the way users are used to. We also want to let our interface offload some of the demands on working memory from the user to the computer to avoid memory-lapse errors. And we want to leverage good representations to help users develop the right mental models to minimize rule-based and knowledge-based errors. And while errors are inevitable, we should make sure to leverage the tolerance principle so that the repercussions can never be too bad.
When you’re looking to improve an interface, user errors are powerful places to start. They’re indicative either of weaknesses in the user’s mental model or places where the system isn’t capturing the user’s correct mental model. So let’s try to address an error Morgan’s encountering. Morgan usually texts with her boyfriend but she texts with some other people too. But she finds she’s often sending the wrong messages to the wrong people. The app by default brings up the last open conversation and usually that’s her boyfriend. But sometimes it’s someone else and she accidentally messages them instead. First, is this a slip or is this a mistake?
I would argue this is a slip. Morgan knows who she means to message, but the phone’s behavior tricks her into sending things to the wrong people. What’s more, this might be either an action-based slip or a memory-lapse slip. Maybe Morgan is tapping the wrong person, or maybe she’s forgetting to check who she’s messaging. So take a second and brainstorm a design for this that can prevent this from happening in the future without overcomplicating the interaction too much. I would argue that the best way to do this is simply to show more pervasive reminders of who Morgan is currently texting. We could show the recipient’s picture on the send button, for example. That way, the interaction is no more complex, but Morgan also has to directly acknowledge who she’s messaging to send a message.
The feedback cycle in HCI is reliant on a relationship between the user’s input and the interface’s output. The idea of this cycle is that the user learns from the output what they should have input. If they do something that causes an error, they receive feedback on how to avoid that next time. If they do something correctly, then they see that their goal was accomplished. That’s the principle of feedback. But what happens when there’s no discernible interaction between the input and the output? What happens when there’s a break in this cycle? What happens when the user acts in the system over and over and over again, but never receives any output that actually helps? What if they never even receive output that indicates that the computer is understanding them or receiving input from them? That’s when something called learned helplessness sets in. The human working with the interface learns that they’re helpless to actually use the system. They learn that there is no mapping between their input and the output they receive. As a result, they believe that there’s just nothing they can do to accomplish their goals. No one wants to feel that way. No one wants to feel like no matter what they do, they are doomed to failure. No one wants to feel like they’re failing at something that everyone else seems to do very easily. So, it’s very natural for people to develop a resistance to even trying to learn about the interface.
Just like mental models, learned helplessness is also a topic related as much to education as it is to HCI. If you’ve ever spent any time in a teaching role, you’ve very likely encountered students that are very resistant to being taught. And the reason is they have learned that no matter what they do, they never succeed. They’ve learned to be helpless based on their past experiences. In all likelihood, there have actually been situations where you’ve been the one learning that you’re helpless. In fact, if you’re a parent, I can almost guarantee you’ve been in that situation. There have been times when your child was crying and inconsolable and you had no clue why. We had one of those right before we filmed this video. Nothing you did helped. And you learned that you were helpless to figure out what your child wanted. So if you’re a parent and you’re dealing with learned helplessness as an interface designer, just imagine that you are the user and the interface is your screaming child. What feedback would you need from your child to figure out how you can help them? And how can you build that kind of feedback into your interface? [SOUND]
Generally, when we’re developing interfaces, we’re going to be experts in those domains. It’s rare that you design an interface to help people do something that you yourself don’t know how to do. But as a result, there’s a risk of something called expert blind spot. When you’re an expert in something, there are parts of the task that you do subconsciously, without even really thinking about them. For example, a professional basketball player knows exactly where to place their hands on the ball when taking a shot. I know exactly what to do when I walk into the studio. Amanda knows exactly what to do when she gets behind the camera. And yet, if we were suddenly asked to train someone else, there are lots of things we’d forget to say or lots of things we would assume would just be obvious. That’s exactly what you’re doing when you’re designing an interface. You’re teaching the user how to use what you’ve designed. You’re teaching them without the benefit of actually talking to them, explaining things to them, or demonstrating things for them. You’re teaching them through the design of the interface. So, you have to make sure that you don’t assume that they’re an expert too. You have to overcome that expert blind spot, because we are not our users. We are not the user. That can be the motto of all of HCI: I am not my user. Say it with me: I am not my user. One more time: I am not my user. Now type it.
Now, write it on a Post-it note, and stick it to your monitor. If you wear glasses, write it on the inside of the lens. Record yourself saying it on your phone, and set that as your ringtone. Do whatever you have to do to remember. I am not my user.
In order for us to really sympathize with users suffering from the effects of learned helplessness and expert blind spot, it’s important for us to understand what it’s like to be in that position. We’ve all experienced these things at some point in life, although at the time, we might not have understood what was happening. So take a second and reflect on a time when you experienced learned helplessness and the effects of expert blind spot from someone trying to teach you something. It might have been in a class, it might be learning a new skill, or it might be doing something that everyone else seems to do just fine day to day, but for whatever reason, you’ve always struggled with.
The fact that I’m filming this in the kitchen probably tells you where I experience this. Anything related to cooking, I feel completely helpless. I’ve given myself food poisoning with undercooked meat lots of times. I once forgot to put the cheese on a grilled cheese sandwich. I accidentally made toast, and it wasn’t even good toast. And I’ve always heard, it’s just so easy, just follow the recipe. But no, it’s not that easy, because many recipes are written for experts. So for example, here’s a recipe from my wife’s cookbook. It calls for a medium saucepan. Is this a medium saucepan? I have no idea. It calls for one egg beaten with a splash of water. A splash? Like a splash when you overfill a water bottle, or a splash when your sibling soaks you at the pool? Pulse to combine. Cook until the edges are golden brown. What’s golden brown? Give me a color code and I’ll compare it, but otherwise, I don’t know where on the spectrum from golden to brown, golden brown lies. These are examples of places where the directions are given in a way that assumes I already have some expertise that I really don’t have.
In this lesson, we talked about mental models. We discussed what mental models are and how the user uses them to make sense of a system. We discussed how good representations can help users achieve strong mental models. We then discussed how issues with interfaces can lead to two different kinds of user error: slips and mistakes. We then discussed learned helplessness, which can come from giving poor feedback on user errors. And finally, we discussed expert blind spot and the importance of understanding that you are not your own user.
When looking at human computer interaction, we’re really looking at the tasks that users perform. We look at the tasks that they’re performing now, and we try to restructure those tasks to be more efficient using new interfaces. In all of this, the task is at the heart of the exercise. What task are they performing? So today, we’re going to talk about two methods for formally articulating the tasks that people are completing. First, we’ll discuss human information processor models, especially the GOMS model, which focuses on the input to the user and the output from the user. This is similar to the processor model of the user that we discussed elsewhere. Second, we’ll discuss cognitive task analysis, a way of trying to get inside the user’s head, instead of focusing just on the input and the output. Note that that’s similar to the predictor model of the user that we also discussed elsewhere.
The GOMS model is a human information processor model, so it builds off the processor model of the human’s role in a system. The GOMS model gets its name from the four sets of information it proposes gathering about a task. G stands for the user’s Goals in the system. O stands for the Operators the user can perform in the system. M stands for the Methods that the user can use to achieve those goals. And S stands for the Selection rules that the user uses to choose among different competing methods. So the GOMS model proposes that every human interacting with a system has a set of Goals that they want to accomplish. They have a set of Methods that they can choose from to accomplish those goals. Each of those methods is comprised of a series of Operators that carries out that method. And they have some Selection rules that help them decide what method to use and when. The GOMS model is often visualized like this: the user starts with some initial situation, and they have a goal in mind that they want to accomplish, so they apply their selection rules to choose between different competing methods to accomplish that goal. Once they’ve chosen a method, they execute that series of operators, which makes that goal a reality.
We can take the GOMS model and apply it to a number of different domains. So let’s take the example of needing to communicate a message to a coworker. We have an initial situation, which is the need to transfer information to a coworker. That carries with it the implicit goal of the information having been transferred. We might have a number of different methods in mind for how we could do that. We could email them, we could walk over and talk to them in person. And we also have some selection rules that dictate how we choose amongst these methods. If what we need to transfer is very time-sensitive, maybe we walk over and talk to them in person or call them on the phone. If the information we need to transfer is complex and detailed, maybe we write them an email. Or if it’s more casual, maybe we chat with them or text them. No matter what method we choose, we then execute the series of operators that carries out that method, and the result is our goal is accomplished, the information has been transmitted. Or we could also take the problem of navigation. Our initial situation is the need to get to our destination, which carries with it the implicit goal of having reached our destination. We might have different methods, like take the scenic route, take the highway route, take the surface streets, and some selection rules that might say something like, when it’s rush hour on the highway, take surface streets, or if it’s not time sensitive, take the scenic route. After choosing, we execute those operators and reach our goal. So in this way, GOMS models capture our goals, our different methods for carrying out those goals, and the individual operators that we use to execute those methods.
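Here’s a minimal sketch of how the message-to-a-coworker example above could be written down as a GOMS-style structure in code. The dictionary layout, operator names, and selection rules are my own illustration of the idea, not a standard GOMS notation or tool.

```python
# Goal: information has been transferred to a coworker.
goms_model = {
    "goal": "information transferred to coworker",
    "methods": {
        "email":     ["open client", "compose message", "attach details", "send"],
        "in_person": ["walk to desk", "explain information", "answer questions"],
        "chat":      ["open chat app", "type message", "send"],
    },
}

def select_method(time_sensitive, complex_and_detailed):
    """Selection rules: pick a method based on the situation."""
    if time_sensitive:
        return "in_person"
    if complex_and_detailed:
        return "email"
    return "chat"

method = select_method(time_sensitive=False, complex_and_detailed=True)
print(method)                           # -> email
print(goms_model["methods"][method])    # the operators we would then execute
```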
Let's try this out. We're going to watch Morgan enter the house and disable her security system two different ways. After you watch the video, try to outline Morgan's goals, operators, methods, and selection rules. [MUSIC] Now try to outline the goals, operators, methods, and selection rules for these two methods of disabling the security system.
Here's one example of how you might design a GOMS model for disabling a security system. Our initial situation is that we're entering the home with the alarm set, and we have two methods for disabling the alarm. We can use the keypad or we can use the keychain. Either way, our goal is that we've entered the home and re-enabled the alarm. Our selection rules might be something like: if we have our hands full, we're going to use the keypad so that we can get inside and put the stuff down. But if we don't have our hands full, we'll use the keychain. You might come up with other models for this that have different methods, different operators, or different selection rules. There are a lot of different ways we can capture a task with the GOMS model, depending on what you choose to focus on.
There are strengths and weaknesses to the GOMS representation for tasks. One weakness is that it doesn't automatically address a lot of the complexity of these problems. For example, there are likely many different methods and submethods for addressing this goal. Before even getting to the selection rules about what route to take, you might decide whether to take public transportation or whether to work from home that day. In parallel to that, even after deciding to drive, you might decide what car to take if your family has more than one car. The standard GOMS model leaves those kinds of things out, although there are augmented versions that have been created to deal with this kind of complexity, like CMN-GOMS or NGOMSL. We'll talk about those a bit more later. A second weakness is that the GOMS model assumes the user already has these methods in mind, which means the user is already an expert in the area. GOMS models don't do a good job of accounting for novices or accounting for user errors. For example, if you are driving in an unfamiliar location, you don't even know what the methods are, let alone how to choose among them. The strength of GOMS models, on the other hand, is their ability to formalize user interaction into steps that we can use to actually make predictions. We can measure how long each of these operators takes, and so we can predict the overall efficiency of using a certain interface. For example, in this GOMS model, if we had included the operator to pull keys out of the user's pocket, we might quickly identify that the relative efficiency of these two methods is very much dependent on how long that step takes. The keychain method might be a lot faster if the user can get their keychain out pretty quickly. But for other users, the fact that they need to pull something out of their pocket while holding bags or holding a baby makes the keypad the more efficient option. By performing that kind of reasoning, we can focus on areas where either method, or the interface as a whole, can be improved.
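To illustrate the kind of prediction described here, this small sketch compares the two alarm-disabling methods by summing assumed operator times. The operators and the times attached to them are made up for illustration; in practice you would measure them.

```python
# Hypothetical operator costs, in seconds (assumed, not measured).
keypad_method = [
    ("unlock door", 2.0),
    ("open door", 1.0),
    ("walk to keypad", 2.5),
    ("recall and enter code", 3.0),
]
keychain_method = [
    ("pull keychain from pocket", 4.0),  # slow if hands are full
    ("press disarm button", 0.5),
    ("unlock door", 2.0),
    ("open door", 1.0),
]

def predicted_time(method):
    """Predict total method time by summing its operator costs."""
    return sum(cost for _, cost in method)

print("keypad:  ", predicted_time(keypad_method), "s")
print("keychain:", predicted_time(keychain_method), "s")
# Changing the 'pull keychain from pocket' estimate flips which method wins,
# which is exactly the kind of insight GOMS-style prediction is meant to surface.
```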
There are several varieties of GOMS models. These varieties share the commonality of goals, operators, methods, and selection rules, but they differ in what additional elements they provide. Bonnie John and David Kieras cover four popular variations in a paper from 1996. The first is the vanilla GOMS we've talked about so far, and the other three are KLM-GOMS, CMN-GOMS, and NGOMSL. Let's talk about what those acronyms actually mean. They start with the Keystroke-Level Model, which is the simplest technique. Here, the designer simply specifies the operators and execution times for an action and sums them to find the complexity of an interaction. This method proposed six different types of operators, although for modern interfaces, we would need some new ones to cover touchscreens and other novel interfaces. A second variation is CMN-GOMS. CMN-GOMS is an extension of GOMS that features sub-methods and conditions in a strict goal hierarchy. For example, here we see a hierarchy of goals, as well as the ability to choose between multiple goals in different areas. Notice also the level of granularity behind these GOMS models. The goals go all the way down to little goals like moving text or deleting phrases. These are very, very low-level goals. Notice also the way this model is being used. The authors are using it to find the places where there's a lot of complexity that can be cut out. They do this by modelling how long each individual action takes, as well as looking at the number of interactions required and seeing if it can be cut down a bit. A third variation is called Natural GOMS Language. Natural GOMS Language, or NGOMSL, is a natural-language form of GOMS that lends itself to human interpretation. In all these cases, the important point of emphasis is the way that these models allow us to focus in on places where we might be asking too much of the user. For example, in this model, the user was being asked to carry a lot of information in working memory. By making the assumptions, actions, and operators this detailed, the model flags places where working memory is being overly taxed in a way that we might miss when we're doing higher-level designs.
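As a rough illustration of the Keystroke-Level Model, the sketch below sums commonly cited approximate operator times (keystroke, pointing, homing, mental preparation). Treat the exact values as ballpark figures from the literature rather than ground truth, and note that the operator sequence in the example is an assumption.

```python
# Approximate KLM operator times in seconds (commonly cited ballpark values).
KLM_TIMES = {
    "K": 0.28,   # keystroke or button press (average typist)
    "P": 1.10,   # point with a mouse to a target
    "H": 0.40,   # home hands between keyboard and mouse
    "M": 1.35,   # mental preparation
    # "R": system response time, which varies by system
}

def klm_estimate(sequence):
    """Sum operator times for a space-separated sequence like 'M H P K K'."""
    return sum(KLM_TIMES[op] for op in sequence.split())

# Hypothetical interaction: think, move hand to the mouse, point at a field,
# then type a two-character entry.
print(round(klm_estimate("M H P K K"), 2), "seconds")
```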
Here are five quick tips for developing GOMS models. Number one, focus on small goals. We've used some pretty big examples, but GOMS is really designed to work in the context of very small goals, like navigating to the end of a document. You can abstract up from there, but start by identifying smaller moment-to-moment goals. Number two, nest goals, not operators. It's possible to nest goals. For example, in our GOMS model of navigation, we could develop it further and break the overall task of navigating down into smaller goals like changing lanes or plotting routes. Operators, however, are the smallest atoms of a GOMS model. They don't break down any further, and they must be the actual actions that are performed. Number three, differentiate descriptive and prescriptive. Make sure to identify whether you're building a model of what people do or what you want them to do. You can build a GOMS model of what people should do with your interface, but you shouldn't trick yourself into thinking that's necessarily what they will do. Number four, assign costs to operators. GOMS was designed to let us make predictions about how long certain methods will take. The only way we can do that is if we have some measurement of how long individual operators take. Usually this is time, but depending on the domain, we might be interested in phrasing the cost differently as well. Number five, use GOMS to trim waste. One of the benefits of GOMS is that it lets you visualize where an unnecessary number of operators is required to accomplish some task. That's bolstered by the costs we assign to these operators. So, use GOMS to identify places where the number of operators required can be reduced by improving the interface.
GOMS models are human information processor models. This approach largely assumes the human is an input-output machine, and it doesn't get too much into the internal reasoning of the human. Instead, it distills their reasoning into things that can be described explicitly, like goals and methods. Some would argue, myself included, that human reasoning is actually too nuanced and complex to be so simplified. They, or we, advocate other models to get more into what goes on inside the user's head. That's where cognitive task analysis comes in. Cognitive task analysis is another way of examining tasks, but it puts a much higher emphasis on things like memory, attention, and cognitive load. Thus, cognitive task analysis adopts more of the predictor view of the human's role in the system.
This conflict between more processor-oriented and more predictor-oriented models of the user actually gets at the core of an old battle in psychology between behaviorism and cognitivism. Behaviorism emphasized things that could be observed. We can see what input a person is receiving. We can see the output they’re producing. And that might be all we need to understand the design of things. Cognitivism, on the other hand, suggests we can and should get into the mind of what people are actually thinking and how systems like memory and learning and perception actually work. So take a moment and reflect on what you think about this. When designing interfaces, how much attention should you devote to observable goals, operators and methods? And how much do you devote to understanding internal thought processes, like cognition, learning, and memory?
You can probably guess my bias on this issue, given that I've already badmouthed the processor model and I also teach cognitive systems. So I'm generally going to prefer methods that focus on cognition. I think it's important to note here, though, that both approaches have significant value. The GOMS model and its emphasis on identifying goals and operators is actually very useful in HCI, because it forces us to very clearly and deliberately identify user goals and the sequence of actions that accomplish them. We can get so caught up in user experiences that we forget the user experience is born out of individual operators. So while I wouldn't advocate focusing solely on the user as some kind of input-output information processor, there's value in defining the user's operation as clearly and specifically as we define a computer's.
Cognitive task analysis is not really a single method; it's more of a type of method for approaching the evaluation of how people complete tasks. Performing a cognitive task analysis involves a number of different techniques and methods that we'll discuss more when we discuss the design life cycle. For right now, though, we're interested in what kinds of information we're trying to gather, not how we're gathering it. Cognitive task analyses are especially concerned with understanding the underlying thought processes involved in performing a task, not just what we can see but specifically what we can't see. There are a lot of different methods for performing cognitive task analyses, but most methods follow a common sequence. First, we want to collect some preliminary knowledge. While we as interface designers don't need to become experts in a field, we need a good bit of familiarity with it. So, we might observe people performing the task, for example. In navigation, we might just watch someone driving and using a GPS. Our second step is to identify knowledge representations. In other words, what kinds of things does a user need to know to complete their task? Note that we're not yet concerned with the actual knowledge they have, only the types or structures of the knowledge that they have. For example, we want to know: does this task involve a series of steps to do in a certain order? Does it involve a collection of tasks to check off in any order? Does it involve a web of knowledge to memorize? For navigation, for example, we would identify that the structure of the knowledge is a sequence of actions in order, as well as knowledge of things to monitor as we go. In the third stage, we actually want to populate those knowledge representations. This is the stage where we start to capture what the user actually knows. With navigation, for example, they know to start the GPS, to enter an address, and to obey the turns while monitoring traffic and speed and things like that. During this stage, we identify all the specific actions they take, the knowledge they must have in mind to take those actions, the interruptions that can change their thought processes, the equipment involved, and the sensory experience of the user. We do this by applying focused knowledge elicitation methods. In other words, we get users to tell us what's going on in their heads or what's going on in their environment, or sometimes we do things that help us understand parts of the task that the users aren't even themselves aware of. Then we analyze and verify the data we acquired. Part of that is just confirming with the people we observed that our understanding is correct. We might watch them do something and infer they did it for one reason when in reality it was for a very different reason. So, we want to present our results to our users and make sure that they agree with our understanding of their task. Then we attempt to formalize it into structures that can be compared and summarized across multiple data-gathering methods. Finally, we format our results for the intended application. We need to take those results and format them in a way that's useful for interface design. We want to develop models that show what the user was thinking, feeling, and remembering at any given time, and make those relationships really explicit. The result might look something like this. Here we see a very high-level model of the process of driving to a destination.
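Here is one hypothetical way the knowledge representations from the navigation example might be written down once populated. The field names and contents are illustrative assumptions about what a cognitive task analysis might capture, not a standard format.

```python
# A sketch of populated knowledge representations for the navigation task.
navigation_cta = {
    "ordered_steps": [
        "start the GPS",
        "enter the destination address",
        "follow each turn instruction",
    ],
    "things_to_monitor": ["traffic", "current speed", "fuel level"],
    "possible_interruptions": ["rerouting", "phone call", "passenger question"],
    "equipment": ["GPS unit", "dashboard instruments"],
    "knowledge_in_mind": ["which turn is next", "roughly how far until it"],
}
```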
What's interesting to note is that these tasks in the middle are highly cognitive rather than observable. If I had no knowledge about driving and I sat in the passenger seat watching the driver, I might never know that they're monitoring their route progress or keeping an eye on their dashboard for how much fuel they have left. If you have kids, you may have experienced this personally. To kids sitting in the back seat, mommy or daddy is just sitting in the driver's seat the same way they're sitting in the passenger seat. They don't have a full understanding of the fact that you have a much higher cognitive load and you're doing a lot more things while you're driving than they are. That's because what you're doing is not observable. It's all in your head. So, to get at these things, I might have the user think out loud about what they're doing while they're doing it. I might have them tell me what they're thinking while they're driving the car. That would give me some insights into these cognitive elements of the task.
Cognitive task analysis advocates building models of human reasoning and decision-making in complex tasks. However, a challenge presented here is that very often, large tasks are actually composed of multiple smaller tasks. We can see this plainly in our cognitive model of driving. These tasks are so high-level that it's almost useless to describe driving in these terms alone. Each part can be broken down into various sub-tasks, like iteratively checking all the cars around you, or periodically checking how long it is until the next turn needs to be made. What's more, these smaller tasks could then be used in different contexts. Route monitoring, for example, isn't only useful when driving a car; it might be useful while running or biking or while riding as a passenger. Traffic monitoring might be something that autonomous vehicles do, not just the human user. So, the analysis of a task in a particular context could be useful in designing interfaces for other contexts if we break the analysis down into sub-tasks. So, let's take a simple example of this. Here is a somewhat simple model of the act of buying something online. Notice that a lot of the tasks involved here are general to anyone shopping on any website, and yet every website needs to provide all of these functions. As a side note, notice also the interesting analogy going on with the top two. Online, there is no cart or checkout station, but we borrowed those terms to help the user understand the shopping process online and how similar it is to shopping in a store. Now, anyway, if we treat this cognitive task analysis more hierarchically, we can start to see a well-defined sub-task around this checkout process. Every online vendor I've ever encountered has these steps in its checkout process. Now, because this is so well-defined, we can actually leverage existing tools, like existing payment widgets or something like PayPal. This hierarchical task analysis helps us understand what tools might already be available to accomplish certain portions of our task, or how we might design certain things to transfer between different tasks and different contexts. Hierarchical task analysis also lets the designers of the site abstract over this part of the process and focus more on what might make their particular site unique. This kind of task analysis is so common that you generally will find tasks and sub-tasks whenever you're looking at the results of a cognitive task analysis. So, it's important to remember the strengths supplied by this hierarchy: abstracting out unnecessary details at a certain level of abstraction, modularizing designs or principles so they can be transferred between different tasks or different contexts, and organizing the cognitive task analysis in a way that makes it easier to understand and reason over. Last, it's really important to note that the cognitive and hierarchical task analyses that we've shown here are extremely simplistic, mostly, honestly, because of the limited screen real estate. When you're creating real cognitive models, you'll likely have several levels of abstraction, several different states, and additional annotating information, like what the user has to keep in mind or how they might be feeling at a certain stage in the analysis. We'll put some examples of some good, thorough models in the notes.
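To show the hierarchy idea, here is a small sketch of an online purchase broken into tasks and sub-tasks. The particular breakdown is an assumption for illustration, and a real analysis would carry more levels and annotations (what the user must remember, how they might be feeling, and so on).

```python
# A toy hierarchical task representation: each task maps to its sub-tasks.
buy_online = {
    "buy an item online": {
        "find the item": ["search or browse", "compare options"],
        "add item to cart": [],
        "check out": [                      # well-defined sub-task shared by most sites
            "enter shipping address",
            "enter payment information",
            "review order",
            "confirm purchase",
        ],
    }
}

def print_tasks(node, depth=0):
    """Walk the hierarchy and print tasks indented by level."""
    if isinstance(node, dict):
        for task, subtasks in node.items():
            print("  " * depth + task)
            print_tasks(subtasks, depth + 1)
    else:
        for task in node:
            print("  " * depth + task)

print_tasks(buy_online)
```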
Let’s watch the videos of Morgan disabling her security system again. This time though, let’s try to approach this from a more cognitive task analysis perspective. We won’t be able to do that fully, because doing a full cognitive task analysis means interviewing, asking the user to think out loud, and more. But we can at least try out this approach. Remember, in doing a cognitive task analysis for a task like this, your goal is to build a model of the sequence of thoughts going on inside the user’s head. Pay special attention to what she needs to remember at each step of the process. [MUSIC]
What we saw here was that to get inside and disable the alarm, there was a sequence of actions that had to be completed, but some of them could be completed in different orders. If she used the keypad, she had to first unlock the door and then open the door. Then she could either disable the alarm on the keypad or close the door. And after closing the door, she could re-lock the door, though she could also do that before disarming the alarm. So there are some choices there. With the keychain, the sequence of tasks related to the door remains the same, but she had the option of disarming the alarm before even entering. However, that required remembering to do so. When using the keypad, she didn't have to remember, because the alarm beeps at her until she turns it off. But she does have to remember the key code. Performing these cognitive task analyses gives us the information necessary to evaluate different approaches and look for areas of improvement. For example, if she can disable the alarm just by pressing the keychain button, why does she need to press it at all? Why doesn't it just detect that she's coming in with the keychain in her pocket?
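One way to capture those ordering choices is as a small set of precedence constraints; the sketch below checks whether a proposed sequence respects them. The constraints listed are my reading of the video, so treat them as assumptions.

```python
# Precedence constraints for the keypad method: (earlier, later) pairs.
MUST_PRECEDE = [
    ("unlock door", "open door"),
    ("open door", "disarm at keypad"),
    ("open door", "close door"),
    ("close door", "re-lock door"),
]

def is_valid_order(sequence):
    """Return True if every 'earlier' step appears before its 'later' step."""
    position = {step: i for i, step in enumerate(sequence)}
    return all(position[a] < position[b] for a, b in MUST_PRECEDE)

# Both of these orders are consistent with what we observed:
print(is_valid_order(["unlock door", "open door", "disarm at keypad",
                      "close door", "re-lock door"]))        # True
print(is_valid_order(["unlock door", "open door", "close door",
                      "re-lock door", "disarm at keypad"]))  # True
```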
Just like GOMS models, cognitive task analyses also have some strengths and some weaknesses. One strength is that they emphasize mental processes. Unlike the GOMS model, cognitive task analysis puts an emphasis on what goes on inside the user's head. It's thus much better equipped to understand how experts think and work. The information it generates is also formal enough to be used for interface design, for comparison among alternatives, and more. There are disadvantages, though. One, cognitive task analyses are incredibly time-consuming to perform. They involve talking to multiple experts for extended periods of time, then systematically analyzing the data. A second weakness is that cognitive task analyses risk deemphasizing context. In zooming in on the individual's own thought processes, cognitive task analysis risks deemphasizing details that are out in the world, like the role of physical capabilities or interactions among different people or different artifacts. And third, like GOMS models, cognitive task analysis also isn't well suited to novices. It's well suited to expert users who have very strong models of the way they work and clearly understand their own mental thought processes, but it's not very well suited for novice users who are still trying to learn how to use an interface.
GOMS and cognitive task analysis are just two of the many alternatives for understanding how users approach tasks. More in line with the human information processor models, there exist models like KLM, TLM, and MHP, which capture even finer-grained actions for estimating performance speed. There are other extensions to GOMS as well that add things like subgoals or other ways of expressing content, like CPM-GOMS and NGOMSL. CPM-GOMS focuses on parallel tasks, while NGOMSL provides a natural-language interface for interacting with GOMS models. More along the lines of cognitive models, there exist other methods as well, like CDM, TKS, CFM, Applied Cognitive Task Analysis, and Skill-Based Cognitive Task Analysis. CDM puts a focus on places where critical decisions occur. TKS focuses on the nature of humans' knowledge. CFM focuses on complexity. ACTA and Skill-Based CTA are two ways of gathering the information necessary to create a cognitive model. There also exist other frameworks more common in other disciplines; for example, production systems are common in artificial intelligence. But they're intended to model cognitive systems the same way these cognitive models do, so we can apply production systems here as well and attempt to prescribe rules for users to follow.
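Since production systems come up at the end there, here is a tiny sketch of what production rules for the alarm task might look like: condition-action pairs that fire when their condition matches the current state. The rules themselves are invented for illustration.

```python
# Toy production system: each rule is a (condition, action) pair.
rules = [
    (lambda s: s["at_door"] and not s["door_unlocked"],   "unlock the door"),
    (lambda s: s["door_unlocked"] and not s["door_open"], "open the door"),
    (lambda s: s["door_open"] and s["alarm_armed"],       "enter code at keypad"),
]

def next_action(state):
    """Return the action of the first rule whose condition matches the state."""
    for condition, action in rules:
        if condition(state):
            return action
    return None

state = {"at_door": True, "door_unlocked": False, "door_open": False, "alarm_armed": True}
print(next_action(state))   # -> unlock the door
```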
Every possible application of HCI involves users completing some sort of task. That task might be something within a domain. In educational technology, for example, that task might be learning how to do a certain kind of problem. If your area is more technological, the task might be something that the user is doing through your application, like using virtual reality and gesture recognition to sculpt a virtual statue. Take a moment and try to think of the kinds of tasks you might be interested in exploring in your chosen application area. Do they lend themselves more to an information processor model like GOMS, or to cognitive models like hierarchical task analysis, and how can you tell?
Today, we've talked at length about two general methods for approaching task analysis. One, the GOMS family of approaches, tries to distill tasks down to their goals, operators, methods, and selection rules. The other, cognitive task analysis, aims to get into the head of the user and understand what they're thinking, feeling, and remembering at every stage of the task. When we discuss the design life cycle, we'll focus a bit on how to fill these models with information. But our focus there will be on the methods for gathering the information rather than the structure of the information itself. So, it's important to keep these methods in mind when we talk about that.
[MUSIC] In discussing human-computer interaction, there's often a tendency to look narrowly at the user interacting with the computer, or slightly more broadly at the user interacting with the task through some computer. But many times we're interested in zooming even further out. We're interested not only in the interaction between the human, the computer, and the task, but also in the context in which that interaction takes place. So today we're going to look at four different models or theories of the context surrounding HCI. We'll focus primarily on distributed cognition, which is one of the dominant theories on the interplay between multiple agents, artifacts, and contexts. We'll also touch on three other significant theories: social cognition, situated action, and activity theory.
Cognition on its own is interested in thought processes and experiences and we naturally think of those as occurring inside the mind. But distributed cognition suggests that models of cognition should be extended outside the mind. This theory proposes expanding the unit we use to analyze intelligence from a single mind to a mind equipped with other minds and artifacts and the relationships among them. So, let’s take an example of this. Amanda, give me a hard addition problem. Okay, can I do that in my head? No, I also can’t even remember the numbers you just read to me. But I have a pen and paper here and using those, I can easily write down the numbers, so give those numbers to me again. Okay. Using that, I can now do the calculations by hand and the answer is 7,675. Now, did I get smarter when I grabbed the pen and paper? Not really, not by the usual definition of smarter at least. But the system comprised of myself, the pen, the paper, is a lot more than just my mind on its own. The cognition was distributed amongst these artifacts. Specifically, the paper took care of remembering the numbers for me and remembering and tracking my progress so far. So, instead of adding 1,238 plus 6,437, I was really just adding eight plus seven, three plus three, two plus four, and so on.
One of the seminal works in distributed cognition research is Edwin Hutchins' 1995 paper in the journal Cognitive Science, "How a Cockpit Remembers Its Speeds." You might recognize Edwin Hutchins' name from our lesson on Direct Manipulation and Invisible Interfaces. He was one of the coauthors there as well, along with Don Norman. This is one of my favorite papers, in part simply because of the very subtle change in emphasis that we see in the title. We tend to think of remembering as a uniquely human or biological behavior. We describe computers as having memory, but we don't usually describe computers as remembering things. Remembering is more of a human behavior, but the paper title twists that a little bit. It isn't the human, it isn't the pilot remembering; it's the cockpit remembering. What is the cockpit? The cockpit is a collection of controls, sensors, and interfaces, as well as the pilots themselves. The paper title tells us that it's this entire system, the pilots, the sensors, the controls, and the interfaces among them, that does the remembering. The system as a whole, or the cockpit as a whole, is remembering the speed, not just the human pilot or pilots in the cockpit. No individual part in isolation remembers what the system as a whole can remember.
In order to understand the application of distributed cognition to the cockpit, it's important for us to first understand what challenge it's addressing. The technical reasons for this are a bit complex, and I strongly encourage reading the full paper to get the full picture. But to understand the simplified description I'll give here, here's what you need to know. When a plane is descending for landing, there are a number of different changes the pilots need to make to the wing configurations. These changes are made at different speeds during the descent. When the plane slows down to a certain speed, it demands a certain change to the wing configuration. The speeds at which these configuration changes must happen differ based on a number of different variables, so for every flight there's a unique set of speeds that must be remembered. That's why the title of this paper is "How a Cockpit Remembers Its Speeds", speeds, plural. It isn't just remembering how fast it's going now; it's remembering a sequence of speeds at which multiple changes must be made. The configuration changes to the wings must be made during the descent at narrowly defined times. That creates a high cognitive load. Pilots must act quickly, and mistakes could mean the deaths of themselves and hundreds of others. So how do they do this? First, the pilots have pages that contain the speeds for their descent, based on different parameters. The cockpit itself has an entire booklet of pages like this. So we know that the cockpit has its pilots, who are responsible for actually reasoning over things. But that booklet forms the cockpit's long-term memory of different speeds for different parameters. Then, prior to the descent, the pilots find the page from that booklet that corresponds to their current parameters. They pull it out and pin it up inside the cockpit. That way, the sheet is accessible to both pilots, and they're able to check one another's actions throughout. This becomes one form of the cockpit's short-term memory, a temporary representation of the current speeds. At this point, we have to attribute knowledge of those speeds to the cockpit itself. If we were to isolate either pilot, they would be unable to say what the speeds are from memory, but without the pilots to interpret those speeds, the card itself is meaningless. So it's the system of the entire cockpit, including the pilots, the booklet, and the current card, that remembers the speeds. Then, as the pilots begin the descent, they mark the different speeds right on the speedometer with these little speed bugs. The speed bugs tell them which speeds to remember in a way that can just be visually compared. When they see the speedometer pass a speed bug, they know it's time to make a certain change. This is like the working memory for the cockpit. The short-term memory stores the numbers in a way that the pilots can reason over, but the speed bugs on the speedometer store them in a way that they can very quickly just visually compare. They don't need to remember the numbers themselves or do any math. All they have to do is visually compare the speed bugs to the current position of the speedometer. So what do we see from the system as a whole? Well, we see a long-term memory in the book of cards. We see a short-term memory in the card they selected. We see a working memory in the speed bugs on the speedometer. And we see decisions on when to make configuration changes distributed across the pilots and these artifacts.
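As a loose illustration of how the speed bugs turn remembering numbers into a quick visual check, here is a sketch where each bug pairs a speed with the wing-configuration change it signals. The speeds and configuration names are invented, not taken from the paper.

```python
# Hypothetical speed bugs: (speed in knots, configuration change it signals).
speed_bugs = [
    (210, "extend flaps to 5"),
    (190, "extend flaps to 15"),
    (170, "extend flaps to 25"),
]

def due_changes(current_speed):
    """A bug 'fires' once the aircraft has slowed past its marked speed."""
    return [change for speed, change in speed_bugs if current_speed <= speed]

# Once the bugs are set, the pilots just compare the needle against the marks:
print(due_changes(200))   # -> ['extend flaps to 5']
```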
No single part of this cockpit, not the pilots, not the speed bugs, not the cards, could perform the actions necessary to land a plane on their own. It's only the system as a whole that does so. That's the essence of distributed cognition. The cognition involved in landing this plane is distributed across the components of the system. This is a deeper notion than just using interfaces to help us do tasks. The important thing here is that these different interfaces serve cognitive roles in the system.
Distributed cognition is deeply related to the idea of cognitive load. Recall that cognitive load refers to your mind's ability to deal with only a certain amount of information at a time. Distributed cognition suggests that artifacts add additional cognitive resources. That means the same cognitive load is distributed across a greater number of resources. Artifacts are like plugging extra memory into your brain. Driving is a good example of this. Sometimes while driving, your cognitive load can be very, very high. You have to keep track of the other cars around you. You have to keep track of your own speed and monitor your route planning. You have to make predictions about traffic patterns. You have to pay attention to your own level of gasoline, or in my case, electric charge. You might be attending to something in your car as well, like talking to your passenger or keeping an eye on your child in the back seat. It can be a big challenge. A GPS is a way of offloading one of the tasks, navigation, onto another system. And thus your cognition is now distributed between you and the GPS. Turn on cruise control, and now it's distributed across the car as well. You're offloading the task of tracking your speed to the car. Every task you offload to artifacts decreases your own personal cognitive load.
Let’s analyze a simple task from the perspective of distributed cognition. Here we see Morgan paying some bills the old fashioned way. For each bill she pulls it off the pile, reads it, writes a check and puts them together in a stack on the right. Where do we delineate this system? What are its parts?
We're interested in any part of the system that performs some of the cognition for Morgan. While the chair, table, and overhead light make this possible, they aren't serving any cognitive roles. Morgan herself, of course, is, and the two piles of bills are, too. They are an external memory of what bills Morgan has already paid and what she still needs to pay. This way she doesn't have to mentally keep track of what bills she has left to do. The bills themselves remember a lot of the information for her as well, like the amounts and the destinations they need to be sent to. What about the pen and checkbook? That's where things start to get a little bit trickier. The checkbook itself is part of the system because it takes care of the record-keeping task for Morgan. Checkbooks create carbon copies, which means Morgan doesn't have to think about tracking the checks manually. The pen is a means of communicating between these systems, which means it's part of our distributed cognition system as well.
Something important to note is that distributed cognition isn't really another design principle. Distributed cognition is more of a way of looking at interface design. It's a way of approaching the problem that puts your attention squarely on how to extend the mind across artifacts. We can actually view many of our design principles as examples of distributed cognition. So this is my computer, and when I set this up, I wasn't thinking about it in terms of distributed cognition. And yet we can use distributed cognition as a lens through which to view this design. For example, I always have my calendar open on the right. That's a way of offloading having to keep track of my daily schedule in working memory. It bugs me if I have a teleconference to attend or somewhere I need to go. In fact, I rely on this so much it gets me in trouble. It doesn't keep track of where I need to be for a given meeting, and if I fail to keep track of that in working memory, I might end up at home when I need to be at Georgia Tech. We can even view trivial things like a clock as an example of distributed cognition that prevents me from having to keep track of the passage of time manually. The point is that distributed cognition is a way of looking at interfaces and interface design that focuses your attention on what systems as a whole can accomplish as opposed to individuals on their own.
Distributed cognition is a fun one to reflect on because we can take it to some pretty silly extremes. We can go so far as to say that I don't heat up my dinner; the system comprised of myself and the microwave heats it up. And I offload the need to track the time to cook onto my microwave's timer. And that's a perfectly valid way of looking at things. But what we're interested in is places where interfaces don't just make our lives more convenient. We're interested in places where the systems comprised of us and interfaces are capable of doing more, specifically because those interfaces exhibit certain cognitive qualities. The systems might perceive, they might remember, they might learn, they might act on our behalf. In some way, they're all offloading a cognitive task from us. And as a result, the system comprised of us and the interface is capable of doing more. So reflect on that a bit: what is a place where the system comprised of you and some number of interfaces is capable of doing more than you alone, specifically because of the cognitive qualities that the interfaces possess?
Almost any interface on the computer can be analyzed from the perspective of distributed cognition, but right now, I'm most interested in my email. My email is an unbelievable extension of my long-term memory, because whenever I see anything in email, I know I don't actually need to commit it to my own long-term memory. It's there, it's safe forever, and if I ever need to find it again, I'll be able to find it. Now, finding it might take some work sometimes, but rarely as much work as manually remembering it. For me, I also mark messages as unread if I'm the one they're waiting on, or if I need to make sure I come back to them. And so, my email is an external memory of both all my communications via email and the tasks that are waiting on me to move forward.
Distributed cognition is concerned with how the mind can be extended by relations with other artifacts and other individuals. Because we're interface designers, we probably focus most of our time on the artifacts part of that. After all, even though we're designing for tasks, the artifacts are what we're actually creating and putting out in the world. But the other part of distributed cognition, distributing across individuals, presents a powerful opportunity as well. This used to be far more important, actually. Before the days of GPS navigation, a different form of navigation assistance existed. It was your spouse or your friend sitting in the passenger seat, reading a map and calling out directions to you. And while mobile devices and artificial intelligence may have replaced humans in some such systems, there are still lots of places where the role of distributing across humans is crucial. Here's an example of this in action today. At Udacity, we use a tool for managing projects called JIRA. It breaks down projects into multiple pieces that can be moved through a series of steps and assigned to different responsible individuals. The entire value of JIRA is that it manages distributing tasks across members of a team. Thus, when a project is completed, it is completed by the system comprising the individual team members and JIRA itself.
The social portion of distributed cognition is concerned with how social connections create systems that can, together, accomplish tasks. So, for example, you and your friend sitting in the passenger seat together form a system capable of navigating to a new destination without a GPS. But social cognition is not only concerned with how social relationships combine to accomplish tasks. It's also concerned with the cognitive underpinnings of social interactions themselves. It's interested in how perception, memory, and learning relate to social phenomena. As interface designers, though, why do we care? Well, in case you haven't noticed, one of the most common applications of interface design today involves social media. Everything is becoming social. Facebook tries to tell you when your friends are already nearby. Udacity tries to connect you with other students working on the same material as you. Video games are increasingly trying to convince you to share your achievements and highlights with your friends. And yet, oftentimes, our interfaces are at odds with how we really think about social interaction. Designing for this well involves understanding the cognitive underpinnings of social relationships. My PlayStation, for example, has a feature for finding my real-life friends and then communicating to them my gaming habits. But really, I probably don't want them to know how much I might play video games. If I come unprepared for recording today, I certainly don't want Amanda to know it was because I was playing Skyrim for six hours last night. So if we're going to design interfaces that integrate with social interactions, we have to understand how social interactions actually work. So an understanding of social cognition is very important if that's the direction you want to take.
Let's talk about the challenge of designing for social relationships. I like to play video games. I'm friends with people from work. So it's natural that I might want to play games with people from work. But at the same time, my relationship with people from work isn't purely social. If they see I'm playing a game, maybe they say, hey, David's got some free time, I should ask him to help me out with something. Or if they see I spend a lot of time playing video games, maybe they more generally say, hey, David's got plenty of time to take on a new task. How do we design a social video gaming system that nonetheless protects against these kinds of perceptions?
There are a lot of creative ways we might tackle this problem. One might be to base a social video gaming relationship around something like Tinder. Tinder, if it's still around by the time you're watching this, is a dating app where you express interest in another person and are only connected if they also express interest in you. We can apply the same philosophy to video games. You can set it up such that my contacts can't just look up my game-playing habits. But if they're also playing or interested in playing, they'll learn that I am playing as well. In terms of social cognition, that's kind of getting at the idea of an in-group. Your behaviors are only seen by those who share them and thus are in no position to judge them.
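Here is a minimal sketch of that mutual-interest idea: a player's gaming status is only visible to contacts who have opted into gaming themselves. The function and data layout are hypothetical, just to make the selection logic explicit.

```python
# Hypothetical opt-in set: who has marked themselves as a gamer.
opted_in_gamers = {"david", "alice"}

def can_see_gaming_status(viewer, target, contacts):
    """Only mutual gamers who are also contacts see each other's gaming habits."""
    return (
        viewer in contacts.get(target, set())
        and viewer in opted_in_gamers
        and target in opted_in_gamers
    )

contacts = {"david": {"alice", "bob"}, "alice": {"david"}, "bob": {"david"}}
print(can_see_gaming_status("alice", "david", contacts))  # True: both opted in
print(can_see_gaming_status("bob", "david", contacts))    # False: bob hasn't opted in
```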
Like distributed cognition, situated action is strongly concerned with the context within which people interact. But unlike distributed cognition, situated action is not interested in long-term, enduring interactions amongst these things. That's not to say that the theory denies the existence of long-term memory; it just has a different focus. Situated action focuses on humans as improvisers. It's interested not in the kinds of problems that people have solved before, but in the kinds of novel, situational problems that arise all the time. So, for example, this is the first time I'm filming with my daughter on camera. I don't know how she'll act. I don't have contingency plans about how to react if she acts strangely or if she distracts me from my script. I'm just going to figure this out as I go along. This is the kind of interaction that situated action is interested in, and that's important for us as interface designers. While we like to think we're in charge of structuring tasks for our users, in reality, the tasks that we perform grow out of this interaction. We can try our best to guide it in certain directions, but until we actually get our hands on it, the task doesn't exist. The task of me filming with my daughter didn't exist until this moment. Once we've got our hands on it, the task is what we do and not what we design. So, when users use an interface, when they actually do something, they're defining the task as they go along. There are three key takeaways here. One, we must examine the interfaces we design within the context in which they're used. Two, we must understand that the task that users perform grows out of their interaction with our interfaces; we don't define it. Three, we can try to structure the task as much as we can, but until users get started, the task itself doesn't exist. Once they get started, they play a significant role in defining the task.
Situated action gives us a valuable lens to examine issues of memory. We mentioned in our lessons on memory and on design principles that recognition is easier than recall. People have an easier time recognizing the right answer or option when they see it than recalling it from scratch. That's in part because memory is so context dependent. Recognition provides the necessary context to identify the right option. Relying on recall means there's little context to cue the right answer in the user's memory. Now, I encountered an interesting example of the value of situated action a little while ago. My mother just had surgery, and so I would often go over to help her out with things. And every time I would go over, she'd have four or five favors to ask me. Inevitably I would forget a couple of those favors and have to be reminded, but she would always remember. Why was she so much better able to remember the favors than me? Does she just have a better memory? She didn't make a list. She didn't write them down or anything like that. So the distributed cognition perspective doesn't find an external memory being used or anything like that. My hypothesis, from the perspective of situated action, is that she has the context behind the tasks. She knows why they need to be done. She knows what will happen if they aren't. For her, they're part of a broader narrative. For me, they're items on a list. I have no context for why they're there, or what would happen if they're left undone. For her, they're easy to remember because they're situated in a larger context. For me, they're difficult because they're isolated.
Lucy Suchman's 1985 book "Plans and Situated Actions" is the seminal book on the philosophy of situated action. The book is a detailed comparison between two views of human action. The first view, she writes, sees the organization and significance of action as derived from plans. This is the model we very often adopt when developing interfaces: users make plans, and users carry out those plans. But Suchman then introduces a second view as well. In this view, people simply act in the world, and plans are what we derive from those actions. Instead of plans dictating actions, plans are interpretations of actions. What this means for us as interface designers is that rather than assuming the user has a plan in mind that they're actively carrying out, we might consider viewing only their immediate interaction with the current screen instead. In other words, forget the history of actions that led the user to a certain screen and ask just, "Once they're here, how do they know what to do next?" Later in the book, Lucy Suchman specifically touches on communication between humans and machines. There's a lot more depth here as well. The key takeaway for us is to focus on the resources available to the user at any given time, but I do recommend reading the book and that chapter for more insights.
Activity theory is a massive and well developed set of theories regarding interaction between various pieces of an activity. The theory as a whole is so complex that you could teach an entire class on it alone. It predates HCI. And in fact, activity theory is one of the first places the idea of interacting through an interface actually came from. In our conversations about HCI though, there are three main contributions of activity theory that I’d like you to come away with. First, when we discuss designing tasks and completing tasks through an interface, we risk missing a key component. Why? We could jump straight to designing the task, but why is the user completing the task in the first place? That can have significant implications for our design. Activity theory generalizes our unit of analysis from the task to the activity. We’re not just interested in what they’re doing, but why they’re doing it and what it means to them. Our designs will be different, for example, if users are using a system because they’re required to or because they choose to. Notice how this is similar to our discussion of distributed cognition, as well. In distributed cognition, we were generalizing the unit of analysis from a person, to a system of people and artifacts. Here, we’re generalizing the unit of analysis from a task to an activity surrounding a task. In both ways, we’re zooming out on the task and the design space. Second, activity theory puts an emphasis on the idea that we can create low level operations from higher level actions. We saw something similar to this with GOMS models, where methods were made up of operators. This has a special historical significance. Before activity theory and similar theories reached HCI in the 1980s, HCI was largely concerned with minute things, like how quickly a person can click a button or type in a command. Activity theory helped us zoom out from those low level interactions, those low level operators, to general user needs at the action or the activity levels. And third, activity theory points out that actions by the user can actually move up and down this hierarchy. A common example of this is driving a car. The first time you drove a car, shifting gears between park and drive was a very conscious action made up of operators like grabbing the gear shift and moving it in the right direction and letting go. You had to think about how to press the button, which way to push the stick, and when to release it. However, after driving a few times, shifting gears just becomes second nature. It becomes more like an operator. It shifted from being a conscious goal to an operator in your broader driving behavior. Notice a similarity here to our previous discussion on learning curves. How quickly an action moves from being a conscious action to a subconscious operator is also a function of how good the learning curve is on that design. Notice also, this is similar to the question of invisible interfaces. A good invisible interface helps users focus on their actions inside the task, rather than the operators they use to interact with the system.
In 1996, Bonnie Nardi edited a prominent book on the study of context in human-computer interaction, titled "Context and Consciousness". The entire book is worth reading, but two papers in particular stand out to me, both by Nardi herself. The first is a short paper that serves in some ways as an introduction to the book as a whole. It's not a long paper, only four pages, so I highly recommend reading it; it won't take you long. Here, Nardi outlines the general application of activity theory to HCI. She notes that activity theory offers a set of perspectives on human activity and a set of concepts for describing that activity. This is exactly what HCI research needs as we struggle to understand and describe context, situation, and practice. She particularly notes that the theory is uniquely suited to addressing some of the interesting issues facing HCI in 1996. For that reason, it's also fascinating to view from a historical perspective. Today, we understand the role that context has grown to play, especially with emerging technologies. It's fascinating to me to look back at how the community was constructing that debate 20 years ago.
In this lesson, we've covered three theories on studying context in human-computer interaction: distributed cognition, situated action, and activity theory. If you're having trouble keeping the three straight, though, Nardi has a great paper for you. In her volume Context and Consciousness, Nardi wrote a comparison between the three philosophies titled "Studying Context: A Comparison of Activity Theory, Situated Action Models, and Distributed Cognition." She starts by giving a great one-page summary of each of these three views, which is really good if you're having trouble understanding the finer points of these theories. Even more usefully, she goes on to give commentary on the differences between these three big theories. First, she notes that activity theory and distributed cognition are driven by goals, whereas situated action de-emphasizes goals in favor of a focus on improvisation. She summarizes the situated action view as: goals are constructed retroactively to interpret our past actions. Nardi also evaluates the role of permanent, persistent structures, noting they're important for activity theory and distributed cognition but present a tension for situated action. So, here we again see a similarity between activity theory and distributed cognition. So, what makes them different? Well, Nardi writes that the main difference between activity theory and distributed cognition is their evaluation of the symmetry between people and artifacts. Activity theory regards them as fundamentally different, given that humans have consciousness. Distributed cognition, by contrast, believes that artifacts can serve cognitive roles, and so they should be considered conceptually equivalent to humans. So, that gives a high-level overview of the differences between these three theories. These theories are big and complex, of course, and the complete paper goes into much more detail. But this should provide a decent glimpse at the distinctions, at least enough to get you started reading the paper for yourself.
Distributed cognition is a perspective on analyzing systems that helps us emphasize the cognitive components of interfaces themselves. It helps us look at the things we design as extensions of the user's own cognition. We can view anything from notes on a desktop to the entire Internet as an extension of the user's own memory. We can view things like Gmail's automatic email filtering as offloading cognitive tasks from the user. In looking at things through this lens, we focus not just on the output of people or of interfaces, but on the output of the combination of people and interfaces together. So, what are they able to do together that neither of them could do individually? As we close this lesson, think about this in terms of your chosen areas of HCI. What are the cognitive components of the areas with which you're dealing? How do augmented reality and wearable devices offload some of the user's cognition onto the interface? In educational technology or in HCI for healthcare, what are the tasks being accomplished by the systems comprised of users and interfaces?
In this lesson, we've talked about distributed cognition and a couple of related theories. The commonality of all these theories is their emphasis on context and integrated systems. Distributed cognition is interested in how cognition can be distributed among multiple individuals and artifacts all working together. By taking a distributed view of interface design, we can think about what the combination of our users and our interfaces is able to accomplish. Situated action and activity theory give additional perspectives on this, focusing respectively on the importance of ad hoc improvisation and the need to emphasize users' motives beyond just their goals. The common ground for all these theories is that our interfaces are not simply between the user and their task; they also exist in some kind of greater context.