Public webpage for sharing information about Dr. Joyner's CS6750 - Human Computer Interaction course in Spring 2022.
When you start to design a new interface, your goal is to design an interface that will meet the needs of the user better than the existing design. It’s very rare that we design for tasks that users have never even performed before. We’re almost always developing new ways to accomplish old tasks. Facebook, for example, was a new way to socialize and communicate with others, but the fundamental activities of socializing and communicating weren’t new. Facebook just met those needs better or, at least, met certain needs better depending on who you ask. In order to design interactions that are better than existing designs, it’s important to take into consideration the user’s needs at every stage of the design process. That’s what this unit of this course will cover. We don’t generally want to build creative novel interfaces just for the sake of creativity or novelty. We want novelty to have a purpose and to achieve that purpose, it must be used with a strong understanding of the user’s task. So, in this unit, we’ll cover the design life cycle as well as methods for gathering feedback and information from users at every stage of the cycle. However, before we get started on that, we need to set up some basic concepts we use throughout our discussions. We’ll start by discussing user-centered design and then we’ll introduce the four-stage design life cycle. We’ll discuss a few different general methods for pursuing the design life cycle. Then finally, we will discuss the two kinds of information or data that we gather, qualitative and quantitative.
User-centered design is design that considers the needs of the user throughout the entire design process. As far as we’re concerned, that’s pretty much just good design. But oftentimes, that isn’t the way design is done. Design has often been done to meet some functional specification of what the tool must technically be able to accomplish, instead of considering the real needs of the user. Or sometimes people will go through an entire design process believing they understand the needs of the user without ever really checking. User-centered design is about prioritizing the user’s needs, while also recognizing that we don’t automatically know the user’s needs. So, we need to involve them at every stage of the process. Before we start, we need to examine the user’s needs in depth, both by observing them and by asking them direct questions. After we start designing, we need to present our design alternatives and prototypes to the user to get feedback. When we near a design that we like, we need to evaluate the quality of the design with real users. Having a good working knowledge of HCI principles helps us go through this more quickly, but we can’t design great interfaces by applying guidelines and heuristics alone. We have to interact with our users, understand their needs, and involve them in the evaluation.
The International Organization for Standardization (ISO) has outlined six principles to follow when pursuing user-centered design. Number one, the design is based on an explicit understanding of users, tasks, and environments. That means we must gather information about the users, the tasks they perform, and where they perform those tasks, and we need to leverage that knowledge throughout the design process. Number two, users are involved throughout design and development. Involvement can take on many forms, from regularly participating in interviews and surveys about our designs to actually working on the design team alongside the designers themselves. Number three, the design is driven and refined by user-centered evaluation. We absolutely must have real users evaluating the prototypes and interfaces that we assemble. Number four, the process is iterative. No tool is developed once, released, and then abandoned. Designs undergo constant iteration and improvement even after being released. Number five, the design addresses the whole user experience. Many designers are tempted to delineate a certain portion of the experience as their primary interest, but we must address the entire user experience. Number six, the design team includes multidisciplinary skills and perspectives. Good teams for pursuing user-centered design include people with a number of different backgrounds, like psychologists, designers, computer scientists, domain experts, and more. So, keep these principles in mind when you’re doing user experience design.
When we talk about user-centered design, we throw around the word user as if it’s pretty obvious what it means. The user is the person who uses the interface that we create, right? However, that’s not the only person in whom we’re interested. There are multiple stakeholders in this design, and we want to explore how our design is going to affect all of them. The user themselves is what we call the primary stakeholder. They’re the person who uses our tool directly. Secondary stakeholders are people who don’t use our system directly, but who might interact with the output of it in some way. Tertiary stakeholders are people who never interact with the tool or even interact with its output, but who are nonetheless impacted by the existence of the tool. So let’s take a couple of examples of this. Imagine we’re designing a new grade book tool that makes it easier for teachers to send progress reports to parents. Teachers would interact with the tool, inputting grades and feedback, and so teachers would be our primary stakeholders. Parents receive the output of that tool: they receive the progress reports. And so they’re secondary stakeholders. They interact with the output of the system, but not with the system itself. Students don’t use the system at all, and maybe they don’t even see the progress reports unless parents decide to share them. But they’re nonetheless affected by the existence of this system, so they’re tertiary stakeholders. School administrators might be another stakeholder, but where they fall in this would differ based on how the school sets up the relationship. If they can use the tool to directly monitor and intervene in student progress, they might be primary stakeholders. If they just see aggregated progress reports so they can monitor things, they might be secondary stakeholders. If they never interact with the system in any way, they’re nonetheless likely affected by the fact that the system is there, and so they’d be tertiary stakeholders. In designing this tool, we need to keep all three kinds of stakeholders in mind. For example, how does parents having more consistent access to grade information affect the students? It might foster increased involvement by parents, but it might also facilitate helicopter parenting, where parents are too controlling over their kids’ schoolwork and prevent them from developing the metacognitive skills and self-discipline that they need to succeed later in life. User-centered design isn’t just about catering to the user in the center of this diagram; it’s also about looking at the impact of our designs on all the stakeholders.
You might actually come from a software engineering background, and so while user-centered design sounds obvious to some people, you might have experienced the other side of the coin. In many industries and domains, software engineers are still left to design user interfaces themselves. There’s a fantastic book about this called The Inmates Are Running the Asylum, by Alan Cooper, in which he compares technology to a dancing bear at a circus. He notes that people marvel at the dancing bear not because it’s good at dancing, but because it dances at all. In the same way, people marvel at certain pieces of technology not because they work well, but because they work at all. The book was released in 2004, and since then the user has become more and more a focal point of design. And yet there are still places where individuals with little HCI background are designing user-facing interfaces for one reason or another. Since there’s a strong chance you’ve worked in software engineering, reflect on that a bit. Have you seen places where software engineers, data scientists, or even non-technical people were put in charge of designing user interfaces? How did it go?
I encountered this in my first job, actually. Somewhere between my freshman and sophomore years at Georgia Tech, I had a job as a user interface designer for a local broadcast company. I designed an interface, then I handed it over to a team of engineers for implementation. Late in the process, the requirements changed a bit and a new configuration screen was needed, which the engineers just went ahead and built themselves. We got the finished tool and it all worked beautifully and perfectly, except for this configuration screen. It was a list of over 50 different settings, each with 3 to 5 radio buttons to the side. Each setting was a different length. Each radio button label was a different length. They were kind of placed all over the canvas. There was no grid. It was illegible. It was unusable, but it was technically functional. It met the requirements described in terms of what the user must be able to do, just not in terms of how usable it was. Fortunately, there’s a greater appreciation of the value of user-centered design now than there was then. Many spaces have become so crowded that the user experience is what can really set a company apart. I’ve actually been noticing a trend lately toward new user experiences around really old tasks. I use an app called Stash to buy and sell small amounts of mutual funds. Buying mutual funds has been around forever, and E-Trade has even been doing it online for a long time. What differentiates Stash is the new user experience: automated investing, simple tracking, simplified summaries. User experience design really has become a major differentiator between success and failure.
User-centered design is about integrating the user into every phase of the design life cycle. So, we need to know two things: what the design life cycle is, and how to integrate the user into each phase. Now, if you look up design life cycles you’ll find a lot of different ideas. We’re going to discuss things in terms of a four-phase design life cycle that’s pretty general and probably subsumes many of the other ones you’ll find. The first phase is need-finding. In need-finding, we gather a comprehensive understanding of the task the users are trying to perform. That includes who the user is, what the context of the task is, why they’re doing the task, and any other information related to what we’re designing. Second, we develop multiple design alternatives. These are very early ideas on the different ways to approach the task. It’s important to develop multiple alternatives to avoid getting stuck on one idea too soon. The third phase is prototyping. We take the ideas with the most potential and we build them into prototypes that we can then actually put in front of a user. Early on we might do this in very low-fidelity ways, like with paper and pencil, or even just verbally describing our ideas. But as we go on, we refine and improve. Fourth, and most importantly, we perform user evaluation. We take the ideas that we prototyped and put them in front of actual users. We get their feedback, what they like and what they don’t like, what works and what doesn’t work, and then the cycle begins anew. The feedback we gain from the users, as well as our experience with these prototypes and design alternatives, improves our understanding of the problem. We now know new areas of the problem we might need to explore with some more need-finding. We might have some new ideas for design alternatives, or some new ways of expanding the designs that we already have. We then take those things and use them to improve our prototypes, either prototyping new ideas that we didn’t have before, or making our prototypes more rigorous and more polished so that we can get even better evaluation. And then we put them in front of users again. Each time we go through this cycle, our understanding improves, our designs improve, and our prototypes improve. Eventually our prototypes develop to the point of being designs ready for launch, but the cycle doesn’t end there. We keep iterating, now with live users doing the evaluation.
At every stage of this design life cycle, we’re interested in gathering information from the user to better inform our designs. To do that, we need a number of methods to actually obtain that information. Fortunately, there are a number of methods we can employ to gather the information we need, and in fact, the majority of this unit will go through these different methods. They will become the tools in your toolbox: things you can call upon to gather the information you need when you need it. Note that many of these methods are so well-developed that they could fill entire units, or even entire courses. For example, we’ll spend about three or four minutes talking about naturalistic observation, and yet there are entire textbooks and courses on how to do naturalistic observation really well. The goal here is to give you enough information to get started and enough to know what you need to explore next. Remember, one of the original goals of this class was not just to understand more HCI but also to understand how big the field of HCI actually is.
When we talk about feedback cycles, we talk about how they’re ubiquitous across nearly every field, and HCI itself isn’t any different. In a feedback cycle, the user does something in the interface to accomplish some goal, and then judges, based on the output from the interface, whether the goal was accomplished; then they repeat and continue. In HCI, we’re brainstorming and designing interfaces to accomplish goals, and then, based on the output of our evaluations, we judge whether or not the goals of the interface were actually accomplished; then we repeat and continue. In many ways, we’re doing the same thing our users are doing: trying to understand how to accomplish a task in an interface. Only in our case, our interface is the set of tools we use to build and evaluate interfaces, and our goal is to help users accomplish their goals.
There is one final distinction we need to understand going forward, because it’s going to come up at every stage of the design life cycle: qualitative versus quantitative data. At every stage of the design life cycle, we’re interested in gathering data from users. Early on, that might be descriptions of what they do when they’re interacting with a task, or it might be measures of how long certain tasks take to complete, or how many people judge a task to be difficult. Later on, though, it might be whether or not users prefer our new interfaces or how much better they perform on certain tasks. Data will always fall into one of two categories: qualitative and quantitative. Quantitative data is probably the easier one to describe: quantitative data is anything numeric. When data is summarized numerically, we can perform statistical tests and summaries on it, draw formal conclusions, and make objective comparisons. There are a lot of strengths to quantitative data, but those strengths come in large part because quantitative data only captures a narrow view of what we might be interested in examining. It’s very strong for a very small class of things. Qualitative data covers pretty much everything else. Qualitative data covers descriptions, accounts, and observations; it’s often in natural language. It could be open-ended survey responses, or interview transcripts, or bug reports, or just your personal observations. Because of its flexibility, qualitative data gives us a much broader and more general picture of what we’re examining. But the cost is that it’s hard to generate formal conclusions based on qualitative data, and qualitative data may be more prone to biases. In some circumstances, we can convert qualitative data into quantitative data. For example, we could count the number of respondents to an end-of-course survey who mentioned course difficulty in their free-response answers. The free-response answers themselves are qualitative data, but numerically summarizing them generates quantitative data. Generally speaking, though, quantitative data and qualitative data serve different purposes in a design life cycle. I’ve heard it described this way: quantitative data provides the what, while qualitative data provides the how or the why. When performing need-finding or when doing some initial prototype evaluations, we’re likely interested in users’ qualitative descriptions of their tasks or their experiences with the interface. It’s generally only after a few iterations that we start to be interested in quantitative analysis, to find numeric improvements or changes. We can also use these in conjunction with one another, collecting both quantitative and qualitative data from the same participants. That’s referred to as a mixed-methods approach: a mix of qualitative and quantitative data that paints a more complete picture of the results.
Let’s do a quick exercise on quantitative versus qualitative data. Let’s imagine we’re doing end-of-course evaluations for some class. For each of the following types of data, mark whether it would be considered quantitative or qualitative. You can skip ahead if you don’t want to listen to me read all these out. We have: responses to “On a scale of 1 to 5, rate this course’s difficulty”; responses to “How much time did you spend per week on this course?”; responses to “What did you like about this course?”; the count of students mentioning office hours in response to the above questions; the percentage of students that completed the survey; responses to a forum topic requesting non-anonymous course reviews; the number of participants in a late-semester office hours session; and the transcript of the conversation from that late-semester office hours session.
Quantitative data is numeric. Any of these things that can be measured numerically really qualify as quantitative data, and many of them are things that we can just count. We count the number of students mentioning office hours. We count the number of participants. For the completion percentage, we basically count the number of students that completed the survey and divide by the number of students in the class. And for time spent, we have students count how many hours per week they think they spend on the course. The first option can be a little bit tricky. A scale of one to five is a numeric scale, but because students have to choose one, two, three, four, or five, it’s not a continuous numeric scale. For example, we have no way of guaranteeing the student sees the difference between a three and a four as the same as the difference between a four and a five. The types of analysis we can do on that first kind of data are more limited, but nonetheless, it’s still measured numerically, so it’s still quantitative data.
Within the general category of quantitative data, there are a number of different subdivisions. These are important because they inform what kinds of conclusions we can generate and what kinds of statistical tests we can perform. We’ll talk about statistics more when we talk about evaluation, but it’s important to understand what type of data you’re gathering during all phases of the design life cycle. First, there are four major types of quantitative data: nominal, ordinal, interval, and ratio data. Nominal data is also referred to as categorical data. It arises when we observe the number of instances of different categories. So, for example, if we asked a bunch of users, “How do you typically commute to work?”, the different commuting methods would each be different categories, and our data would be the number of people that fell into each category. Ordinal data is similar to nominal data, but there’s some explicit ordering to the different categories. So, for example, we might ask, “On a scale of 1 to 5, how would you rate your level of satisfaction with your commuting method?” The user would choose highly dissatisfied, dissatisfied, neutral, and so on. These are still categories, but there’s an ordering to them. What’s important here is that we don’t actually know how big the gaps between the categories are. It could be that users perceive only a small difference between neutral and satisfied or only a small difference between satisfied and highly satisfied. They might perceive the difference between satisfied and highly satisfied as smaller, for example, than the difference between dissatisfied and highly dissatisfied. That’s where interval data comes in. With interval data, we do know the exact difference between values. Imagine we asked commuters, “At what time do you leave for work in the morning?” We know that the difference between 6:00 AM and 8:00 AM is the same as the difference between 8:00 AM and 10:00 AM. So, the interval between the different options is the same, but that can still lead to some strange comparisons. 8:00 AM is four hours later than 4:00 AM, but just because eight is twice as much as four, it doesn’t mean 8:00 AM is twice as late as 4:00 AM. We know the intervals between the numbers, but we don’t know their ratios. We often talk about this in the context of temperature: sixty-four degrees isn’t twice as warm as 32 degrees, even though 64 is twice 32. With interval scales, there is no zero point, only intervals between numbers. That brings us to the fourth kind of quantitative data, ratio data. Ratio data is just like interval data, except it has a zero point. That means we can actually use ratios. If we ask our commuters how long it takes them to get to work, we know that someone who says 30 has twice as long a commute as someone who says 15, and 15 is twice as long as 7.5, and so on. So, those are our four general types of quantitative data. But there are more subcategories that come up among these, too. For example, with nominal data, we can have single-nominal and multi-nominal data. Our question here is: can a person be in more than one category at the same time? Earlier, when we had this question, we phrased it as single-nominal; we asked users to choose only one of the options. But it would be entirely possible to have users who take different forms of transportation on a regular basis, so why force them to choose only one? With multi-nominal data, we might let them choose more than one option, so any given user can fall into multiple categories.
If you read about quantitative data, you’ll see that ordinal data can also be multi-ordinal. But, to be honest, I’ve never seen a compelling example of when that might happen, so I’m not going to show one here. With nominal data, we can also have binary and non-binary data. This distinction is a little more straightforward. The example we’ve already shown is non-binary: there are more than two different options. Binary nominal data has only two categories. This might seem like a silly distinction, but it’s actually important because the types of statistical tests we do with binary data are a little different from the types of statistical tests we do with non-binary data. The distinction between binary and non-binary data can also apply to ordinal data. Non-binary ordinal data is pretty straightforward; it’s a one-to-five scale or a one-to-seven scale or anything like that. But we can still have binary ordinal data. For example, what grade did you earn in driver’s ed? We would interpret an implicit ordering between Fail and Pass, so that would make it ordinal data, but there are still only two options, so it’s binary data. With both interval and ratio data, there’s also a distinction between discrete and continuous. Discrete means it’s something countable. So, if we ask how long your average commute is in minutes, we’re generally expecting people to give an answer in whole numbers. We’re not expecting someone to say it takes 15.245 minutes. So, that would be discrete: it’s a countable number of minutes. If, on the other hand, we were to put sensors in people’s cars and actually time how long their commutes are, we would arrive at continuous data. We would get values like “it takes you, on average, 31.2513 minutes to commute.” And just like the distinction between binary and non-binary data, this distinction sometimes informs the type of statistical test we’re supposed to use. So, those are the general categories of quantitative data, as well as some smaller distinctions that might be relevant. In practice, I can share that there are relatively few times in our work where we use interval data. Usually, when we’re dealing with real numbers, we’re dealing with numbers that have a zero point, like time or number of occurrences. Similarly, the types of statistical tests that we use for interval and ratio data are almost always the same, and we similarly use some of the same tests for nominal and ordinal data. But I’m getting ahead of myself a little bit. For now, what’s most important is just to understand what kind of data you’re collecting.
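To make these distinctions concrete, here’s a minimal sketch in Python, assuming a made-up version of the commuting survey (the field names and values are illustrative, not from the lecture), showing what each type of quantitative data looks like and which summaries make sense for it.

```python
from collections import Counter

# Hypothetical commuting-survey responses, one dict per participant, with one
# field for each of the four types of quantitative data discussed above.
responses = [
    {"method": "bus",  "satisfaction": 4, "departure_hour": 7, "commute_minutes": 30.0},
    {"method": "car",  "satisfaction": 2, "departure_hour": 8, "commute_minutes": 45.5},
    {"method": "bike", "satisfaction": 5, "departure_hour": 9, "commute_minutes": 15.0},
]

# Nominal: unordered categories -- all we can really do is count them.
method_counts = Counter(r["method"] for r in responses)

# Ordinal: ordered categories (1-5 satisfaction) -- the order is meaningful, but
# the gap between a 3 and a 4 may not equal the gap between a 4 and a 5, so a
# median is a safer summary than a mean.
median_satisfaction = sorted(r["satisfaction"] for r in responses)[len(responses) // 2]

# Interval: equal gaps, no true zero -- differences are meaningful, ratios are not.
# (Leaving at 8:00 AM is two hours later than 6:00 AM, but not "twice as late" as 4:00 AM.)
hours = [r["departure_hour"] for r in responses]
departure_spread = max(hours) - min(hours)

# Ratio: a true zero point -- both differences and ratios are meaningful, and this
# field is continuous rather than discrete, since it came from timing, not counting.
minutes = [r["commute_minutes"] for r in responses]
commute_ratio = max(minutes) / min(minutes)  # "about three times as long a commute"

print(method_counts, median_satisfaction, departure_spread, commute_ratio)
```

The point isn’t the code itself; it’s that the type of each field determines which summaries, comparisons, and eventually statistical tests are legitimate.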
Just as there are different types of quantitative data, so also there are different types of qualitative data. In practice, though, the types of qualitative data we generate are usually much more closely integrated with the way in which they were generated. For example, we could have discrete ratio data that comes from self-reported responses on a survey, or from naturalistic observations in the wild, or from logs from real interfaces. Qualitative data, on the other hand, tends to be determined more by how it’s gathered. Some common types of qualitative data we’ll deal with are transcripts from things like interviews and focus groups, field notes we might take during naturalistic or participant observation, artifacts like reviews left for existing interfaces or the existing interfaces themselves, and lots, lots more. Quantitative data is numeric data, and qualitative data is everything else, so the span of possible qualitative data is a lot larger. Qualitative data is strong because these types of data provide a much richer picture of what we’re investigating. But with that strength come some costs: qualitative data is more expensive to analyze, and it’s also a lot more prone to interpretation biases. So, for that reason, we often convert qualitative data to quantitative data for the purpose of analysis. We do this through a process called coding. Here, coding isn’t writing Python code or anything like that. Coding is a process of taking free-form qualitative data and boiling it down into some numeric categories. Typically, these categories are nominal, which allows us to do whatever kinds of statistical tests we can do on nominal quantitative data. During this process, we lose some of that rich data, but what we gain is a numeric representation that we can use in new ways. And besides, we didn’t really lose the original data; it’s just not part of the output of this coding process, so we can still go back and look at it if we need to. The other thing that comes out of this process is a documented methodology for actually coding the qualitative data into quantitative data. So, if anyone asks where we got these numbers, we can actually show them the process we went through. That helps us argue that our interpretations are not just biased interpretations of what happened, but are actually rigorous analyses based on that qualitative data. In HCI, we almost always want to use some mix of these two. Our problems are far too rich to address with quantitative data alone, but for that same reason, we risk being dominated by biases if we focus solely on qualitative analysis.
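To picture what that coding process produces, here’s a minimal sketch that picks up the end-of-course-survey example from earlier. Everything in it is hypothetical: the codebook categories, keywords, and comments are invented, and in a real study the codes would be assigned by trained human raters following a documented codebook rather than by keyword matching.

```python
# Hypothetical codebook: nominal categories a rater would assign to each
# free-response comment, with the criterion used to decide each one.
CODEBOOK = {
    "difficulty":   "mentions workload, pacing, or how hard the assignments were",
    "office_hours": "mentions office hours or TA availability",
    "content":      "mentions lecture material, readings, or topics covered",
}

# The raw qualitative data: free-response survey comments (invented here).
comments = [
    "The projects were really hard, but office hours helped a lot.",
    "Loved the lecture material, especially the peer review topics.",
    "Too much work per week for a three-credit course.",
]

# A crude stand-in for the human rater: keyword matching per code. Real coding
# is a documented, human process; this only illustrates the shape of the output.
KEYWORDS = {
    "difficulty":   ["hard", "difficult", "work"],
    "office_hours": ["office hours", "ta availability"],
    "content":      ["lecture", "material", "reading", "topics"],
}

def assign_codes(comment):
    """Return the set of nominal codes whose keywords appear in the comment."""
    text = comment.lower()
    return {code for code, words in KEYWORDS.items()
            if any(word in text for word in words)}

# The result of coding: (multi-)nominal counts we can summarize or feed into
# the kinds of statistical tests that are appropriate for nominal data.
counts = {code: sum(code in assign_codes(c) for c in comments) for code in CODEBOOK}
print(counts)  # {'difficulty': 2, 'office_hours': 1, 'content': 1}
```

The important part is the output: once the free-form comments have been reduced to nominal counts, we can treat them like any other nominal quantitative data, while the original comments remain available if we ever need to revisit them.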
As we go through this unit of the course on methods, we’re going to use a running design challenge to explore the different HCI research methods. I’m going to choose a challenge quite close to home for me: improving the MOOC recording process.
Over the past several lessons, you’ve been exploring how the design life cycle applies to the area of HCI that you chose to explore. Now that we’ve reached the end of the unit, take a moment and reflect on the life cycle that you’ve developed. How feasible would it be to actually execute? What would you need? What users do you need? How many? When do you need them? There are right answers here, of course: ideally, you’ll need users early and often. That’s what user-centered design is all about. In educational technology, that might mean having some teachers, students, and parents that you can contact frequently. In computer-supported cooperative work, that might mean having a community you can visit often to see the new developments. In ubiquitous computing, that might mean going as far as having someone who specializes in low-fidelity 3D prototypes to quickly spin up new ideas for testing. Now that you understand the various phases of the design life cycle, take a moment and reflect on how you’d use them, iteratively and as a whole, in your chosen area of HCI.
In this lesson, we introduced the concept of the design life cycle. There are lots of versions of the design life cycle out there, but we’re going to focus on the most general four-step cycle. The cycle starts with need-finding, then goes to constructing design alternatives, then to prototyping, and then to user evaluation, and then the cycle repeats. The goal of this cycle is to realize user-centered design: design that takes the user’s needs into consideration at every step. In the rest of this unit, we’re going to focus on filling up your design life cycle toolbox with tools for gathering the right kind of data at the right time and using it in the right way.
Before we start working with real users, there are a few ethical considerations we have to make. If you’re doing research as part of a university, these are part of your contract with the university to do research on their behalf. Even if you’re doing research independently or in industry, there are still ethical obligations to follow. These considerations are important not only to preserve the rights of our users, but also to ensure the value of the data that we gather. In this lesson, we’ll talk a little bit about where these kinds of ethical considerations came from. Then we’ll talk about some of the basic ethical considerations that we need to make. We’ll also talk about the Institutional Review Board, or IRB, the university organization that governs human subjects research.
During the 20th century, a number of pretty clearly unethical human subjects experiments took place. Many of them were conducted by scientists working for the Axis powers during World War II, but famously, many were also conducted right here in our own backyard in the United States. Among them were Milgram’s obedience experiment, where participants were tricked into thinking that they had administered lethal shocks to other participants to see how obedient they would be; the Tuskegee syphilis study, where rural African American men with syphilis were deceived and left untreated so researchers could study the disease’s progression; and the Stanford prison experiment, where participants were psychologically abused to test their limits, or to test how they would act under different circumstances. In response to all this, the National Research Act of 1974 was passed, which led to the creation of institutional review boards to oversee research at universities. The Belmont Report further summarizes basic ethical principles that research must follow in order to receive government support. Among other things, the law dictated that the benefits to society must outweigh the risks to the subjects in these experiments. It also dictated that subjects must be selected fairly, which was a direct response to the Tuskegee syphilis study. And perhaps most importantly, it demanded rigorous informed consent procedures, so that participants know what they’re getting into and can back out at any time. These efforts all attempt to make sure that the positive results of research outweigh the negatives and that participant rights are always preserved.
In this lesson, we’re largely focusing on the practical steps we go through to get approval for human subjects research. But before we get into that, I want to highlight that this isn’t just a bunch of bureaucratic steps; it’s necessary to make sure that people are treated ethically at all stages of research. The IRB’s main task is to make sure the potential benefits of the study are worth the potential risks. As part of that, part of its role is to make sure the potential benefits are significant. A lot of the steps of the process are there to ensure that the data we gather is useful. So, for example, the IRB is sensitive about the perception of coercion. When participants feel coerced to participate in research, the data they supply may be skewed by that negative perception, which impacts our results. Similarly, we might design studies that have some inherent biases or issues: we might demand too much from participants or ask questions that are known to affect our results. Much of the initial training to be certified to perform research is similarly not just about doing ethical research but also about doing good research. By recording who is certified, the IRB helps ensure that research personnel all understand the basics of human subjects research. The IRB is also there to monitor for these things, and many of the steps of this process ensure that the research we perform is sound and useful. After all, if the research we perform is not useful, then even the smallest risks will outweigh the nonexistent benefits.
If you’re going to be doing research as part of a university project or university class, you need IRB approval. Different universities have different processes and policies for getting started with the IRB. We’re going to discuss the Georgia Tech policies, since that’s where this class is based, but you should check with your university if you’re from somewhere else to make sure you’re following the right policies for your school. To get started, we need to complete the required training. So here I’m showing the IRB website, which is researchintegrity.gatech.edu/irb. To get started, you need to complete your required training, so click Required Training over on the left. This will take you to a page that overviews the IRB required training and gives you a link on the left to begin your CITI training. Click that, then log in to your Georgia Tech account, and complete Group 2, Social and Behavioral Research Investigators and Key Personnel. I can’t show you exactly what it looks like to sign up fresh because I’ve already completed it, but you should be able to add a course and add that as your course. After you’ve completed CITI training, you’ll receive your completion report, and then you’ll be ready to get started with the IRB.
After you’ve completed any necessary training, you can access the IRB application for your own university. We’re doing this in terms of Georgia Tech, so here’s the tool we use, called IRBWISE. Here, under My Protocols, you’ll see a list of all of the protocols to which you’ve been added. A protocol is a description of a particular research project. It outlines the procedures that the IRB has approved regarding consent, recruitment, experimentation, and more. Here we see approved protocols, protocols that are new and haven’t been submitted yet, protocols that are waiting for the IRB to act on them, and amendments to protocols. Amendments are how we change earlier protocols to add new researchers or change what we’re approved to do. After a protocol is approved, any changes must be submitted to the IRB as an amendment to be accepted separately.
Generally speaking, you might not ever be asked to complete a protocol yourself. You might instead just be added to an existing protocol. Still, you need to make sure you understand the procedures outlined by any protocol to which you’re added, because they still govern what you do. So we’ll run through the process of creating a protocol, but this will also cover the details to know about any protocol to which you’re added. For this, I have a draft protocol covering our user study on people who exercise. Every protocol starts with the title of the protocol and some certified personnel; these are required just to save the protocol. We add approved research personnel on the study personnel page. We enter their name, select their role, and if their certification isn’t already in the system, we attach it here. After adding them, they’ll appear down here. The principal investigator, or PI, must always be added first, and it must be a faculty member. The protocol description covers the study at a high level. This should briefly touch on what will be done, what the goal of the research is, and what subjects will be asked to do. It doesn’t cover all the details, but it covers enough for someone to understand generally what we’re going for with this study. Under the research design and methodology section, we describe our research. First, we describe the research design and methodology. With human subjects research, this focuses on what users will experience and in what order. It also covers some experimental details, like how subjects might be assigned to different experimental conditions. Then we describe the duration of subject participation, to make sure subjects aren’t being asked to do too much. Depending on what we’re doing, we may need to provide data collection methods. This includes things like surveys, pre-tests and post-tests, interview scripts, and anything else prepared in advance to elicit data from the participant. Then we also need to fully describe the potential benefits of the study. Remember, the IRB is about making sure that the benefits outweigh the risks; if there are no benefits, then the benefits can’t outweigh the risks. Benefits don’t need to be to the individual participants, though; they could be to the greater community as a whole. Similarly, we also need to describe the risks associated with the study. For usability studies, very often our risks are not going to be very significant; our risks might be those associated with normal work at a computer. But we still need to address this to acknowledge that we’ve thought about what risks might arise for participants. Then we describe the plan for the statistical analysis, if we have one. Qualitative research might not have a statistical analysis plan, which is why I’ve left this blank. Finally, we need to describe the start and end dates of the research. Very often, this will break the research into a data collection phase and an analysis phase, where we actually analyze the data that we collected. Now, we won’t generally need to worry about many of the remaining options on this form, because we’re not doing clinical studies and we generally don’t have external funding unless you’re working on a professor’s research project. So now let’s move on to subject details.
Because we’re interested in human-computer interaction, we almost certainly will have human subjects interaction. So when it asks whether the research will involve direct interaction with human subjects, we would click yes. That brings us to a screen where we describe our subjects and the data we plan to collect from them. First, they want to know how many subjects we have and what genders. They want to make sure that we’re not wasting participants’ time if we’re not going to actually analyze all their data, and that we’re being fair to the different genders; a common problem in early research was over-representing male populations. Second, they want to know if we’re targeting any vulnerable populations: people that might not have the capacity to give true informed consent. If we are, we need to make special accommodations to make sure they’re fully aware of their rights as participants. Third, they want the scientific justification for the number of subjects to enroll in our study. Like I said, they want to make sure that we’re not going to waste the time of a bunch of participants and then just throw their data out; that wouldn’t be very respectful of our participants’ time. If we’re doing statistical tests, this might be the number of participants necessary to find a certain effect size, which we’ll talk about when we talk about quantitative research; one common way of estimating that number is sketched below. If we’re doing qualitative research, this is usually a smaller number and is based more on how many different perspectives we need to get a good picture of what we’re trying to analyze. Alternatively, we might have external limits on our number of participants. For example, if you’re doing classroom research, your maximum number of subjects is the number of students in the class. Next, we state the inclusion and exclusion criteria. The inclusion criteria define who we’re specifically including: who is our target audience? The exclusion criteria define who we’re specifically excluding. Oftentimes, one of these will just be the inverse of the other, but there may be times when we need to be more specific. For example, if we were doing research on undergraduate computer science education, our inclusion criteria might be undergraduate students, but our exclusion criteria would be undergraduate students that have previously taken a computer science class. As before, we can skip the questions, for our purposes at least, that are based more on clinical research. But at the bottom, we need to provide the steps necessary to ensure additional protection of the rights and welfare of vulnerable populations. For example, if we’re working with 10-year-olds, how do we make sure they really understand that they have the right to opt out of this research? We need to provide a plan for that here if we’re working with a vulnerable population. Finally, we also need to describe our recruitment procedures: how are we going to find our subjects? First, we note what we’ll say and how we’ll communicate with them. If we’re using the Georgia Tech subject pool, which is a system for finding research subjects within the Georgia Tech student body, we indicate so here. And last, we note the kind of compensation we plan to provide the participants. Many times we won’t compensate our participants at all, but if we’re doing a bigger research study that actually has some external funding, it’s pretty normal to give them some sort of monetary compensation for participation.
If we’re using the Georgia Tech subject pool, our compensation will often be extra credit in a certain class, very often a psychology class. Note that the recruitment procedures are very important, especially if you’re doing something like classroom research, where there can be a very significant perception of coercion to get people to participate. If I, as an instructor, am recruiting students in my own class to participate in my research project, I have to be very clear that it won’t come back to haunt them if they choose not to participate.
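Returning to the sample-size justification mentioned above: when the study plans a statistical comparison, a power analysis is one common way to arrive at a defensible number. Here’s a minimal sketch using the statsmodels library; the effect size, alpha, and power values are illustrative assumptions, not numbers recommended by the lecture or by the IRB.

```python
# Estimate how many participants per group we'd need to detect a medium-sized
# difference between two interface conditions using a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # assumed standardized difference (Cohen's d)
    alpha=0.05,               # acceptable false-positive rate
    power=0.8,                # desired chance of detecting a real effect
    alternative="two-sided",
)
print(round(n_per_group))     # roughly 64 participants per condition
```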
One of the most important elements of IRB approval is consent; that was one of the things emphasized by the Belmont Report. If we’re doing any interaction with our human subjects, we definitely need to think about consent procedures. On the consent information page, we first need to indicate what kind of consent we’ll receive. Most commonly, this will be written consent: participants will sign, or digitally sign, a consent form to start the study. But in some cases, a waiver may be obtained. First, a waiver of consent can be obtained under certain, pretty narrow circumstances; this generally means we don’t need to receive the subject’s consent at all. Most of the time, this only applies when subjects will not be directly affected by the research. So, for example, if we wanted to study educational or health data that’s already been generated and is already anonymized, we might receive a waiver of consent, because those subjects won’t be impacted by the fact that we’re now researching their data. Similarly, if we were to go sit in a coffee house and just take notes on the order-taking process in a way that didn’t identify anyone, we might receive a waiver of consent, because our observation isn’t affecting those people. We might also receive a waiver of documentation of consent. This occurs for low-risk research where the written consent form itself would be the only record of the participant’s identity. This applies to a lot of survey research or unrecorded interviews, where participants can be informed of their rights at the start and their own continued participation constitutes implied consent. There’s no reason to have them sign a consent form, because that consent form is the only way we’d ever be able to identify them after the study. After selecting our option, we need to provide a justification if we requested a waiver; if we didn’t, then no justification is necessary. We’ll then describe the plan for obtaining informed consent. Generally, this will be to provide the consent form to participants at the start of the study and make it very clear that they can withdraw from the study at any time. If we’re involving children, non-English speakers, or other at-risk populations in our study, there may be some additional boxes to complete. It’s also important for us to assess whether participants are continuing to consent to the study. Oftentimes, we do this by making it very clear at the start of the study that they can withdraw at any time, so that their continued participation constitutes implied continued consent. Finally, it’s also possible to have protocols where deception or concealment is proposed. In HCI, for example, we might want to tell participants that an interface is functioning even if someone behind the scenes is actually just making it look functional, so that we get good research data out of those participants. For example, if we were testing something like a new version of Siri, we might tell participants that it’s functioning, when in reality someone is writing the responses by hand. If we’re using deception or concealment like that, we indicate so here. Then finally, we need to upload our consent forms. At Georgia Tech, the Office of Research Integrity Assurance provides consent form templates that we can tweak to match the specific needs of our study. The templates provide in-depth directions on what to supply. Generally, this is where we disclose to participants the details of the rest of the protocol:
what we’re researching, why, how, and why they were invited to participate.
Of the remaining fields, the only ones we’re likely interested in are the data management questions. The majority of the others cover either clinical research or biological research or other things that we hopefully won’t touch on very much in human-computer interaction, unless the field has changed a lot by the time you’re listening to this. Nonetheless, you should peek into those items to make sure they don’t apply to you if you’re filling out a protocol like this. Under the Data Management section, we want to describe how we keep participants’ data safe. That includes descriptions of the way we store it and de-identify information about participants, and how we’ll safeguard the data itself through password protection or encryption or anything like that. Finally, there are always some questions that we answer for all studies even though they generally won’t apply to us. Generally our studies won’t involve the Department of Defense, and they generally shouldn’t involve radiation. And one day I really kind of hope they involve nanotechnology, but we’re probably not quite there yet. So we’d mark no, that there is no DoD involvement. Finally, at the bottom, there’s a place to upload some additional documents. This is where we would supply things like an interview script, a survey, a recruitment script, and other documents that the IRB would want to see and approve. When we’re done, we can click Save and Continue Application. On the next page, we can preview everything on one flat screen and then check off at the end that we have no conflicts of interest, or report them if we do. Then we’ll click Save & Continue again. And for y’all, you would then submit it to your principal investigator. I am the principal investigator, so I see something a little bit different here than what you would see. After submission, we’ll generally hear back from the IRB in about three weeks about whether the study was accepted and, if not, what changes need to be made.
Institutional review boards govern research at any institutions that receive support from the federal government. But what about research that doesn’t receive any federal support? Very often, companies will do research on their users. This is especially common in HCI: lots of companies are constantly doing very interesting testing on their users with a lot of rapid A/B experiments. There’s a lot of potential knowledge there, but at the same time, much of what they do likely would not pass an IRB if it were university research. This actually came up recently with Facebook, in a paper they published titled “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” Basically, Facebook wanted to see if they could predict what would make users happy or sad, and so they tweaked the news feed for some users to test out their ideas. In other words, they tried to manipulate their users’ moods for experimental purposes. Now, Facebook argues that this was consistent with their own data use policy, which permits them to perform experiments like this. Some social scientists, however, would argue that this does not constitute informed consent. Informed consent, they say, is specific to a certain experiment, temporary for a known period of time, and given without coercion. Some would argue that “if you don’t agree, you can’t use Facebook” qualifies as coercion. These are some difficult issues, and if you end up working in HCI in industry, you’ll likely find yourself wrestling with some of them.
People are still discussing whether or not Facebook’s study on its impact on users’ moods was ethical. Facebook maintains that the study was consistent with its own data use policy, which constitutes informed consent. Opponents argue that it doesn’t. What do you think? If you think that this was ethical, why do you think it was ethical? If you think that it was unethical, what could’ve made it ethical?
If you said yes, there are several reasons you might have stated. You might agree that because the terms of service covered it, it was technically ethical research: the users did agree to things like this. You may have actually read the article or read other publications about it and noted that Facebook actually has an internal IRB that reviews things like this, and in this case, an external IRB did review the study. If you said no, the reason you gave may have been that we know users are not aware of what’s in the terms of use. We have plenty of studies that indicate that users really don’t spend any time reading what they’re agreeing to. And while technically it’s true that they’re still agreeing to it, what we’re interested in here are participants’ rights. If we know that users aren’t reading what they’re agreeing to, don’t we have an ethical obligation to make sure they’re aware before we go ahead with it? We also might say no because users couldn’t opt out of this study. Part of that is because opting out of the study alone would mean deactivating your entire Facebook account or just stopping using the tool. But part of it is that users also weren’t aware that a study was going on. They couldn’t opt out of the study specifically, nor could they even opt out of it by closing down their entire Facebook account, because they didn’t know when the study had started. That ties into the other issue: users weren’t notified that they were participants in an experiment. So even though they technically agreed to it when they agreed to Facebook’s terms of service, one could argue that the fact they weren’t notified when the study was beginning and ending means that it wasn’t ethical research. I’m not going to give you a right or wrong answer to this; there’s a very interesting conversation to have about it. But what’s most important here are the interesting questions it brings up, especially in regard to companies doing human subjects research that doesn’t have any oversight from the federal government. If you agreed with these reasons why it wasn’t ethical, what could they have done to fix it? Maybe they could have separated out the consent process for research studies from the rest of Facebook as a whole. Maybe they could have specifically requested that individual users opt in, and alerted them when the study was done, but not told them what was actually being manipulated. And even if the original study was ethical, there were likely things that could have reduced the backlash. At the same time, those things might have affected the results. These are the tradeoffs that we deal with.
In a recent paper in the Washington and Lee Law Review, Molly Jackman and Lauri Kanerva, two Facebook employees, explored exactly this issue. Jackman and Kanerva specifically note that the ethical guidelines developed in the context of academia do not always address some of the considerations of industry. In response, the authors directly advocate for setting up a set of principles and practices for industry environments. In other words, rather than just ignoring the parts that aren’t relevant for industry, the authors advise creating a new set of standards specifically for industry. To do so, Facebook designed its own internal review process. In this case, Facebook’s process is heavily reliant on deferring to a research area expert; I don’t say defer in a bad way. A key part of the IRB is that experiments are reviewed by people with no incentive to permit the study if it isn’t ethical. Facebook tries to replicate this by referring studies to an expert reviewer who in turn decides whether additional review, or even an external IRB, is necessary. The other thing that’s important to note, though, is that Facebook isn’t under any strong obligation to do this. Universities that receive federal funding are governed by the Belmont Report, but companies are not yet governed by any similar law, so we rely on companies to govern themselves. In Facebook’s case, it seems to be going pretty well, but you might find yourself at a company that doesn’t have such a program, and you’ll have to apply these standards yourself.
In this lesson, we’ve talked about research ethics. Research ethics guide the human subjects research we do to make sure we’re respecting the rights of our participants, but they also make sure the data we’re gathering is good and useful. At every stage of our design life cycle, we want to keep respect for our participants at the forefront of our thoughts. That means being wary of experimenting in ways that might negatively affect users. It also means only asking users to dedicate their time to evaluating interfaces that are well thought out, and it means respecting users’ viewpoints and positions in the design process.
The first stage of the design life cycle is need-finding, or requirements gathering. This is the stage where we go and try to find out what the user really needs. The biggest mistake a designer can make is jumping into the design process before understanding the user or the task. We want to develop a deep understanding of the task they’re trying to accomplish and why. As we do this, it’s important to come in with as few preconceived notions as possible. There’s an old adage that says, “When all you have is a hammer, everything looks like a nail.” This is similar: if you come in having already decided what approach you want to take, it’s tempting to only see the problem in terms of the approach you’ve chosen. So, we’re going to go through a process that attempts to avoid as many preconceived notions as possible. We’re going to start by defining some general questions we want to answer throughout the data-gathering process about who the user is, what they’re doing, and what they need. Then, we’ll go through several methods of generating answers to those questions to gain a better understanding of the user. Then, we’ll talk about how to formalize the data we gather into a shareable model of the task and a list of requirements for our ultimate interface. Note that each of these tools could fill a lesson of its own on how to use it well, so we’ll try to provide some additional resources for reading further on the tools you choose to use.
Before we start our need-finding exercises, we also want to enter with some understanding of the data we want to gather. These are the questions we ultimately want to answer. That’s not to say we should be answering them at every step of the way, but rather, we want to gather the data necessary to come to a conclusion at the end. Now, there are lots of inventories of the types of data you could gather, but here’s one useful list. One, who are the users? What are their ages, genders, and levels of expertise? Two, where are the users? What is their environment? Three, what is the context of the task? What else is competing for users’ attention? Four, what are their goals? What are they trying to accomplish? Five, right now, what do they need? What physical objects do they need? What information do they need? What collaborators do they need? Six, what are their tasks? What are they doing physically, cognitively, socially? And seven, what are the subtasks? How do they accomplish those subtasks? When you’re designing your need-finding methods, each thing you do should match up with one or more of these questions.
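As a minimal illustration of that mapping, here is a short Python sketch; the inventory questions come from the list above, while the plan itself and the methods listed in it are hypothetical placeholders rather than a recommended study design.

```python
# Hypothetical sketch: keep the inventory as data so every planned
# need-finding activity can be checked against the questions it should answer.
INVENTORY = [
    "Who are the users?",
    "Where are the users?",
    "What is the context of the task?",
    "What are their goals?",
    "What do they need?",
    "What are their tasks?",
    "What are the subtasks?",
]

# Hypothetical plan mapping each method to the inventory items it targets.
plan = {
    "naturalistic observation": {"Where are the users?", "What are their tasks?"},
    "interviews": {"What are their goals?", "What do they need?"},
    "survey": {"Who are the users?", "What is the context of the task?"},
}

covered = set().union(*plan.values())
for method, targets in plan.items():
    if not targets:
        print(f"Warning: {method} doesn't map to any inventory question")
print("Questions not yet covered:", [q for q in INVENTORY if q not in covered])
```

Running this would flag that no planned method yet addresses the subtasks question, which is exactly the kind of gap this sort of check is meant to surface before you start gathering data.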
In order to do some real need-finding, the first thing we need to do is identify the problem space. Where is the task occurring, what else is going on, and what are the user’s explicit and implicit needs? We’ll talk about some of the methods for doing that in this lesson, but before we get into those methods, we want to understand the scope of the space we’re looking at. So consider the difference between these two actions. [MUSIC] Notice that in each of these, I’m doing the same task, turning off the alarm. But in the first scene, we’re focusing very narrowly on the interaction between the user and the interface. In the latter, we’re taking into consideration a broader view of the problem space. We could zoom out even further if we wanted to and ask questions about where and why people need alarm systems in the first place. That might lead us to designing things like security systems for dorm rooms or check-in systems for office buildings. As we’re going about need-finding, we want to make sure we’re taking the broad approach, understanding the entire problem space in which we’re interested, not just focusing narrowly on the user’s interaction with a particular interface. So in our exploration of methods for need-finding, we’re going to start with the most authentic types of general observation, then move through progressively more targeted types of need-finding.
Just as we want to get an idea of the physical space of the problem, we also want to get an idea of the space of the user. In other words, we want to understand who we’re designing for. That comes up a lot when doing design alternatives and prototyping, but we also want to make sure to gather information about the full range of users for whom we’re designing. So, let’s take the example of designing an audiobook app for people who exercise. Am I interested in audiobooks just for kids, or for adults too? Am I interested in experts at exercising, or novices at it? Am I interested in experts at listening to audiobooks, or am I interested in novices at that as well? Those are pretty key questions. They differentiate whether I’m designing for business people who want to be able to exercise while reading or exercisers who want to be able to do something else while exercising. The task is similar for both, but the audience, their motivations, and their needs are different. So, I need to identify these different types of users and perform need-finding exercises on all of them. One of the most successful products of all time succeeded because of its attention to user types. The Sony Walkman became such a dramatic success because its designers identified different needs for different types of people, designed the product in a way that met all those needs, and then marketed it specifically to those different types of individuals. You can read more about that in a book called Doing Cultural Studies by Hugh Mackay and Linda Janes.
During need-finding, there are some significant considerations that need to be made to avoid biasing your results. Let’s go through five of these possible biases. Number one, confirmation bias. Confirmation bias is the phenomenon where we see what we want to see. We enter with some preconceived ideas of what we’ll see, and we only notice the things that confirm our prior beliefs. Try to avoid this by specifically looking for signs that you’re wrong, by testing your beliefs empirically, and by involving multiple individuals in the need-finding process. Number two, observer bias. When we’re interacting directly with users, we may subconsciously bias them. We might be more helpful, for example, with users using the interface that we designed compared to the ones that other people designed. On surveys, we might accidentally phrase questions in a way that elicits the answers that we want to hear. Try to avoid this by separating experimenters with motives from the participants, by heavily scripting interactions with users, and by having someone else review your interview scripts and your surveys for leading questions. Number three, social desirability bias. People tend to be nice; people want to help. If you’re testing an interface and the participants know that you’re the designer of the interface, they’ll want to say something nice about it to make you happy, but that gets in the way of getting good data. Try to avoid this by hiding what the socially desirable response is, by conducting more naturalistic observations, and by recording objective data. Number four, voluntary response bias. Studies have shown that people with stronger opinions are more likely to respond to optional surveys. You can see this often in online store reviews: the most common responses are often fives and ones. For us, that means if we perform quantitative analysis on surveys, we risk oversampling the more extreme views. Avoid this by limiting how much of the survey content is shown to users before they begin the survey, and by confirming any conclusions with other methods. Number five, recall bias. Studies have also shown that people aren’t always very good at recalling what they did, what they thought, or how they felt during an activity they completed in the past. That can lead to misleading and incorrect data. Try to avoid this by studying tasks and contexts, by having users think out loud during activities, or by conducting interviews during the activity itself. Now, these biases can be largely controlled also by making sure to engage in multiple forms of need-finding.
For certain tasks, a great way for us to understand the user’s needs is to simply watch. A great way for me to start understanding what it’s like to need an audiobook app for exercising is to go somewhere where people are exercising and just watch them exercise. This is called naturalistic observation, observing people in their natural context. I’m fortunate that I actually live across the street from a park, so I can sit here in my rocking chair on my porch and just watch people exercising. Now, I want to start with very specific observations and then generalize out to more abstract tasks. That way I’ll avoid something called confirmation bias, which is basically when you see what you want to see. So what do I notice? Well, I notice that there are a lot of different types of exercisers. There are walkers, joggers, and runners; I see some rollerbladers; I see some people doing yoga. I see a lot of people riding bikes, but the bikers seem to be broken into two different kinds of groups. I see a lot of people biking kind of leisurely, but I also see some bikers who are a little bit more strenuous about it. I’m also noticing that while joggers might be able to stop and start pretty quickly, that’s harder for someone riding a bike. So I might want to avoid designs that force the user to pull out their phone a lot, because that’s going to be dangerous and awkward for people riding bikes. Now, I also see people exercising in groups and people exercising individually. For those people exercising in groups, I don’t actually know if they’d be interested in this. Listening to something might kind of defeat the purpose of exercising together. So I’m going to have to note that down as a question I want to ask people later. I also see that many people tend to stretch before and after exercising, and I’m wondering if we can use that. Maybe we can have some kind of starting and ending sequence, so that a single session is kind of bookended by both stretching and interacting with our app. Note that by just watching people engage in the task of exercising, I’m gathering an enormous amount of information that might affect my design. But note also that while naturalistic observation is great, I’m limited ethically in what I can do. I can’t interact with users directly, and I can’t capture identifying information like videos and photographs; that’s why I can’t show you what I’m seeing out here. I’m also limited in that I don’t know anything about what those users are thinking. I don’t know if the people working out in groups would want to be able to listen to audiobooks while they’re doing yoga. I don’t know if Bluetooth headsets would be problematic for people riding bikes. I need to do a lot more before I get to the design phase, but this has been very informative for my understanding of the problem space, and it has given me things I can ask people later on.
Here are five quick tips for doing naturalistic observation. Number one, take notes. Don’t just sit around watching for a while. Be prepared to gather targeted information and observations about what you see. Number two, start specific, and then abstract. Write down the individual little actions you see people doing before trying to interpret or summarize them. If you jump to summarizing too soon, you risk tunnel vision. Number three, spread out your sessions. Rather than sitting somewhere for two hours one day and then moving on, try to observe in shorter 10 to 15 minute sessions several times. You may find different, interesting information, and your growing understanding and reflection on past exercises will inform your future sessions. Number four, find a partner. Observe together with someone else. Take your own notes, and then compare them later, so you can see if you both interpreted the same scenarios or actions in the same way. Number five, look for questions. Naturalistic observation should inform the questions you decide to ask participants in more targeted need-finding exercises. You don’t need to have all the answers based on observation alone. What you need is questions to investigate further.
Sometimes it’s not enough just to watch people engaging in a task. Sometimes we want to experience the task for ourselves. So that’s what I’m going to do. I listen to audiobooks a lot. I don’t really exercise. I should, but I don’t. But I’m going to try this out. I’ve got my audiobook queued up, and I’ve got my mic on so I can take notes as I run. So I’m going to go on a jog and see what I discover. So what did I learn? I learned that I’m out of shape, for one thing. I already knew that, but I learned it again. I also learned that this app would be very useful for anyone doing participant observation on exercisers, because I kept having to stop to record notes for myself, which I could have done with the app that I’m trying to implement. But aside from that, I noticed that unexpected things happened pretty often that made me wish I could easily go back in my book. Or sometimes there were things I just wanted to hear again, but there was no easy way to do that. I also noticed that there’s definitely a need there for me. I already plan to listen to everything again now that I’m home, because there were notes I wanted to take that I couldn’t take easily. I also noticed that while sometimes I wanted to take notes, sometimes I just wanted to leave a bookmark. Now, we do have to be careful here, though. Remember, you are not your user. When you’re working as a participant observer, you can gain useful insights, but you shouldn’t overrepresent your own experiences. You should use this experience as a participant observer to inform what you ask users going forward.
Let’s zoom in a little bit more on what the user actually does. We can do naturalistic and participant observation without having to directly interact much with our users, but now we need to get inside users’ heads a little more to understand what they’re thinking and doing. If you’re trying to design interfaces to make existing tasks easier, one way to research that is to look at the hacks that users presently employ. How do they use interfaces in non-intended ways to accomplish tasks, or how do they break out of the interface to accomplish a task that could have been accomplished within an interface? If you’re designing for a task meant to be performed at a desk like this, looking at the person’s workspace can be a great way of doing that. So for example, I have six monitors around, and yet you still see Post-It notes on my computer. How could I possibly need more screen real estate? Well, Post-It notes can’t be covered up. They don’t take away from the existing screen real estate. They’re visible even when the computer is off. So, these physical notes here are a way to hack around the limitations of the computer interface. Now, when you’re looking at hacks, it’s important not to just look at what the user does and assume you understand why. Look at their workarounds and ask them why they’re using them. Find out why they don’t just use the features the interface already provides. You might find they just don’t know about them, which presents a different kind of design challenge. Now, hacks are related to another method we can use to uncover user needs as well, which is errors. Whereas hacks are ways users get around the interface to accomplish their tasks, errors are slips or mistakes that users frequently make while performing the task within the interface.
When we’re trying to make iterative improvements, one of the best places we can look is at the errors users make with the tools they currently have available. We can fix those errors, but we can also use those errors to understand a bit more about the user’s mental model. So, here’s a common example of an error for me, which is a slip. I keep my email open in the window on the left. I frequently forget that it’s my active window while I’m trying to type into one of the other windows, and as a result, I’ll hit a bunch of hotkeys in my email interface. I’ll tag random emails, delete random emails. It’s just kind of a mess. Now, this is a slip because there’s nothing wrong with my mental model of how this works. I understand there’s an active window and that the window I’m looking at isn’t necessarily the one selected. The problem is that I can easily forget which window is active. Mistakes, on the other hand, are places where my mental model is weak, and for me, a place where that happens is when I’m using my Mac. I’m used to a PC, where the maximize button always makes a window take up the entire screen. I’ve honestly never fully understood the maximize button on a Mac. Sometimes it seems to work like a PC maximize button. Sometimes it just expands the window a bit, but not to the entire screen. Sometimes it even enters a full-screen mode, hiding the top taskbar. I make mistakes there because I don’t have a strong mental model of how it works. So, if you were watching me, you could see me making these errors and you could ask me why I’m making them. Why did I choose to do that if that was my goal? That works for both discovering hacks and discovering errors. Watch people performing their tasks and ask them why certain things happened the way that they did. Discovering hacks and errors involves a little bit more user interaction than just watching people out in the wild. So, how might we do that if we’re doing something like creating an app that people are going to use in public? Well, maybe we go up to people we see exercising out in public. We could actually get approval to do that, but that’s going to be a little bit awkward, and the data we get might not always be great. So, at this point, we might be better off recruiting people to come in and describe their experiences. People experience hacks and errors pretty consciously, so our best bet would likely be to target local exercise groups or local areas that exercisers frequent, and recruit people to come in for a short study. Or maybe we could recruit people to participate in a study during their normal exercise routine, taking notes on their experience or talking us through their thought process. We can take that to an extreme and adopt something like an apprenticeship approach, where we train to become users ourselves.
If we’re designing interfaces for particularly complex tasks, we might quickly find that just talking to our participants or observing them isn’t enough to get the understanding we need to design those interfaces. For particularly complex tasks, we might need to become experts ourselves in order to design those programs. This is informed by the domain of ethnography, which recommends researching a community or a job or anything like that by becoming a participant in it. It goes beyond just participant observation, though; it’s really about integrating oneself into that area, becoming an expert in it, and learning about it as you go. So we bring in our expertise in design and HCI and use that, combined with the expertise that we develop, to create new interfaces for those people. For example, our video editors here at Udacity have an incredibly complex workflow involving multiple programs, multiple workflows, lots of different people, and lots of moving parts. There’s no possible way I could ever sit down with someone for just an hour and get a good enough picture of what they do to design a new interface that will help them out. I really need to train under them. I really need to become an expert at video editing and recording myself in order to help them out. It’s kind of like an apprenticeship approach. They would apprentice me in their field, and I would use the knowledge that I gain to design new interfaces to help them out. Ethnography and apprenticeship are huge fields of research, both on their own and as they apply to HCI, so if you’re interested in using that approach, take a look at some of the resources that we’re providing.
The most targeted way of gathering information from users, though, is just to talk to them. One way of doing that might be to bring them in for an interview. So I’m sitting here with Morgan, who’s one of the potential users for our audiobook app targeted at exercisers. And we’re especially interested in the kinds of tasks you perform while exercising and listening to audiobooks at the same time. So to start, what kind of challenges do you run into doing these two things at once? » I think the biggest challenge is that it’s hard to control it. I have headphones that have a button on them that can pause it and play. But if I want to do anything else, I have to stop, pull up my phone, and unlock it just to rewind. » Yeah, that makes sense. Thank you. Interviews are useful ways to get at what the user is thinking when they’re engaging in a task. You can do interviews one-on-one like this, or you can even do interviews in a group with multiple users at the same time. Those tend to take on the form of focus groups, where a number of people are all talking together about some topic, and you can use them to tease out different kinds of information. Focus groups can elicit some information we don’t get from this kind of an interview, but they also present the risk of overly convergent thinking. People tend to agree with each other instead of bringing in new ideas. So they should really be used in conjunction with interviews, as well as other need-finding techniques.
Here are five quick tips for conducting effective interviews. Now, we recommend reading more about this before you actually start interviewing people, but these should get you started. Number one, focus on the six W’s when you’re writing your questions: who, what, where, when, why, and how. Try to avoid questions that lend themselves to one-word or yes-or-no answers; those are better gathered via surveys. Use your interview time to ask open-ended, semi-structured questions. Number two, be aware of bias. Look at how you’re phrasing your questions and interactions and make sure you’re not predisposing the participant to certain views. If you only smile when they say what you want them to say, for example, you risk biasing them to agree with you. Number three, listen. Many novice interviewers get caught up in having a conversation with the participant rather than gathering data from the participant. Make sure the participant is doing the vast majority of the talking, and don’t reveal anything that might predispose them to agree with you. Number four, organize the interview. Make sure to have an introduction phase, some lighter questions to build trust, and a summary at the end, so the user understands the purpose of the questions. Be ready to push the interview forward or pull it back on track. Number five, practice. Practice your questions on friends, family, or research partners in advance. Rehearse the entire interview. Gathering participants is tough, so when you actually have them, you want to make sure to get the most out of them.
Interviews are likely to be one of the most common ways you gather data, so let’s run through some good and bad interview questions real quick. Here are six questions. Which of these would make good interview questions? Mark the ones that would be good. For the ones that would be bad, briefly brainstorm a way to rewrite the question to make it better. You can go ahead and skip forward to the exercise if you don’t want to listen to me read them out. Number one, do you exercise? Number two, how often do you exercise? Number three, do you exercise for health or for pleasure? Number four, what, if anything, do you listen to while exercising? Number five, what device do you use to listen to something while exercising? Number six, we’re developing an app for listening to audiobooks while exercising. Would that be interesting to you?
Personally, I think three of these are good questions. “Do you exercise?” is not a great question, because it’s a yes or no question. “How often do you exercise?” is actually the better way of asking the same thing. It subsumes all the answers to “Do you exercise?” but leaves more room for elaboration and detail. “Do you exercise for health or for pleasure?” is not a great question, because it presents the user with a dichotomy that might not be the way they actually think about the problem. Maybe there’s some other reason they exercise. Maybe they do it to be social, for example. We want to leave open all the possibilities a user might have. So instead of asking, “Do you exercise for health or for pleasure?” we probably want to ask, “Why do you exercise?” The next two questions work pretty well, because they leave plenty of room for the participant to have a wide range of answers, and they’re not leading them toward any particular answer. We’re not asking, for example, what smartphone do you use to listen to something, because maybe they don’t use a smartphone. The sixth one is interesting. We’re developing an app for listening to audiobooks while exercising. Would that be interesting to you? What’s wrong with that question? When we say we’re developing an app, we introduce something called social desirability bias. Because we’re the ones developing the app, the user is going to feel some pressure to agree with us, to support our ideas. People like to support one another. And so even if they wouldn’t be interested, they’ll likely say that they would, because that’s the supportive thing to say. No one wants to say, hey, great idea, David, but I would never use it. So what we want to do is create no incentive for a user not to give us a complete, honest answer. Worrying about hurting our feelings is one reason why they wouldn’t be totally honest. So we might reword this question to say, “Would you be interested in an app for listening to audiobooks while exercising?” Now granted, the fact that we’re the ones asking will still probably tip off the user that we’re thinking about moving in that direction, but at least it’s going to be a little more collaborative. We’re not tipping them off that we’re already planning to do this; we’re telling them that we might be thinking about doing it. And so if they don’t think it’s a good idea, they might feel like they should tell us right now, to save us time down the road. So by rephrasing the question that way, we hopefully avoid biasing the participant to just agree with us to be nice.
Think-aloud protocols are similar to interviews in that we’re asking users to talk about their perceptions of the task, but with think-aloud, we’re asking them to do so in the context of the task. So instead of bringing Morgan in to answer some questions about listening to audiobooks while exercising, I’ll ask her to actually think out loud while listening to audiobooks and exercising. If this were a different task, like something on a computer, I could have her just come into my lab and work on it. But since this is out in the world, what I might do is give her a voice recorder to record her thoughts while she’s out running and listening. Now, think-aloud is very useful, because it can help us get at users’ thoughts that they forget when they are no longer engaged in the task. But it’s also a bit dangerous: by asking people to think aloud about their task, we encourage them to think about it more deliberately, and that can change the way they actually act. So while it’s useful to get an understanding of what they’re thinking, we should check to see if there are places where what they do differs when thinking out loud about it. We can do that with what’s called a post-event protocol, which is largely the same, except we wait to get the user’s thoughts until immediately after the activity. That way, the activity is still fresh in their minds, but the act of thinking about it shouldn’t affect their performance quite as much.
Most of the other methods for need-finding, like observation, interviewing, and apprenticeship, require a significant amount of effort for what is often relatively little data, or data from a small number of users. We might spend an entire hour interviewing a single possible user, or an hour observing a small number of users in the world. The data we get from those interactions is deep and thorough, but sometimes we also want broader data. Sometimes we just want to know how many people encounter a certain difficulty or engage in a certain task. If we’re designing an audiobook app for exercisers, for example, maybe we just want to know how often those people exercise, or maybe we want to know what kinds of books they listen to. At that point, a survey might be a more appropriate means of need-finding. Surveys let us get a much larger number of responses very quickly, and the questions can be phrased objectively, allowing for quicker interpretation. Plus, with the Internet, they can be administered asynchronously at a pretty low cost. A few weeks ago, for example, I came up with the idea for a study on a Friday morning, and with great cooperation from our local IRB office, I was able to send out the survey to potential participants less than 24 hours later and receive 150 responses within a week. Now, of course, the data I receive from that isn’t nearly as thorough as what I would receive from interviewing some of those participants, but it’s a powerful way of getting a larger amount of data. And it can be especially useful for deciding what to ask participants during interviews or focus groups.
Survey design is a well documented art form. In fact, designing surveys is very similar to designing interfaces themselves. So, many of the lessons we’ve learned in our conversations apply here as well. Here are five quick tips for designing and administering effective surveys. Number one, less is more. The biggest mistake that I see novice survey designers make is to ask way too much. That affects the response rate and the reliability of the data. Ask the minimum number of questions necessary to get the data that you need, and only ask questions that you know that you’ll use. Number two, be aware of bias. Look at how you’re phrasing the questions. Are there positive or negative connotations? Are participants implicitly pressured to answer one way or the other? Number three, tie them to the inventory. Make sure every question on your survey connects to some of the data that you want to gather. Start with the goals for the survey and write the questions from there. Number four, test it out. Before sending it to real participants, have your co-workers or colleagues test out your survey. Pretend they’re real users and see if you would get the data you need from their responses. Number five, iterate. Survey design is like interface design. Test out your survey, see what works and what doesn’t, and revise it accordingly. Give participants a chance to give feedback on the survey itself so that you can improve it for future iterations.
Surveys are used often in HCI because of their convenience, but they’re only useful if the questions are actually well-written. Tips like “Be aware of bias” and “Test it out” are good pieces of general advice, but there are also lots of specific things we can do to make our survey questions better. In fact, there are six things I personally recommend in survey design: be clear, be concise, be specific, be expressive, be unbiased, and be usable. Let’s go through what these actually mean in practice. Be clear means we want to make sure the user actually understands what we’re asking. So, if we’re using a numeric scale, for example, we don’t want to just give them numbers. We want to label those numbers with what they mean. It’s not uncommon to see some larger scales label only the first, last, and middle numbers, but it’s always better to assign some kind of label to every single number, and make sure the labels are parallel. We wouldn’t want something like highly dissatisfied, dissatisfied, neutral, a little satisfied, and satisfied. We also want to avoid overlapping ranges if we’re asking about some range of numbers. Say we’re asking, “How many times per week do you watch Hulu?” If a user generally watches twice per week, it’s not clear whether they would choose 0-2 or 2-5. Instead, we want to make sure the ranges don’t overlap. If we’re in doubt about whether the user will actually understand our question, we should provide some extra detail. For example, if we were asking, “Do you own a tablet computer?” we might worry that not all our users really understand what a tablet computer is. So, we’d go on to define it, saying it’s a computer with a touchscreen and a detachable keyboard. That improves the likelihood that the user actually understands what we’re asking. If we’re asking about a frequency, it’s useful to timebox it. For example, if we ask, “How often do you exercise?” users might not fully understand the difference between rarely and occasionally. Is rarely once a week, once a month, once a year? Is frequently every day, or five times a week? So, instead, we probably want to ask a question like, “In the past seven days, how many times have you done this behavior?” That’s a much more objective question and a lot easier to answer. We’ll illustrate a couple of these clarity tips with a short sketch after this set of recommendations. Second, we want to be concise with our questions. We always want to ask our questions in plain language the user can understand. So, for example, instead of asking something like, “What was the overall level of cleanliness that you observed within the car that you rented?” we’d ask, “How clean was the car?” Now, it is worth noting that sometimes being concise and being clear are at odds. Adding more detail inherently means being less concise. So, it’s a trade-off. Use your best judgment to decide when adding more detail will be worth it. Third, when asking questions, we want to be specific. We want to avoid questions about really big ideas. For example, “How satisfied were you with the interface?” Well, there are a lot of elements of satisfaction with using an interface. Asking about satisfaction with the interface as a whole is such a big question that it’s hard to answer. Instead, we might ask a series of smaller questions, like how satisfied were you with how quickly the interface responded to your commands, or how satisfied were you with how easily you could find the command you were looking for. Part of this is avoiding what are called double-barrel questions.
A double-barrel question is a question that asks about two things at the same time. For example, if we ask, “How satisfied are you with the speed and availability of your mobile connection?” what does a user do if they’re satisfied with the availability but not with the speed? How do they answer that question? So, instead, we would break this up into two questions, one asking about speed and one asking about availability. We also want to avoid questions that allow some internal conflict. This is similar to avoiding questions about big ideas. For example, “How satisfied were you with your food?” Well, I might have been satisfied with the taste of it, but not with the temperature or the appearance of it. So, instead, we break that down into smaller questions that each address an individual component of satisfaction. Fourth, we want to be expressive, or really, what I should say is allow the user to be expressive, but that would break my nice little symmetry over here. We want to make sure to emphasize the user’s opinions. Sometimes users taking our surveys are hesitant to be very emphatic or very critical. So, we want to make sure to emphasize in the questions that we’re looking for their opinions. Instead of asking, “Is our subscription price too high?” we might ask, “Do you feel our subscription price is too high, too low, or about right?” In the second version, a user could say too high without feeling like they’re being very combative. Whenever possible, we want to use ranges instead of yes and no questions. That allows the user to express more of the detail of their individual answers. So, instead of asking, “Do you use social media? Yes or no?” we might ask, “In the past seven days, how much time have you spent on social media?” This allows the user to express something more closely resembling the complexity of their answer. If we’re asking about levels of frequency or levels of agreement, we want to give lots of levels. Simply asking, “How satisfied are you: dissatisfied or satisfied?” isn’t enough to capture the full range of user opinions. I generally recommend always using at least five levels, so you can differentiate people who are highly satisfied, meaning they have no complaints, from people who are satisfied, meaning they might have some complaints but overall had a positive experience. That’s actually a pretty useful distinction to arrive at. When possible, it’s also useful to allow users to make multiple selections. For example, imagine we ask, “What social media platform do you use the most?” Then we’re losing something from those users who use multiple platforms with equal frequency. So, instead, why not let them choose more than one? There might be some good reason why we want them to choose only one, maybe because some follow-up questions are based on that, but a lot of times it may be beneficial to allow them to select multiple answers. For questions that are nominal or categorical, it’s often good to let them add new categories. So, instead of just giving them six options to choose from, we could give them six options, but also a box to put in another one. That allows them to express ideas that we didn’t anticipate. My fifth piece of advice is to be unbiased, or to avoid bias wherever possible, and that last question is actually a good example of that as well. If we don’t give them that other box, then we’re biasing them with only our pre-established selections. Now, sometimes that’s okay.
If we’ve done a lot of surveys in the past and found that these are the only answers anyone ever puts in, then it’s okay to limit the space to only those. Just remember, if you provide users categories and don’t give them an “other” box, you might be biasing them toward only those opinions that you anticipated. But even if you provide an “other” box, you still risk some bias. For example, if you ask, “Why did you choose our service over our competitors?” a user might look at the options and say, “Well, now that you mention it, I guess it was because of your good reputation.” But if you had asked them this question without giving them options, they may have given a different answer. It was the act of reading those options that made them think, “Maybe that’s why I did that.” So, often it’s good to leave these potentially open-ended questions open. Let them just say in free text why they chose your service. Now, again, if you’ve run the survey for a while and have a lot of these open-ended responses, and you’ve found that there are really only four or five answers that users ever put in, then it’s okay to distill those down to options. In that case, you’ve done enough data analysis to understand that these are really the only selections. But if you aren’t yet sure of the full space of answers you might receive, it can be better to leave it open-ended. We also need to avoid leading questions. This one is a little more obvious. If we’re asking for opinions on our new interface, we don’t want to say something like, “Did our brand new AI-based interface generate better recommendations? Yes or no?” Obviously, here we want the user to choose yes. Instead, we should ask it in a more neutral fashion: “How satisfied were you with the recommendations the interface generated?” Similarly, we want to avoid loaded questions. For example, “In the past seven days, how much time have you wasted on social media?” Asking the question like that is guaranteed to lower our estimates compared to, “In the past seven days, how much time have you spent on social media?” Finally, my last word of advice is to make your survey usable. Now, a lot of this is going to come down to the details of the survey platform that you choose, but some of these are decisions that you can make as well. For example, it’s always good to provide a progress bar that lets the user know how far along in the survey they actually are, so they can adjust their expectations accordingly. It’s not uncommon for users to quit surveys because they don’t know how close they are to the end, even though in reality, they were only a few seconds away from the last question. Along the same lines, it’s good to make your page lengths consistent. If you have a five-page survey, you don’t want one question on the first page, 50 on the second page, and two on the third page. If a user opens the second page and sees 50 questions, they’re going to naturally assume that the remaining pages also have 50 questions. So, try to make the pages consistent to set accurate expectations about how long the survey is going to take. Third, order your questions logically. There should be some natural flow to the order in which you ask different questions. You don’t want to go from a demographic question to a satisfaction question, back to a demographic question. You want to group your questions into topics. Ideally, they should take the user along the thought process that you want them to engage in while answering your questions.
Fourth, at the end of the survey, it’s good to alert users about unanswered questions. On the one hand, maybe the user didn’t know they skipped a question. This lets them know, so they can go back and answer. But on the other hand, maybe they skipped that question intentionally; maybe they weren’t comfortable answering, maybe they just don’t have an answer, or maybe your space of answer options didn’t capture what they thought. So, you don’t want to force them to go back and answer it, but you also want to account for times when they may have accidentally skipped it. So, let them know, but don’t force them to go back. Finally, preview the survey yourself. This takes some discipline. I have had lots of surveys that I never previewed and later found out I had used checkboxes instead of radio buttons for a particular question. So, force yourself to actually preview the survey and fill it out as if you were a real user. Don’t just scroll through it; actually go through and answer each question. So, that was quite a lot of information, but I’m hoping the fact that most of the tips were pretty practical will make them easy to apply. When in doubt, remember, you can always ask for feedback on your survey questions before sending them out to actual participants.
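Here is the short sketch mentioned earlier, illustrating a few of these tips in Python: labeling every scale point with parallel wording, using non-overlapping frequency ranges, and alerting about skipped questions without forcing an answer. The labels, ranges, and question identifiers are hypothetical examples of the pattern, not prescribed wording.

```python
# Be clear: label every point on the scale, with parallel wording.
SATISFACTION_SCALE = {
    1: "Highly dissatisfied",
    2: "Dissatisfied",
    3: "Neutral",
    4: "Satisfied",
    5: "Highly satisfied",
}

# Be clear: non-overlapping frequency buckets, so an answer of twice per week
# falls into exactly one option (unlike overlapping ranges such as 0-2 and 2-5).
def weekly_frequency_option(times_per_week: int) -> str:
    if times_per_week <= 1:
        return "0-1"
    if times_per_week <= 4:
        return "2-4"
    if times_per_week <= 7:
        return "5-7"
    return "8 or more"

# Be usable: flag unanswered questions at the end, but still allow submission.
def skipped_questions(questions, responses):
    return [q for q in questions if responses.get(q) is None]

questions = ["q1_age_range", "q2_exercise_frequency", "q3_audiobook_use"]
responses = {"q1_age_range": "25-34", "q3_audiobook_use": "weekly"}

print(weekly_frequency_option(2))  # prints "2-4": exactly one matching option
skipped = skipped_questions(questions, responses)
if skipped:
    print("Unanswered (you can still submit):", ", ".join(skipped))
```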
Writing survey questions is an art as well as a science. So let’s take a look at an intentionally poorly designed survey and see everything we can find that’s wrong with it. On the left is a survey. It’s kind of short, mostly because of screen real estate. Write down in the box on the right everything that is wrong with this survey. Feel free to skip forward if you don’t want to listen to me read out the questions. The first question asks for a rating on a scale of 1 to 4, with 1 meaning a lot and 4 meaning not at all. The second asks, “Why do you like to exercise?” The third asks for a rating on a scale of 1 to 6, with 1 meaning not at all and 6 meaning a lot. And the last asks, “Have you listened to an audiobook this year?”
Here are a few of the problems that I intentionally put into this survey. Some of them are kind of obvious, but hopefully a couple of others were a little more subtle and a little more interesting. First, when I say on a scale of one to four, with one meaning a lot and four meaning not at all, what do two and three mean exactly? Just giving the endpoints doesn’t make for a very clear scale. We usually also want to provide an odd number of options, so that users have a neutral central option. Sometimes we’ll want to force our participants to take one side or the other, but generally we want to give them that middle neutral option. Either way, though, we definitely don’t want to change the number of options between those two questions. Having one be 1 to 4 and the other be 1 to 6 is just confusing. And even worse, notice that we’re reversing the scale between the two. In the first question, the low number means a lot. In the second question, the high number means a lot. That’s just terrible design. We want to be consistent across our entire survey, both with the direction of our scale and the number of options, unless there’s a compelling reason not to be. The second question is also guilty of being quite a leading question. “Why do you like to exercise?” assumes the participant likes to exercise. What are they supposed to say if they don’t? And finally, the last question is a yes or no question: “Have you listened to an audiobook this year? Yes or no.” No is kind of an interesting answer, but if they say yes, I don’t know whether they listened to one audiobook this year or a hundred. I don’t know if they listened every single day or if they just listened once because they had a gift certificate. So we want to reword this question to be a little more open-ended and support a wider range of participant answers.
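One way to catch these scale problems before a survey goes out is a quick consistency check. The sketch below assumes a hypothetical survey definition in which each scale question records its number of points and what the high end means; the question stems themselves are invented for illustration.

```python
# Hypothetical survey definition with the same kinds of flaws as the example above.
scale_questions = [
    {"question": "How much do you like to exercise?", "points": 4, "high_end_means": "not at all"},
    {"question": "How much do you like audiobooks?", "points": 6, "high_end_means": "a lot"},
]

point_counts = {q["points"] for q in scale_questions}
directions = {q["high_end_means"] for q in scale_questions}

if len(point_counts) > 1:
    print("Inconsistent number of scale points across questions:", sorted(point_counts))
if len(directions) > 1:
    print("Scale direction is reversed between questions:", directions)
```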
So far we’ve discussed some of the more common approaches to need-finding. Depending on your domain, though, there might be some other things you can do. First, if you’re designing for a task for which interfaces already exist, you might start by critiquing the interfaces that already exist using some of the evaluation methods that we’ll cover later in the evaluation lesson. For example, if you want to design a new system for ordering takeout food, you might evaluate the interfaces of calling in an order, ordering via mobile phone, or ordering via a website. Second, and similarly, if you’re trying to develop a tool to address a problem that people are already addressing, you might go look at user reviews and see what people already like and dislike about existing products. For example, there are dozens of alarm clock apps out there, and thousands of reviews. If you want to design a new one, you could start there to find out what people need or what their common complaints are. Third, if you’re working on a task that already involves a lot of automatic logging, like web surfing, you could try to get some logs of user interaction that have already been generated. For example, say you wanted to build a browser that’s better at anticipating what the user will want to open next. You could grab data logs and look for trends both within and across users. You can get creative with your data gathering methods. The goal is to use a variety of methods to paint a complete picture of the user’s task.
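As a rough sketch of that third idea, here is how mining existing logs might look in Python. It assumes a simple hypothetical log format of one (user, page) event per entry in time order; the data and the format are invented for illustration, not drawn from any real dataset.

```python
from collections import Counter, defaultdict

# Hypothetical browsing log: (user, page visited), in chronological order.
log = [
    ("user_a", "news.example.com"), ("user_a", "mail.example.com"),
    ("user_a", "news.example.com"), ("user_a", "mail.example.com"),
    ("user_b", "docs.example.com"), ("user_b", "mail.example.com"),
]

# Count, per user, which page most often follows each page.
transitions = defaultdict(Counter)
previous_page = {}
for user, page in log:
    if user in previous_page:
        transitions[(user, previous_page[user])][page] += 1
    previous_page[user] = page

# The most common next page is a candidate for the browser to anticipate.
for (user, page), next_pages in transitions.items():
    likely_next, count = next_pages.most_common(1)[0]
    print(f"{user}: after {page}, most often opens {likely_next} ({count}x)")
```

Trends like these could then be compared across users to see which anticipations generalize and which are user-specific, which is the within- and across-user analysis described above.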
In this lesson, we’ve covered a wide variety of different methods for need-finding. Each method has its own advantages and disadvantages. So let’s start to wrap up the lesson by exploring this with an exercise. Here are the methods we’ve covered, and here are the potential advantages. For each row, that is, for each advantage, mark which need-finding methods actually have that advantage. Note that these might be somewhat relative, so your answer may differ from ours. Go ahead and skip to the exercise if you don’t want to listen to me read these out. The columns from left to right are Naturalistic Observation, Participant Observation, Errors and Hacks, Interviews, Surveys, Focus Groups, Apprenticeship, and Think-Aloud. The potential advantages are: analyzes data that already exists, requires no recruitment, requires no synchronous participation, investigates the participant’s thoughts, occurs within the task context, and cheaply gathers lots of users’ data.
Here’s my answer to this very complicated exercise. Two methods that analyze data that already exists are Naturalistic Observation and Errors and Hacks. Naturalistic Observation doesn’t necessarily analyze data that already exists, but it analyzes data that’s being produced on its own whether or not we observe it, so we don’t have to go out and create an opportunity for data to happen. We just have to observe it and capture it where it’s already taking place. Errors and Hacks look at the way users already use interfaces to see what errors they regularly make or when they have to work around the interface. The two methods that require no recruitment are Naturalistic Observation and Participant Observation. In both cases, we don’t need other human participants to come do anything differently based on the fact that we’re doing some research. With Interviews, Surveys, Focus Groups, Apprenticeship, and Think-Aloud, we’re always asking users to do something to accommodate us or to give us some data. And with Errors and Hacks, even if we can view that data on our own, we still need the user to give us permission to view their workspace or to watch them do whatever they’re doing. There might be some times when you can look for Errors and Hacks with Naturalistic Observation, but generally you need to get far enough into the user’s head to understand why something is an error or why they need to use a certain hack. For the most part, all of these are going to need some synchronous participation. There might be some exceptions. For example, we could do a retrospective analysis of Errors and Hacks, or we could have someone do a Think-Aloud protocol where they write down their thoughts after doing a task. But generally speaking, the way most of these are usually done, they require synchronous participation. Surveys are the exception: we usually send them out to someone, wait some period of time, and get back the results. So we never have to be interacting live with any of our participants. That’s one of the reasons why surveys can get a lot more data than other methods. Adding more participants doesn’t necessarily require more of our time, at least not to gather the data in the first place. Analyzing it might require more time at the end, but that’s not synchronous either. As far as investigating participants’ thoughts is concerned, almost all of these methods can investigate this when used correctly. You could write a survey that doesn’t actually investigate participants’ thoughts, but a well-designed survey is going to try to get at the heart of what the user thinks about things. The only exception is Naturalistic Observation, where, by definition, we’re just watching people; we’re not interacting with them or asking them what they’re thinking. It’s always extremely valuable for us to be able to do some need-finding that occurs within the task context itself. And unfortunately, interviews and surveys, which are some of our most common data gathering methods, very often don’t occur within the task context. Naturalistic Observation and Participant Observation obviously do, but since they don’t involve getting inside a real user’s head, their contributions are a little more limited. Apprenticeship and Think-Aloud really capture the benefits of occurring within the task context, because either way we get the user’s thoughts while they’re engaging with the task, or immediately thereafter. It is possible to do interviews and Focus Groups within the task context as well; it just isn’t quite as common.
Errors and Hacks are certainly debatable as well, because the Errors and Hacks themselves definitely occur within the task context, but our analysis of them usually doesn’t. And finally, as we talk about when we discuss cognitive task analysis, one of the challenges with need-finding is that most of our approaches are extremely expensive. If we want to gather a lot of data cheaply, then we probably need to rely on surveys. Everything else either incurs a pretty significant cost or just isn’t capable of gathering a lot of data. For example, we could cheaply do participant observation for weeks on end, but we’re only ever going to gather data from one person, and that’s never ideal.
The need-finding exercises that we’ve gone through so far focus on the needs of the exercisers: what can they do with their hands, what is the environment around them like while exercising, and so on. However, that’s only half the picture for this particular design. Our ultimate goal is to bring the experience of consuming books to people who exercise, which means we also need to understand the task of book reading on its own. Now, our problem space is still around exercisers, so we wouldn’t go through the entire design life cycle for book reading on its own. We don’t need to design or prototype anything for readers alone. But if we’re going to bring the full book reading experience to people while exercising, we need to understand what that is. So take a moment and design an approach to need-finding for people who are reading on their own.
We could apply pretty much every single need-finding method that we’ve discussed to this task. We could, for example, go to the library and just watch people reading and see how they’re taking notes. We’ve all likely done it ourselves, so we can reflect on what we do while reading, although again, we need to be careful not to overvalue our own priorities and approaches. Reading is common enough that we can easily find participants for interviews, surveys, and think-alouds. The challenge here will be deciding who our users really are. Books are ubiquitous. Are we trying to cater to everyone who reads deliberately? If so, we need to sample a wide range of users. Or, initially, we could choose a subset. We might cater to students who are studying, or busy business people, or people who specifically walk or bike to work. We might start with one of those groups and then abstract out over time. We might eventually abstract all the way to anyone who’s unable to read and take notes the traditional way, like people driving cars or people with visual impairments, but that’s further down the road. The more important thing is that we define who our user is, define the task in which we’re interested, and deliberately design for that user and that task throughout the design life cycle.
We’ve noted that design is a life cycle, from need-finding to brainstorming design alternatives to prototyping to evaluation, and then back to need-finding to continue the cycle again. Need-finding on its own, though, can be a cycle by itself. For example, we might use the results of our naturalistic observation to inform the questions we ask during our interviews. Imagine that we noticed that many joggers jog with only one earphone in. That’s a naturalistic observation, and then in an interview, we might ask, why do some of you jog with only one earphone in? And we might get the answer from the interview that it’s to listen for cars or to listen for someone trying to get their attention, because they exercise in a busy area. Now that we understand why they have that behavior, maybe we develop a survey to try to see how widespread that behavior is, and ask, how many of you need to worry about what’s around you when you’re listening while exercising? If we notice in those surveys a significant split in the number of people who are concerned about that, that might inform our next round of naturalistic observation. We might go out and look to see in what environments people wear only one earphone and in what environments they wear both. So in that way, all of the different kinds of need-finding that we do can inform our next round of other kinds of need-finding. We can go through entire cycles just of need-finding without ever going on to our design alternatives or prototyping stages. However, the prototyping and evaluation that we do will then become another input into this. During our evaluation, we might discover things that will then inform what we need to do next as far as need-finding is concerned. Creating prototypes and evaluating them gives us data on what works and what doesn’t, and that might inform what we want to observe to better understand the task going forward. That’s the reason why the output of evaluation is more need-finding. It would be a mistake to do one initial need-finding stage and then jump into a back-and-forth cycle of prototyping and evaluation.
During these need-finding exercises, you’ll have gathered an enormous amount of information about your users. Ideally, you’ve combined several of these approaches: you’ve observed people performing the tasks, you’ve asked them about their thought process, and you’ve tried some of it yourself. Pay special attention to the places where the data seem to conflict. Are these cases where you, as the designer, understand some elements of the task that the users don’t? Or are these cases where your expertise hasn’t quite developed to the point of understanding the task? Once you’ve gone through the data gathering process, it’s time to revisit the inventory of things we wanted to gather initially. One, who are the users? Two, where are the users? Three, what is the context of the task? Four, what are their goals? Five, right now, what do they need? Six, what are their tasks? And seven, what are the subtasks? Revisit these with the results of your data gathering in mind.
Now that you have some understanding of the user’s needs, it’s time to formalize that into something we can use in design. There are a number of different ways we can do this. For example, maybe we create a step-by-step task outline of the user engaging in some task. We can break those tasks down into subtasks as well, all the way down to the operator level. We can further develop this kind of task outline into a hierarchical network, like we talked about before. This might involve more complexity than simply a linear series of actions. We might further augment this with a diagram of the structural relationships among the components in the system and how they interact. This might give us some information about how we get feedback to the user or how they interact with our interface in the first place. From there, we might develop this even further into a flowchart equipped with decision-making points or points of interruption. Notice how these representations are very similar to the outcomes of the task analyses we talk about in the principles unit of our conversations. We can similarly use the data gathered here to assemble a more comprehensive task analysis that will be useful in designing and prototyping our designs.
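To show what a step-by-step outline broken into subtasks might look like as a shareable artifact, here is a small Python sketch; the particular breakdown of the audiobook-while-exercising task is hypothetical and only meant to illustrate the structure.

```python
# Hypothetical hierarchical task outline: tasks map to lists of subtasks,
# which can nest further down toward individual operators.
task_outline = {
    "Listen to an audiobook while exercising": [
        {"Start a session": [
            "Put in earphones",
            "Select a book",
            "Press play",
        ]},
        {"Control playback mid-exercise": [
            "Pause or resume without stopping",
            "Rewind a short segment",
            "Leave a bookmark or voice note",
        ]},
        {"End the session": [
            "Pause playback",
            "Review bookmarks and notes",
        ]},
    ],
}

def print_outline(node, depth=0):
    """Print the outline as an indented, step-by-step list."""
    if isinstance(node, str):
        print("  " * depth + "- " + node)
    else:  # a dict mapping a task name to its subtasks
        for name, children in node.items():
            print("  " * depth + "- " + name)
            for child in children:
                print_outline(child, depth + 1)

print_outline(task_outline)
```

The same structure can then be annotated with decision points or interruptions to grow it toward the flowchart representation described above.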
The final step of needfinding is to define our requirements. These are the requirements that our final interface must meet. They should be specific and evaluatable, and they can include some components that are outside of the user’s tasks as well, as defined by the project requirements. In terms of user tasks, we might have requirements regarding functionality, what the interface can actually do; usability, how certain user interactions must work; learnability, how fast the user can start to use the interface; and accessibility, who can use the interface. We might also have some that are generated by external project requirements, like compatibility, what devices the interface can run on; compliance, how the interface protects user privacy; cost, how much the final tool can actually cost; and so on. We’ll use these to evaluate the interfaces we develop going forward.
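As a rough sketch of what “specific and evaluatable” can look like in practice, here’s one hypothetical way to record requirements in Python so that each one names the category it belongs to and how it will be checked. The requirements and thresholds below are made-up placeholders, not prescriptions.

```python
# Hypothetical requirements: each one is specific and paired with a way to evaluate it.
requirements = [
    {"category": "functionality",
     "requirement": "User can pause playback without looking at the screen",
     "evaluation": "Observe 10 users; at least 9 succeed on the first try"},
    {"category": "learnability",
     "requirement": "New users can take a note within 2 minutes of first use",
     "evaluation": "Measure time-to-first-note in a usability session"},
    {"category": "compatibility",
     "requirement": "Runs on iOS and Android phones released in the last 4 years",
     "evaluation": "Smoke-test on a device matrix before release"},
]

for r in requirements:
    print(f"[{r['category']}] {r['requirement']} -- evaluated by: {r['evaluation']}")
```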
How might needfinding work in your chosen area of HCI? If you’re looking at designing for some technological innovation like augmented or virtual interactions, the initial phase might not actually be that different. Your goal is to understand how people perform tasks right now, without your interface. So, initially, you want to observe them in their naturalistic environment. Later, though, you’ll need to start thinking about bringing participants to you to experience the devices firsthand. If you’re interested in something like HCI for healthcare or education, you have a wealth of naturalistic observations available to you. You might even have existing interfaces doing what you want to do, and you can try to leverage those as part of your needfinding exercises. Remember, no matter your area of application, you want to start with real users: that might mean observing them in the wild, talking to them directly, or looking at data they’ve already generated.
Today, we’ve talked about needfinding. Needfinding is how you develop your understanding of the needs of your user: what tasks are they completing, what is the context of those tasks, what else is going on, what are they thinking during the task, and what do they have to hold in working memory? All these things feed into your understanding of your users’ needs. We’ve discussed a number of different techniques to approach this, ranging from low intervention to high intervention. On the low side, we can just observe our users in the wild, or we can become users ourselves and participate in the task. Working up, we might look more closely at users’ actions to find errors or hacks, or peruse the data they’re already generating. We might interact with them directly through surveys, interviews, or focus groups, or we might choose to work alongside them, not just participating in the task independently but learning from them and developing expertise ourselves. Once you’ve gained a sufficient understanding, it’s time to move on to the next step: brainstorming design alternatives.
When we’ve developed a good understanding of the needs of our user, it’s time to move on to the second phase of the design life cycle: design alternatives. This is when we start to brainstorm how to accomplish the task we’ve been investigating. The problem here is that design is very hard, and it’s hard for a number of reasons. The number of choices we have to make and the things we have to control are more expansive than ever before. Are we designing for desktops, laptops, tablets, smartphones, smartwatches, augmented reality, virtual reality, 2D, 3D, gesture input, pen input, keyboard input, mouse input, voice input? In this lesson, we’re going to talk about how to generate ideas for designs. And then we’ll chat about how to explore those ideas a bit further to figure out what you want to actually pursue.
The biggest mistake that a designer can make is jumping straight to designing an interface without understanding the users or understanding the task. The second biggest mistake, though, is settling on a single design idea or a single genre of design ideas too early. This can take on multiple forms. One form is staying too loyal to existing designs or products. Take the thermostat, for example, again. If you settled on tweaking the existing design of a thermostat, you would never invent the Nest. So if you’re working on improving an existing interface, try to actually distance yourself from the existing solutions, at least initially during the brainstorming session. But this is also a problem if you’re designing interfaces for new tasks as well. Imagine, for instance, that while you were observing people exercising, you started sketching interface ideas, like how to make the buttons big enough or what buttons need to be featured prominently. In doing so, you’re getting tunnel vision and missing out on any design alternatives that might involve voice or gesture control. So the second biggest mistake you can make is focusing too strongly on one alternative from the very beginning, instead of exploring the entire range of possible design alternatives. The reason why this is such a common mistake is that there’s a natural tendency to think of it as a waste of time to develop interfaces you’re not going to end up using. You think you can get it done faster just by picking one early on and sticking to it. But fleshing out ideas for interfaces you don’t end up using isn’t a waste of time, because by doing so you continue to learn more about the problem. The experience of exploring those ideas that you leave behind will make you a better designer for the ideas that you do choose to pursue. In all likelihood, your ultimate design will be some combination of the design alternatives that you explored earlier. So, take my security system as an example. There are two ways of interacting with it: the keypad and the keychain. Two different designs that, in this particular instance, integrated just fine. Different alternatives won’t always integrate side by side this easily, but the design process as a whole is an iterative process of brainstorming, combining, abandoning, revising, and improving your ideas, and that requires that you start with several ideas in the first place.
When we talked about the problem we’re solving here, we defined the problem space as disabling a security system as we enter a home. We defined our problem as far away as possible from the current interfaces for doing it. The design space, on the other hand, is the area in which we design our solutions to this problem. The current design space for this problem is wall-mounted devices and portable devices like my keychain. But as we design, the space of possible ideas might expand. For example, as we go along, we might be interested in voice interfaces, or interfaces on our mobile phones or wearable devices. Our goal during the design alternatives phase is to explore the possible design space. We don’t want to narrow down too early to sticking devices on walls or on keychains. We want to brainstorm lots of possible approaches and grow a large space of possible designs.
When you first start brainstorming, your goal is to generate a lot of ideas. These ideas can be very short, very high level, and very general. Your goal is just to generate an expanse of them. They don’t even have to be ideas for interfaces, just any idea for solving the problem. If you look online, you’ll find lots of great guides to brainstorming. One of the most interesting takeaways is that research generally indicates it’s better to start with individual brainstorming. That’s not intuitive, though. We often hold meetings for brainstorming, but it should start out individually. That’s because brainstorming is most effective when it initially generates a lot of ideas, but groups tend to coalesce around ideas pretty early. So, start out individually. Generate a lot of ideas. Each idea need only be a few words or a sentence. Don’t worry right now about whether they’re good or bad. Write down everything. Think about how you’d design with different types of interactions, like gestures and voice and touch. Think about how you’d design for different interfaces, like smartwatches, or tablets, or augmented reality. Think about how you’d design for different audiences: novices and experts, or kids and adults. Get silly with it. Some of the best ideas start as silly ideas. How would you design this for your dog or for your cat? How would you design this for someone with three arms or three legs? Go nuts. Your goal is to generate a lot of ideas. These are going to get loaded into your mind, and they’ll crop up in interesting ways throughout the rest of the design process. That’s why it’s important to generate a lot of ideas. You never know when they’ll come up.
So, I’m going to demonstrate this for you real quick. I’m going to brainstorm ideas for our problem of allowing exercisers to consume books and take notes. So, here’s my paper for brainstorming. Please enjoy this 30-minute video of me sitting here writing at a desk. Here’s my list of ideas. As you might be able to tell, it gets kind of crazy; it’s all over the place. You can kind of trace through my entire reasoning process on here. Some of the ideas are somewhat straightforward: I’ve got voice commands, I’ve got gestures, I’ve got voice transcription. I tried to separate it out into feedback methods and also the ways the user actually interacts, because we could kind of combine those. Some of these are actually pretty crazy. I’ve got an on-skin interface; I’ve seen some prototypes for things that would let you actually just press on your skin to do different kinds of interactions. I’ve also got augmented reality, like Google Glass. I’ve got a portable keyboard like the Twiddler. So, notice that this is kind of a mess. That’s a good thing. Lists are fine, but chances are a lot of your ideas are related to each other. Notice also that I never crumpled up my piece of paper and never threw it away. I crossed one thing out, and that’s because I wrote the wrong word. Really, at this stage, you don’t want to reject any ideas. Your goal is just to free-form brainstorm and get all your thoughts out there.
Here are five quick tips for effective individual brainstorming. Number one, write down the core problem. Keep this visible. You want to let your mind enter a divergent thinking mode, but you also want to remain grounded in the problem. Writing down the problem and keeping it available will help you remain focused while remaining creative. Number two, constrain yourself. Decide you want at least one idea in a number of different categories. Personally, I try to make sure I have at least three ideas that use nontraditional interaction methods, like touch and voice. You can constrain yourself in strange ways too. Force yourself to think of solutions that are too expensive or not physically possible. The act of thinking in these directions will help you out later. Number three, aim for 20. Don’t stop until you have 20 ideas. These ideas don’t have to be very well-formed or complex; they can be simply one-sentence descriptions of designs you might pursue. This forces you to think through the problem rather than getting tunnel vision on an early idea. Number four, take a break. You don’t need to come up with all of these at once, and, in fact, you’ll probably find it’s easier if you leave and come back. I’m not just talking about a ten-minute break, either. Stop brainstorming and decide to continue a couple of days later, but be ready to write down new ideas that come to you. Number five, divide and conquer. If you’re dealing with a problem like helping kids lead healthier lifestyles, divide it into smaller problems and brainstorm solutions to those. If we’re designing audiobooks for exercisers, for example, we might divide it into things like the ability to take and review notes, or the ability to control playback hands-free. Divide it like that and brainstorm solutions to each individual little problem.
Group brainstorming presents some significant issues. Thompson in 2008 laid out four behaviors in group brainstorming that can block progress. The first is social loafing: people often don’t work as hard in groups as they would individually. It’s easy to feel like the responsibility for unproductive brainstorming is shared and deflected; in individual brainstorming, it’s clearly on the individual. The second blocker is conformity: people in groups tend to want to agree. Studies have shown that group brainstorming leads to convergent thinking. The conversation the group has tends to force participants down the same line of thinking, generating fewer and less varied ideas than the individuals acting alone. During brainstorming, though, the goal is divergent thinking: lots of ideas, lots of creativity. The third blocker is production blocking. In group brainstorming, there are often individuals who dominate the conversation and make it difficult for others to actually be heard. Their ideas can thus command more weight, not because of the strength of the ideas, but because of the volume of the description. The fourth blocker is performance matching. People tend to converge in terms of passion and performance, which can lead to a loss of momentum over time. That might get people excited if they’re around other excited people initially, but more often than not, it saps the energy of those who enter with enthusiasm. In addition to these four challenges, I would add a fifth: group brainstorming may also be prone to power dynamics or biases. No matter how supportive and collaborative a boss might be, there will likely always exist a tacit pressure to build on her suggestions, which dampens creative brainstorming. There also exists considerable literature showing that other biases, based on gender, age, or race, carry over into these group sessions as well. Now, note that this doesn’t mean group brainstorming should be avoided altogether. What it means is that we should enter into group brainstorming with strong ideas of how to address these issues, ideally after a phase of individual brainstorming has already occurred.
To have an effective group brainstorming session, we need some rules to govern the individuals’ behavior and address those common challenges. In 1957, Osborn outlined four such rules. Number one, expressiveness: any idea that comes to mind, share it out loud, no matter how strange. Number two, nonevaluation: no criticizing ideas, no evaluating the ideas themselves yet. Number three, quantity: brainstorm as many ideas as possible. The more you have, the greater your odds of finding a novel idea. Number four, building: while you shouldn’t criticize others’ ideas, you should absolutely try to build on them. Then, in 1996, Oxley, Dzindolet, and Paulus presented four additional rules. Number one, stay focused: keep the goal in mind at all times. Number two, no explaining ideas: say the idea and move on; no justifying ideas. Number three, when you hit a roadblock, revisit the problem: say it again out loud. Number four, encourage others: if someone isn’t speaking up, encourage them to do so. Note that all eight of these rules prescribe what individuals should do, but they’re only effective if every individual does them. So it’s good to cover these rules, post them publicly, and call one another out on breaking them.
The rules given by Osborn, Oxley, Dzindolet, and Paulus are about helping individuals understand how they should act in group brainstorming. Here are a few additional tips, though, that apply less to the individual participants and more to the design of the activity as a whole. Number one, go through every individual idea. Have participants perform individual brainstorming ahead of time and bring ideas to the group brainstorming session, and explicitly make sure to go through each one. That will help avoid converging around an idea too early and make sure everyone is heard. Number two, find the optimal size. Social loafing occurs when there’s a lack of individual responsibility; when you have so many people that not everyone will get to talk anyway, it’s easy for disengagement to occur. I would say a group brainstorming session should generally not involve more than five people. If more people than that need to give perspectives, then you can have intermediate groups that send ideas along to a later group. Number three, set clear rules for communication. Get a 20-second timer, and when someone starts talking, start it. Once the timer is up, someone else gets to speak. The goal is to ensure that no one can block others’ ideas by talking too much, whether intentionally or accidentally. Number four, set clear expectations. Enthusiasm starts to wane when people are unsure how long a session will go or what will mark its end. You might set the session to run a certain amount of time, or dictate that a certain number of ideas must be generated. No matter how you do it, make sure that the people participating can assess where in the brainstorming session they are. Number five, end with ideas, not decisions. It’s tempting to want to leave a brainstorming session with a single idea on which to move forward, but that’s not the goal. Your brainstorming session should end with several ideas; let them percolate in everyone’s mind before coming back and choosing the ideas to pursue later.
The brainstorming process should leave you with a list of high-level, general design alternatives. These are likely just a few words or a sentence each, but they describe some very general idea of how you might design the interface to accomplish this task. Your next step is to try to flesh these ideas out into three or four that are worth taking forward to the prototyping stage. Some of the ideas you might be able to dismiss pretty quickly, and that’s all right. You can’t generate good ideas without generating a lot of ideas, even though you won’t end up using all of them. In other places, you might explore an idea a little before dismissing it, or you might combine two ideas into a new idea. In the rest of this lesson, we’ll give you some thought experiments you can use to evaluate these ideas and decide what to keep, what to combine, and what to dismiss.
The first common method we can use to flesh out design alternatives is called personas. With personas, we create actual characters to represent our users. So let’s create a persona for the problem of helping exercisers take notes while reading books. We’ll start by giving her a name and a face, and then we’ll fill out some details. We want to understand who this persona is. We want to be able to mentally simulate her. We want to be able to say, what would Anika do in this situation? What is she thinking when she’s about to go exercise? What kinds of things might interrupt her? We might want to put in some more domain-specific information as well. Like, why does this person exercise? When do they exercise? What kinds of books do they like? How are they feeling when they’re exercising? Where do they usually exercise? We want to create at least three or four of these different personas, and perhaps more depending on how many different stakeholders we have for our problem. The important thing is that these should be pretty different people, representing different elements of our designs and different elements of our task, so we can explore its entire range. We don’t want to design just for Anika, but we do want to design for real people. And so we should define personas that represent the range of real people that we care about. That way we can ask questions like, how would Janet feel about this design alternative? Using this, we can start to explore the space and find the options that have the most appeal.
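If it’s useful to keep personas in a form the whole team can reference, here’s a minimal sketch, assuming Python, of a persona captured as a structured record. The fields and Anika’s details are hypothetical illustrations; the real values should come from your needfinding.

```python
# A hypothetical persona record the team can reason about consistently.
from dataclasses import dataclass, field

@dataclass
class Persona:
    name: str
    age: int
    occupation: str
    why_they_exercise: str
    when_they_exercise: str
    favorite_books: list = field(default_factory=list)
    frustrations: list = field(default_factory=list)

anika = Persona(
    name="Anika",
    age=29,
    occupation="Nurse",
    why_they_exercise="Stress relief after long shifts",
    when_they_exercise="Early mornings, four days a week",
    favorite_books=["Mysteries", "Popular science"],
    frustrations=["Hates stopping mid-run to fiddle with her phone"],
)
print(anika)
```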
Personas are meant to give us a small number of characters that we can reason about empathetically. However, it can sometimes also be useful to formulaically generate a large number of user profiles to explore the full design space. We can do this by defining a number of different variables about our users and the possibilities within each. So here are a few examples. We can ask ourselves: do we care about novice users, expert users, or both? Do we care about users that read casually, that read seriously, or both kinds of users? Do we only want to cater to users that are highly motivated to use our app, which can make things a little easier on us, or do we want to assume that it won’t take much to stop them from using our app? Can we assume a pretty high level of technological literacy, or are we trying to cater to more casual users as well? And are we interested in users that are going to use our app all the time, in users who are going to use our app only occasionally, or both? All of these decisions present some interesting design considerations that we need to keep in mind. For example, for users that are going to use our tool very often, our major consideration is efficiency. We want to make sure they can do what they need to do as quickly as possible, and oftentimes that might mean relying on them to know more about how to use the app. But if we’re designing for users that use our app pretty rarely, we need to make sure to keep all the interactions discoverable and visible. That way, every time they come back to the app, it’s like the first time they came to it. They don’t need to remember anything from the previous time, because we don’t know how long it’s been since the last time they used it. If we want to design for both, then we have our work cut out for us. We need to either design very efficient interaction methods that are nonetheless discoverable and visible, or design two sets of interaction methods: one that’s very discoverable and visible, and one that’s very efficient. We see this with our example of the hotkeys for copy and paste. If you don’t know how to use them, you have a way of finding them, so it caters to novice users or users who haven’t used the program in a while. But because you can also do it with simple hotkeys, it caters to those users who use it more frequently and makes it more efficient for those who are going to be doing it a lot. In deciding what to design, we need to understand what groups, what profiles, we’re designing for, and use that to inform our design decisions. Inexperienced designers often make big mistakes here. They either try to design for everybody, which rarely works, or they design with no one in particular in mind, and so certain areas of the program are good for some users, other areas are good for other types of users, but the program as a whole is not good for any particular type of user. So it’s very important that we understand the range of users that we’re designing for, and that we make sure that range is actually something we can feasibly design for.
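Here’s a minimal sketch of that formulaic generation, assuming Python: cross a handful of user variables to enumerate the space of profiles, then decide which combinations you can feasibly design for. The variables and values below are hypothetical placeholders.

```python
# Hypothetical user variables; each profile is one combination of values.
from itertools import product

profile_variables = {
    "expertise": ["novice", "expert"],
    "reading_habit": ["casual", "serious"],
    "motivation": ["high", "low"],
    "tech_literacy": ["high", "casual"],
    "usage_frequency": ["frequent", "occasional"],
}

profiles = [dict(zip(profile_variables, combo))
            for combo in product(*profile_variables.values())]

print(len(profiles), "candidate profiles")  # 2^5 = 32 combinations
print(profiles[0])                          # e.g. the novice/casual/high/high/frequent profile
```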
Building on the idea of a persona, we can take that person and stretch her out over time to see what she’s thinking and what she’s doing at various stages of interacting with our interface, or with the task at hand. I’ve also heard these called journey maps, although journey maps usually cover much longer periods of time: they cover the entire expanse of the person’s life, why they’re interested in something, and where they’re going from here. Timelines can be narrowed to the specific time during which users are interacting with the task or with the program. So our goal is to take that persona and stretch it out over time. For our example, what sparks Anika to decide to exercise in the first place? That might be really useful information to know. After she decides to exercise, what does she do next? She presumably doesn’t just start right where she is; she goes somewhere to exercise, so she has some kind of setup process. Then what does she do? In this case, maybe she sets up her audiobook, actually pushes play, puts her headphones in, and so on. Then there’s probably a period of actual exercise in our example, and at the end, she turns off the audiobook. The usefulness of drawing this as a timeline is that it starts to let us ask some pretty interesting questions. What prompts this person to actually engage in the task in the first place? What actions lead up to the task? How are they feeling at every stage of the task, and can we use that? How would each design alternative impact their experience throughout this process? For example, if our persona for Anika was that she really doesn’t like to exercise but she knows she really needs to, then we know her mood during this phase might be kind of glum. We need to design our app with the understanding that she might have pretty low motivation to engage in this at all; if our app is a little bit frustrating to use, it might turn her off of exercising altogether. On the other hand, if Anika really likes to exercise, then maybe she’s in a very good mood during this phase. And if she likes exercising on its own, maybe she forgets to even set up the audiobook at all. So then we need to design our app with that in mind; we need to design it such that there’s something built into it that could remind her, when she gets to a certain location, that she meant to start her audiobook. So stretching this out as a timeline lets us explore not only who the user is, but also what they’re thinking and what they’re feeling, and how what we design can integrate with the task they’re participating in. Exploring our different design alternatives in this way allows us to start to gauge which designs might have the greatest potential to positively impact the user’s experience. It also lets us explore what might be different between different users. Our design might need to be different for Anika, who loves to exercise, and Paul, who hates exercising. This timeline lets us start to explore those more personal elements.
We can create general timelines for routine interactions with our design alternatives, but it’s often even more interesting to examine the specific scenarios users will encounter while using our interfaces. Rather than outlining the whole course of interaction, scenarios let us discuss specific kinds of interactions and events that we want to be able to handle. This is sometimes also referred to as storyboarding: sequences of diagrams or drawings that outline what happens in a particular scenario. The difference between timelines on the one hand and storyboards and scenarios on the other is that timelines, in my experience at least, tend to be pretty general: they cover how a routine interaction with the interface or with the design alternative goes. Scenarios and storyboards are more specific. They’re about a particular person interacting in a particular way, with particular events coming up. So, let’s build one of these scenarios. Morgan is out jogging when a fire engine goes by. It’s so loud that she misses about 30 seconds of the book. How does she recover from this? We want to go through that question with each of our different design alternatives. For our touch interface, she would need to stop, pull out her phone, turn on the screen, and pause the book. For our voice interface, she would have to wait for the fire engine to finish passing and then say, “Rewind,” because chances are, if it’s too loud for her to hear her book, it’s probably too loud for her phone to hear her voice command. But for a gestural interface, she could simply make the gesture that pauses the book, and then play it again once the fire engine has finished passing. Ideally, we’d like to outline several such scenarios and explore them for various personas and design alternatives. Of course, now we’re reaching three degrees of freedom, so it’s not crucial that we explore every possible combination of persona, scenario, and design alternative. This is a more fluid process of exploring which ideas have potential and which ideas are worth exploring further. We might find there are certain combinations of scenarios and personas that we really care about that completely rule out certain design alternatives. If we really care about allowing her to exercise in a loud area, that tells us a voice interface might not be the direction we want to go in. Or, for another example, if our earlier needfinding exercises revealed that a significant number of exercisers carry weights or other things in their hands, then gestural interfaces start to look a lot less promising. Now, as video technology has gotten more and more ubiquitous, storyboards have also taken on a different form in a lot of prototypes that I have seen. There’s been an emergence of what is called video prototyping, which is basically doing a video mockup of what someone would actually look like interacting with a certain interface, to show to people so that they can see whether or not it would actually be something that would help them in their everyday life. So, storyboards taken to an extreme could actually be a video mockup of what it would be like to use the finished product.
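One lightweight way to keep track of these walkthroughs is a simple scenario-by-alternative table. Here’s a hypothetical sketch in Python; the scenarios, alternatives, and judgments are made up to mirror the fire-engine and hands-full examples above, and in practice each judgment comes from talking through that combination with your team.

```python
# Hypothetical walkthrough results: how each alternative handles each scenario.
ratings = {
    # (scenario, alternative): "ok", "awkward", or "fails"
    ("fire engine drowns out audio", "touch"):   "awkward",  # stop, pull out phone, pause
    ("fire engine drowns out audio", "voice"):   "fails",    # too loud to hear the command
    ("fire engine drowns out audio", "gesture"): "ok",
    ("hands full carrying weights",  "touch"):   "awkward",
    ("hands full carrying weights",  "voice"):   "ok",
    ("hands full carrying weights",  "gesture"): "fails",
}

alternatives = {alt for (_, alt) in ratings}
viable = [alt for alt in alternatives
          if all(v != "fails" for (s, a), v in ratings.items() if a == alt)]
print("Alternatives that survive every scenario:", viable)
```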
In our unit on principles, we talked about task analysis, including things like cognitive task analysis and human information processor models. Performing those analyses as part of our needfinding also gives us a nice tool for exploring our design alternatives. Using this, we can start to look at how exactly the goals, operators, methods, and selection rules of the GOMS model map onto the ideas of our design alternatives. How does the user achieve each of their goals in each interface? How relatively easy are the goals to achieve between the different design alternatives? With the results of our cognitive task analysis, we can start to ask some deeper questions about what the user is keeping in mind as well. Given what we know about the things competing for our user’s attention, what is the likelihood that each interface will work? In some ways, this is a similar process to using the personas we outlined earlier, but with a subtle difference. Personas are personal and meant to give us an empathetic view of the user experience. User models are more objective and meant to give us a measurable and comparable view of the user experience. So ideally, the result of this kind of analysis is that we would be able to say that the different alternatives have different phases, and those phases have different efficiencies or different speeds associated with them. That way, we could start to say exactly how efficient one design is compared to another.
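As a small illustration of how a user model can make alternatives comparable with numbers, here’s a hypothetical, keystroke-level-style sketch in Python. The operators and per-operator time estimates are assumed placeholders, not measured values from any study.

```python
# Rough, assumed seconds per operator (hypothetical, for illustration only).
OPERATOR_TIMES = {
    "retrieve_phone": 3.0,
    "unlock_screen": 1.5,
    "find_button": 1.2,
    "tap": 0.3,
    "speak_command": 1.5,
    "make_gesture": 0.8,
}

# Hypothetical methods for the goal "pause the book" in each design alternative.
methods = {
    "touch":   ["retrieve_phone", "unlock_screen", "find_button", "tap"],
    "voice":   ["speak_command"],
    "gesture": ["make_gesture"],
}

for name, ops in methods.items():
    total = sum(OPERATOR_TIMES[op] for op in ops)
    print(f"{name:8s} pause-the-book estimate: {total:.1f}s")
```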
In this lesson we’ve covered several different ways of developing design alternatives. Each method has its advantages and disadvantages. So, let’s start to wrap the lesson up by exploring this with an exercise. Here are the methods that we’ve covered, and here are some potential advantages. For each row, mark which of the different methods possesses that advantage. Note that these might be somewhat relative, so your answer might differ from ours, and that’s perfectly fine.
So, personally, here’s my answer. For me, scenarios are really the only ones that include the entire task context. You can make the case that personas and timelines do as well, but I tend to think those are a little too separated from the task to really include the context. Personas and user profiles, on the other hand, do include the user’s context. They include plenty of information about who the user is, why they’re doing this task, and what their motivations are. You could make the argument as well that scenarios and timelines include the user’s context, because the way we describe them, they’re instances of the personas being stretched out over a scenario or over time. User profiles probably do the cleanest job of delineating the target audience. With our personas, we have kind of a fuzzy idea of who our different users are, but our user profiles really formally articulate the space of users in which we’re interested. As far as general workflows, that’s what user modeling is really good at. It outlines the phases or steps or operators that users use in a general sense. You could say the same thing about timelines to a certain extent, although timelines are more focused on what the user is thinking and feeling and less on their actual workflow with regard to the task. As far as capturing activity over time, scenarios, timelines, and user modeling all have an element of time in what they analyze. And possibly one of the bigger benefits of using scenarios is that they allow us to capture potential edge cases more easily. Timelines, user models, user profiles, and personas are all about the general user or the general routine interaction with the task, but scenarios let us pose interesting and novel situations so that we can mentally simulate how our different design alternatives will work in that scenario. For example, we wouldn’t say that a fire engine going by Morgan, who is listening to an audiobook, is a routine thing, so it probably wouldn’t come up in our timeline or in our user modeling. But we can develop a scenario that explores how she’s going to deal with that particular event. And while it might seem a little silly to focus so much on edge cases, as we do more and more design, we start to discover that there are a lot of edge cases. They’re all different, but a lot of our time is spent dealing with those unique circumstances that fall pretty far outside the routine interaction with the program.
So let’s apply these techniques to some of the ideas that I came up with earlier. The first thing I might do is rule out any of the ideas that are just technologically unfeasible. Coming up with those wasn’t a waste of time, because they were part of a nice, broad, free-flowing brainstorming process. But skin-based interfaces and augmented reality probably are not on the table for the immediate future. I might also rule out the options that are unfeasible for some more practical reasons. We might be a small team of developers, for example, so a dedicated wearable device isn’t really within our expertise. The thing I might do next is create some timelines covering a sequence of events in exercising, and use them to explore these alternatives further. I might notice that the users I observed and talked with valued efficiency in getting started. They don’t want to have to walk through a complex setup process every time they start to exercise. I might also use my user personas to explore the cognitive load of the users in these different alternatives. They have a lot going on, between monitoring their exercise progress, observing their environment, and listening to the book. So, I’m going to want to keep cognitive load very low. Now granted, we always want to keep cognitive load pretty low, but in this case, the competing tasks are significant enough that I want to sacrifice features for simplicity if it keeps that cognitive load manageable. Based on these timelines and these personas, I would probably end up with three design alternatives that I want to explore. One is a traditional touch interface, a smartphone app. That unfortunately means the user is going to have to pull out their phone whenever they want to take a note, but if I can design it well enough, that might not be an issue. I also know that approach gets me a lot of flexibility, so it’s good to at least explore it. A second approach is a gestural interface. I know that people aren’t usually holding their device while exercising, so it would be great if they had some way of interacting without pulling out their phone; gestures might let us do that. Now, gesture recognition is in its infancy, but we might be able to leverage smartwatch technology or something like a Fitbit to support interaction via gestures. A third approach is a voice interface. I know people generally aren’t communicating verbally while exercising, so why not a voice interface? That can even double as the note-taking interface. So now that I have three design alternatives that I’m interested in exploring, I would move on to the prototyping stage, which is building some version of these that I can test with real users.
Design alternatives are where you explore different ways to facilitate the user’s task. If you’ve already chosen to focus on a certain area of technology, like wearable devices or gestural interaction, then in some ways you’ve put the cart before the horse: you’ve chosen your design before exploring the problem. As a learning experience, though, that’s perfectly fine. It’s fine to say, I want to explore augmented reality, and I’m going to find a task that lets me do that. You’re still exploring whether or not augmented reality is the right solution for that task; you’re just altering the task, rather than altering the design, if it’s not. For other domains, you might need to make sure to create personas for different stakeholders. In healthcare, for instance, you would want to make sure any interface you design takes into consideration nurses, doctors, patients, managers, family members, and more. So, you want to create personas for all those different types of people as well and make sure to explore scenarios that affect each stakeholder.
The goal of the design alternative stage of the design life cycle is to generate lots of ideas and then synthesize those ideas into a handful worth exploring further. So, we started with some heuristics for generating lots and lots of ideas through both individual and group brainstorming. Then, we proposed some methods for exploring those ideas and deciding which ones to pursue. Now, these were all thought experiments. We haven’t actually gotten to the point of designing any of these interfaces yet. That’ll be the next stage. At the end of the design alternative stage, we want to select a handful of designs that are worth carrying forward and prototyping to then give to users for actual feedback.
So we’ve talked to our users. We’ve gathered some understanding of what they need. We’ve created some ideas for how we might address their needs, and we’ve mentally simulated those different alternatives. Now it’s time to start actually making things we can put in front of users. This is the prototyping stage. Like brainstorming design alternatives, this involves looking at the different ideas available to us and developing them a bit. But the major distinction is that in prototyping, we want to actually build things we can put in front of users. That doesn’t mean building the entire interface before we ever even have a user look at it, though. We want to get user feedback as quickly and rapidly as possible, and build up more sophisticated prototypes over time as we go through the design life cycle. So we’ll start with low-fidelity prototypes, things that can be assembled and revised very quickly for rapid feedback from users. Then we’ll work our way toward higher-fidelity prototypes, like wireframes or working versions of our interface.
To discuss prototyping, there are a variety of terms and concepts we need to understand. For the most part, these apply to where in the prototyping timeline they are used. In early prototyping, we’re doing very rapid revision on preliminary ideas. This happens on our first few iterations through the design life cycle. In late prototyping, we’re putting the finishing touches on the final design, or revising a design that’s already live. This happens when we’ve already been through several iterations of our design life cycle. At the various phases, we’ll generally use different types of prototypes and evaluations. Now, note that everything I’m about to say is pretty general; there will be lots of exceptions. The first concept is representation: what is the prototype? Early on, we might be fine with just some textual descriptions or some simple visuals that we’ve written up on a piece of paper. Later on, though, we’ll want to make things more visual and maybe even more interactive. We only want to put the work into developing the more complex types of prototypes once we’ve vetted the ideas with prototypes that are easier to build. So in a lot of ways, this is a spectrum of how easy prototypes are to build over time. A verbal prototype is literally just a description, and I can change my description on the fly. A paper prototype is drawn on paper, and similarly, I could ball up the paper, throw it away, and draw a new version pretty quickly. But things like actual functional prototypes that really work take a good bit of time, and so we only want to build those once we’ve already vetted that the ideas we’re going to build actually have some value. You don’t want to sink months and lots of engineering resources into building something that actually works, only to find out there’s some feedback you could have gotten just from a drawing on a piece of paper that would have told you your idea wasn’t a very good one. This brings us to our second concept, which is fidelity. Fidelity refers to the completeness or maturity of the prototype. A low-fidelity prototype would be something like paper or simple drawings, very easy to change. A high-fidelity prototype would be something like a wireframe or an actual functional working interface, something that’s harder to put together. We want to move from easily changeable low-fidelity prototypes to explore our ideas, to higher-fidelity prototypes to really test them out. Note that fidelity and representation are pretty closely related: low fidelity is really about a prototype that’s pretty far from being complete, and the same is true for some of our early methods of prototyping. They describe different ideas, but they correlate very heavily with what kinds of representations you’re going to use for different levels of fidelity. These different kinds of prototypes also lend themselves to different kinds of evaluation structures. Low-fidelity prototypes can be fine for evaluating the relative function of an interface, whether or not it can do what it’s designed to do. If a user looks at the interface, can they figure out what they’re supposed to press? You can prototype that with just a drawing on a piece of paper, as opposed to a real functional prototype. Things like wireframes can be useful in evaluating the relative readability of the interface as well. However, to evaluate actual performance, like how long certain tasks take or which design leads to more purchases, we generally need a higher-fidelity prototype.
Our evaluations change as we go through more iterations of the design life cycle. Early on, we’re really just evaluating whether or not our prototype even has the potential to do what we want it to do. Can a user physically use it? Can they identify what button to press and when? Later, to evaluate things like readability, we need additional detail like font size and real screen layout: a prototype that looks the way the final interface will look, even if it doesn’t work quite yet. And then, to evaluate performance, we really need a prototype that’s working, or close to working, to evaluate certain tasks. The final concept we need to understand is the scope of the prototype: is it a horizontal prototype or a vertical prototype? Horizontal prototypes cover the design as a whole, but in a more shallow way. Vertical prototypes take a small portion of the interaction and prototype it in great detail. So for example, if we were designing Facebook, we might have a vertical prototype specifically for the status-posting screen and a horizontal prototype for the site in general. Now, in my experience, we usually start with horizontal prototypes earlier on and move toward deeper vertical prototypes later, but in reality, you’ll likely move back and forth between these more frequently throughout your iterations through the design life cycle. So, those are the four main concepts behind prototyping: representation, fidelity, evaluation, and scope. There are other questions we might ask ourselves as well, like whether we’re prototyping iterative or revolutionary changes, and the extent to which the prototype needs to be executable. But in many ways, those fall under the previous concepts.
Prototyping is largely about the trade-offs we have to make. Low-fidelity prototypes like drawings are easy to create and modify, but they aren’t as effective for detailed, comprehensive evaluations. High-fidelity prototypes like actual working interfaces can be used for detailed feedback and evaluation, but they’re difficult to actually put together. So, our goal is to make the most of these trade-offs. We want to start with the easier low-fidelity prototypes to get initial feedback on ideas, to evaluate big designs and big plans, and to make sure we’re on the right track. Then, as we go along, we can move toward the higher-fidelity prototypes that take more time to assemble, because we have initial evidence that our designs are actually sound. It’s really important to note here that our prototypes are prototypes; they aren’t complete interfaces. We’ve discussed in the past that designers often have a tendency to jump straight to designing rather than getting to know their users. That’s a big risk here as well, because we’re designing, but we’re designing specifically to get more feedback. So, don’t become a runaway train of designing. Design deliberately and get feedback often.
Here are five quick tips for effective prototyping. Number one, keep prototypes easy to change. Your goal here is to enable rapid revision and improvement. It’s easy to make quick changes to something on paper, but it’s harder to make changes to code or physical prototypes. Number two, make it clear that it’s a prototype. If you make a prototype look too good, users may focus on superficial elements like colors or font. By letting your prototype look like a prototype, you can help them focus on what you’re ready to test. Number three, be creative. Your goal is to get feedback, so do whatever it takes to get feedback. Don’t let the type of prototype you’re designing constrain the kind of feedback you can get. If you find your current prototypes don’t give you the right kind of feedback, find ones that do. Number four, evaluate risks. One of the biggest goals of prototyping is to minimize the time spent pursuing bad designs by getting feedback on them early. How much would you lose if you found out that users hate the parts of your design that they haven’t seen yet? Whenever that answer gets to be more than a couple of hours, try to get feedback to make sure you’re not wasting time. Number five, prototype for feedback. The goal of a prototype is to get feedback. You could spend a lot of time focusing on details like font selection and color choice, I know I do, but that’s probably not the feedback you need when you’re exploring your big alternatives. Prototype for the kind of feedback you want to get.
At the very simplest, we have verbal prototypes. That means we’re literally just describing the design we have in mind to our user. That’s probably the lowest-fidelity prototype possible; it’s literally just telling the user the same thing we tell our co-designers. So, it’s extremely easy to do, although it can be hard to do effectively. Social desirability bias is big here, because it’s difficult to describe our idea in a way that allows the participant to feel comfortable disagreeing with us. So, we need to make sure to ask for specific and critical feedback. At the same time, though, how do we really know that the user understands the design we’re describing? We’re working toward becoming experts in the areas in which we’re designing, and we don’t want to fall victim to expert blind spots by assuming our explanation makes sense to a novice. For that reason, analogies can be powerful tools for explaining prototypes. Describe your interface in terms of other tools a user might already know about. So, for example, imagine I was pitching the idea of Instacart, a grocery delivery company. I might have described it as Uber for groceries: Uber is a service like taxis, and Instacart is like a taxi for groceries. Describing it in terms of an analogy to another interface can be a powerful way of helping your participant understand your idea more quickly.
One step above just describing our ideas to our user in a verbal prototype would be actually drawing them out. This is what we call a paper prototype. We could do this for anything, from designing an on-screen interface to designing the placement of controls in a vehicle. Let’s go back to our example of designing a way for exercisers to consume and take notes on audiobooks. Let’s imagine one of my design alternatives was for a very easy-to-use app, so that the hassle of pulling out your phone and pressing buttons isn’t actually too distracting. I might start this process simply by drawing a prototype of my interface on paper. Now I have a paper prototype. I can talk to my user and ask her thoughts about it. We’ll talk about the kinds of thoughts I’ll ask about when we talk about evaluation, but generally, I can say, “Hey, Morgan, what do you think about this design?” Looks pretty good. Straightforward. There’s play, there’s fast-forward. I would like a way to see where I am in the book, though. That makes sense. Notice that she didn’t comment on color or font or anything like that, not because I said, “Hey, I’m ignoring font right now,” but because it’s pretty obvious I’m not really caring about font right now. The nature of the paper prototype tells the user what kind of feedback we’re looking for; we’re looking for pretty basic layout information. Now, because this is on paper, I can immediately revise my prototype and incorporate the things she suggested. Now I have a revision based on her feedback, and I can ask, “Hey, how’s that?” Looks great, David. Thank you. Now, paper prototyping isn’t only useful for testing out single interface designs. You can also do some interaction with it. Watch. Here I have four screens prototyped. When I give this to Morgan, I can say things like, imagine you’re viewing this screen and you want to take a note to attach to your current place in the book. What would you probably do? Probably press the view notes button. That makes sense. After you press that, what you’re going to see is this screen for note-taking. So, what would you do then? Well, if I want to take a note, then I would press the record button there. Yes, exactly. Then it would start recording you, it would transcribe what you say, and at the end you’d press stop to continue. So interestingly, just by doing this, I’ve already noticed that there’s really no reason to make her jump to a separate note-taking screen when she wants to take a note while listening; it should actually just be a note-taking button here on the main screen. So, just like this, I could walk through some interaction with the app on paper and get some ideas like that about how I can improve it. This is also called card-based prototyping. The idea is that each screen is on a different card, and I can quickly swap those cards in and out so we can simulate what it would be like to use the real application. That way I can prototype a decent amount of the interface’s interaction with pretty low prototyping effort.
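If you want to keep track of which card follows which, the deck of cards is essentially a little screen map. Here’s a hypothetical sketch in Python of that map, with a helper that lets a facilitator “swap cards” at the keyboard; the screen and button names are made-up stand-ins for the paper cards described above.

```python
# Hypothetical card deck: each "card" is a screen, and each labeled button leads to another card.
screens = {
    "main":      {"play/pause": "main", "view notes": "notes", "take note": "recording"},
    "notes":     {"record": "recording", "back": "main"},
    "recording": {"stop": "main"},
}

def walk(start="main"):
    """Let a facilitator simulate the card swaps by typing button names."""
    current = start
    while True:
        options = screens[current]
        choice = input(f"[{current}] buttons {list(options)} (or 'quit') > ").strip()
        if choice == "quit":
            break
        current = options.get(choice, current)  # unknown button: stay on the same card

# walk()  # uncomment to step through the screen map interactively
```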
Paper prototyping is great when we’re designing flat interfaces for screens, but what if you’re designing a voice interface or a gesture interface? How do you prototype that? One way is called Wizard of Oz prototyping. Wizard of Oz prototyping is named as a reference to the famous scene from the classic movie of the same name: pay no attention to that man behind the curtain. The idea here is that we, behind the curtain, do the things that the interface would do once it’s actually implemented. That way, we can test out the interactions that we plan to design and see how well they’ll work. So, I have Morgan here, and we’re going to do a Wizard of Oz prototype for an interface that allows exercisers to consume and take notes on audiobooks. I’ll start by briefly telling her how the interface would work. So, this interface is run by voice commands, and I’ll simulate it on my phone. You’ll say “play” to play the book and “pause” to pause the book. You’ll say “note” to enter a note-taking viewer, and it will transcribe what you say, and “bookmark” to just drop a bookmark wherever you are, without pausing. So, whenever you’re ready. Let’s do it. “All wealth consists of desirable things…” Pause. “…that is, things which…” Play. “…satisfy human wants directly or indirectly…” Bookmark. “…but not all desirable things are recognized as wealth. The affection…” Note. “…of friends, for instance…” This book is so good. Oh, I just realized I guess I need a way for you to stop the note when you’re done. So, say “close note” when you’re done taking a note. Close note. All right, you can stop. Now, based on this prototype, I can also ask for some feedback. So, Morgan, do you think it should automatically start playing again when you close that note, or what should it do? Well, okay, I think I’d actually like it to start playing from five seconds back, because I imagine saying “note” is going to step over some of the content. Yeah, that makes sense. And we can go through this and quickly test out different ideas for how the interaction might work. In practice, Wizard of Oz prototypes can actually get very complex. You can have entire programs that work by having a person supply the requested input at the right time. But as a concept, a Wizard of Oz prototype is a prototype where the user can interact authentically with the system while a human supplies functionality that hasn’t yet been implemented.
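Here’s a minimal, hypothetical sketch in Python of what a slightly more structured Wizard of Oz setup could look like: a human operator types what the “speech recognizer” supposedly heard, while the playback state updates for real. The command names mirror the ones in the demo above, but the code itself is an assumption for illustration, not the setup used in the video.

```python
# The "recognizer" is a human wizard at the keyboard; everything else responds for real.
def wizard_hears() -> str:
    return input("wizard types what the participant said > ").strip().lower()

def run_session():
    playing, notes, bookmarks = False, [], []
    while True:
        command = wizard_hears()
        if command == "play":
            playing = True
        elif command == "pause":
            playing = False
        elif command == "bookmark":
            bookmarks.append("current position")   # book keeps playing
        elif command.startswith("note"):
            playing = False                         # pause while the note is taken
            notes.append(input("wizard transcribes the note > "))
            playing = True                          # resume after "close note"
        elif command == "quit":
            break
        print(f"state: playing={playing}, notes={len(notes)}, bookmarks={len(bookmarks)}")

# run_session()  # uncomment to run with a human wizard at the keyboard
```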
Paper prototyping involves drawing things simply on paper, and it can be really good for experimenting with overall layouts, especially because it tends to be a lot easier to revise and tweak those pretty naturally. After some feedback, though, you’ll likely want to start formalizing your designs a little more. One way of doing this is called wireframing. In wireframing, we use some more detailed tools to mock up what an interface might look like. For example, my paper prototype from earlier might become a wireframe like this. This lets us experiment with some additional details like font size, colors, and the challenges of screen real estate. Now, there are lots of tools out there for wireframing that come equipped with built-in common widgets and layouts. But you can also do some rudimentary wireframing in something as simple as PowerPoint; Google Drawings can be used the same way as well. So, you don’t need to get super fancy, although if you do a lot of wireframing, you’ll probably want to find a more streamlined tool. Some of the more popular paid products include Balsamiq and Axure. These are both targeted more at professionals working in user interface or user experience design, especially on teams collaborating with a lot of people. If you’re familiar with the suites of tools from either Microsoft or Adobe, then Visio or InDesign might also be great options for you, because you’re already somewhat familiar with those interfaces. But you don’t actually have to buy a tool to do good wireframing. There exist some free-to-use tools out there as well, like Pencil Project and Frame Box. Those are great to use, especially if you’re just getting started. Of course, these are just the popular ones that I know about right now. There are almost certainly more out there that I’m not familiar with, and more will surely become available. So, check with your classmates or colleagues to see what they would recommend. I’m personally really excited to see what kinds of prototyping options emerge for areas like virtual reality and augmented reality, where you can’t really prototype on a 2D canvas like this.
Wireframing is great for prototyping on-screen interfaces, but, again, what if you’re working on something more physical or three-dimensional? In that case, you might want to construct a physical prototype. But let’s be clear: it doesn’t have to actually work. That’s where a lot of designers get tripped up. They think that to get good feedback on a design they have to have a working version, but you don’t. There are lots of elements you can test without actually implementing anything. So, let’s go back to our example of designing a way for exercisers to take notes on audiobooks. One of my alternatives might be a Bluetooth device that synchronizes with the phone, with buttons for different interactions. The individual would hold this while exercising and interact by pressing different buttons to play or pause or take a note. I’ve prototyped this just by taking my car’s key fob. We can just pretend this button does this and that button does that. It’s probably not the exact shape I want, but it’s pretty close. It’s probably about the same size, and I can test the general idea of pressing buttons while exercising with this. I can actually do a lot of testing with this. I can tell Morgan how it works and watch carefully to see if the buttons she presses are the right ones, to evaluate the intuitiveness of the interface. Or I could just ask her to go running while holding it and give me feedback on whether or not holding something physical like this in her hand throws off her routine at all. I can do a lot of prototyping without a working version.
In this lesson we’ve covered various different methods for prototyping. Each method has its advantages and disadvantages. So let’s start to wrap up the lesson by exploring this with another exercise. Here are the methods that we’ve covered and here are some of the potential advantages. For each row, mark the column to which that advantage applies. Note that as always, these are somewhat relative, so your answer might differ from ours.
So here are my answers. First, when we're talking about things that are revisable during interaction, we're talking about things where I as the experimenter can get feedback from my user and immediately change my prototype. So if they say that that button label doesn't really make sense, I can cross out that button label and immediately change it. That makes sense for prototypes that are very low fidelity. Verbal prototypes, I can immediately say okay, then let's make it the way you just described. Paper prototypes or card prototypes, I could quickly erase or cross out something on my prototype and change it. Wizard of Oz is similar. Since I'm running what's going on behind the scenes, I can just change the way I'm running it. Those four prototypes, because they're more low fidelity, also disguise superficial details. No one is going to look at a prototype that I drew by hand and say they don't like the font. No one is going to listen to me run a Wizard of Oz prototype for a voice interface and say, I don't like the voice that you're using. These help us focus on the overall patterns of interaction and disguise some of the superficial elements that users would often have a tendency to get distracted by. However, as we prototype, we need to move from designing interfaces to designing interactions. Verbal prototypes and paper prototypes don't really cover interactions; they cover showing something and asking the user what they think, but they don't really go further than that. Card prototypes, Wizard of Oz prototypes, to a certain extent wireframing, and to a certain extent physical prototypes all let us actually simulate the user interaction. With a card prototype, we're actually saying if you did that, then you would see this, so they can walk through the pattern of interaction. With Wizard of Oz, we can simply call out or describe or simulate, this is what would happen if you do what you just described. Now, wireframing you could do more like a paper prototype, where it's just a simple wireframe, but more generally, we use wireframes when we're ready to actually show different interfaces and the movement between them. Similarly with physical prototypes, the main reason why we would do a physical prototype is to hand it to a user and say, pretend you're jogging, or pretend you're working in your office. How would this interact with what you're actually doing? We're simulating the way they would physically use it. Now among all of these, the wireframes are really the ones that are most easily distributable to remote users. You can make an argument that we can send scans of paper prototypes but generally, a paper prototype isn't just about what's on paper. It's also about the conversations and descriptions that we're having around it and asking users what they think about certain elements. Whereas a wireframe is more about a general impression that users get. You can make the argument that paper prototypes can be sent easily as well. But for me, I would only share wireframes with remote users. Now prototyping look and feel is really just the inverse of disguising superficial details. Look and feel is really about those superficial elements that have a significant user impact, but are more easily modifiable within an overall pattern of functional interaction. So just as the earlier low fidelity prototypes support disguising details, the later ones support prototyping look and feel.
As computers become more ubiquitous, and users are moving around while interacting with interfaces more and more, allowing mobility is really valuable. Wizard of Oz, since we're just calling things out to the user, lets them move around, and same with physical prototypes. We can actually hand them to a user and have them physically interact the way they would with the actual interface.
At this point, there's a risk of a major misconception that we should cut off right now. We started with needfinding, then developed some design alternatives, and now we're prototyping. We've talked about how prototyping follows a timeline from low fidelity to high fidelity prototypes, from early to late prototyping. We might think that we move on to evaluation when we're done prototyping. That's not the way the design life cycle works though. We go through this cycle several times for a single design, and a single prototype corresponds to a single iteration through the cycle. So we did some initial needfinding, we brainstormed some alternatives, and we prototyped those alternatives on paper. We don't jump straight from doing them on paper to doing them via wireframing or doing a functional prototype. We take those prototypes and we use them for evaluation. We evaluate those paper prototypes with real people. The results of that evaluation tell us if we need to go back and understand the task even better. Those results help us reflect on our alternatives as a whole, maybe come up with some new ones. Then, equipped with the results of that evaluation, that additional needfinding, and that additional brainstorming, we return to the prototyping phase. If our prototype seemed to be pretty successful and pretty sound, then maybe it's time to raise the fidelity of it. Maybe we take it from a paper prototype and actually do some wireframes, or do a card prototype around the actual interaction. If it wasn't very successful though, when we reach back here, we're going to do a different paper prototype, or a different low fidelity prototype, and then go to evaluation again. Each time we develop a new prototype we go through the same cycle again. Now that might sound extremely slow and deliberate, but we also go through this on very different time scales too. So for example, after we've gone through needfinding and brainstorming design alternatives, we can develop a paper prototype. We give it to a user and get their evaluation. They say that they don't like it. We ask them why, we ask them to describe what about their task isn't supported by that interface. That's in some ways another needfinding stage. Then we brainstorm real quick how we could resolve that. Maybe we just do that while we're sitting with that user and think, it didn't support this element of what they described, but I could add that pretty quickly just by making this button or this function more visible. Now we very quickly have a new prototype just by sketching out that addition to that paper prototype, and now we can do it again. This cycle could take one minute. We could take one prototype, put it in front of a user, get their evaluation, figure out what they liked and didn't like, brainstorm a way to fix that, and then immediately revise it and try it again. We can go through this very, very quickly. We could also go through this very slowly; we could have prototypes that take months to develop. And generally that's why we only want to do that after we've gone through the cycle a few times. Because if we're going to take months to develop a prototype, we want to make sure we're probably going to get some pretty good evaluations on it. And we can make sure of that by prototyping the elements in lower fidelity first.
There's one other misconception that I've seen in some designers I've worked with that I feel is also worth explicitly acknowledging. All your prototypes don't have to be at the same level at the same time. Take Facebook for example. Facebook is a complete app already implemented. Imagine that Facebook wanted to redesign their status update box, which they've done pretty recently and have probably done again since I recorded this. Just because the interface is complete in other ways doesn't mean that all future prototyping efforts need to be similarly high fidelity. They don't need to implement an entire new status composition screen just to prototype it. They can prototype it in lower fidelity with sketches or wireframes, put that in front of users, and get their feedback, before ever actually implementing it into a functional prototype or a working part of the website. This applies particularly strongly to the design of apps or programs with multiple functions. So take something like the LinkedIn app. It has a number of different functions like editing your own profile, or connecting with others, or browsing your news feed. Each of these individual screens has its own tasks and interactions. And moving amongst them is itself a task or a type of interaction. Trying to design all the screens and the transitions among them all at the same time is likely far too much. So we could take the bottom-up approach, where we would design the individual screens first, and then design the app experience as a whole. Or we might take the top-down approach and design the overall experience of moving between these different screens, and then design the contents of the individual screens. The point of this is that at any time, prototyping can and should exist at multiple levels of fidelity.
If you’re working in an application area that relies on traditional screens and input methods, your prototyping process might be pretty straightforward. It’ll go from paper prototypes to wireframes exploring iteratively more complete versions of the final interface. For a lot of emerging domains though, you’ll have to get somewhat creative with your prototyping. For things like gestural or voice interaction, you can likely use Wizard of Oz prototyping by having a human interpret the actions or commands that will ultimately be interpreted by the computer. For augmented reality or wearable devices though, you might have to get even more creative. So take a second and brainstorm how you might go about prototyping in your chosen field. Remember, your goal is to get feedback on your ideas from the user early. What can you create that will get you that feedback?
In this lesson, we've talked about several methods for prototyping. Our goal is to employ a lot of methods to get feedback rapidly, and iterate quickly on our designs. Through that process, we can work our way toward creating our ultimate interface. The main goal of this prototyping process has been to create designs we can evaluate with real users. We're obviously not going to deploy a hand-drawn interface to real customers. Its value is in its ability to get us feedback. That's what the entire design life cycle has been leading towards: evaluation, evaluating our ideas, evaluating our prototypes, evaluating our designs. That user evaluation is the key to user-centered design. Focusing on user evaluation ensures that our focus is always on the user's needs and experiences. So, now that we've researched users' needs, brainstormed some design alternatives, and created some shareable prototypes, let's move on to actual evaluation.
[MUSIC] The heart of user-centered design is getting frequent feedback from the users. That's where evaluation comes into play. Evaluation is where we take what we've designed and put it in front of users to get their feedback. But just as different prototypes serve different functions at different stages of the design process, our methods for evaluation need to match those stages as well. Early on, we want more qualitative feedback. We want to know what they like, what they don't like, whether it's readable, whether it's understandable. Later on, we want to know if it's usable. Does it actually minimize their workload? Is it intuitive? Is it easy to learn? Then at the end, we might want to know something more quantitative. We might want to actually measure, for example, whether the time to complete a task has changed, or whether the number of sales has increased. Along the way, we might also want to iterate even more quickly by predicting what the results of user evaluation will be. The type of evaluation we employ is tightly related to where we are in our design process. So in this lesson, we'll discuss the different methods for performing evaluation to get the feedback we need when we need it.
There are a lot of ways to evaluate interfaces. So to organize our discussion of evaluation, I've broken these into three categories. The first is qualitative evaluation. This is where we want to get qualitative feedback from users. What do they like, what do they dislike, what's easy, what's hard. We'll get that information through some methods very similar, in fact identical, to our methods for need finding. The second is empirical evaluation. This is where we actually want to do some controlled experiments and evaluate the results quantitatively. For that, we need many more participants, and we also want to make sure we addressed the big qualitative feedback first. The third is predictive evaluation. Predictive evaluation is specifically evaluation without users. In user-centered design, this is obviously not our favorite kind of evaluation. Evaluation with real users, though, is oftentimes slow and it's really expensive. So it's useful for us to have ways we can do some simple evaluation on a day-to-day basis. So we'll structure our discussion of evaluation around these three general categories.
Before we begin, there's some vocabulary we need to cover to understand evaluation. These terms especially apply to the data that we gather during evaluation. While they are particularly relevant for gathering quantitative data, they're useful in discussing other kinds of data as well. The first term is reliability. Reliability refers to whether or not some assessment of some phenomenon is consistent over time. So for example, Amanda, what time is it? It's about 2:30. Amanda, what time is it? It's about 2:30. Amanda, what time is it? It's 2:30. Amanda is a very reliable assessment of the time. Every time I ask, she gives me the same time. We want that in an assessment measure. We want it to be reliable across multiple trials. Otherwise, its conclusions are random and just not very useful. A second principle is validity. Validity refers to how accurately an assessment measures reality. An assessment could be completely reliable but completely inaccurate. So for example, Amanda, what time is it? Oh my goodness, it's 2:30! Actually it's 1:30. Oh, shoot! So while Amanda was a reliable timekeeper, she wasn't a very valid one. Her time wasn't correct even though it was consistent. Validity is closely connected to a principle called generalizability. Generalizability is the extent to which we can apply lessons we learned in our evaluation to broader audiences of people. So for example, we might find that the kinds of people that volunteer for usability studies have different preferences than the regular user. So the conclusions we draw from those volunteers might not be generalizable to the broader population we want to measure. Finally, one last term we want to understand is precision. Precision is a measurement of how specific some assessment is. So for example, Amanda, what time is it? Well apparently, it's 1:30. Actually, it's 1:31. Come on! But in this case, no one's really going to say that Amanda was wrong in saying that it was 1:30. She just wasn't as precise. I could just as accurately say it's 1:31:27, but that's probably more precision than we need. As we describe the different kinds of data we can gather during evaluation, keep these things in mind. If we were to conduct the same procedure again, how likely is it that we'd get the same results? That's reliability. How accurately does our data actually capture the real-world phenomenon that we care about? That's validity. To what extent can we apply these conclusions to people that weren't in the evaluation? That's generalizability. How specific are our conclusions and observations? That's precision.
In designing evaluations, it's critical that we define what we're evaluating. Without that, we generally tend to bottom out in vague assessments about whether or not users like our interface. So, here are five quick tips on what you might choose to evaluate. Number one, efficiency. How long does it take users to accomplish certain tasks? That's one of the classic metrics for evaluating interfaces. Can one interface accomplish a task in fewer actions or in less time than another? You might test this with predictive models or you might actually time users in completing these tasks. Still though, this paints a pretty narrow picture of usability. Number two, accuracy. How many errors do users commit while accomplishing a task? That's typically a pretty empirical question although we can address it qualitatively as well. Ideally, we want an interface that reduces the number of errors a user commits while performing a task. Both efficiency and accuracy, however, examine the narrow setting of an expert user using an interface. So, that brings us to our next metric. Number three, learnability. Sit a user down in front of the interface. Define some standard for expertise. How long does it take the user to hit that level of expertise? Expertise here might range from performing a particular action to something like creating an entire document. Number four, memorability. Similar to learnability, memorability refers to the user's ability to remember how to use an interface over time. Imagine you have a user learn an interface, then leave and come back a week later. How much do they remember? Ideally, you want interfaces that need only be learned once, which means high memorability. Number five, satisfaction. When we forget to look at our other metrics, we bottom out in a general notion of satisfaction. But that doesn't mean it's unimportant. We do need to operationalize it though, assessing things like users' enjoyment of the system or the cognitive load they experience while using the system. To avoid social desirability bias, we might want to evaluate this in creative ways, like finding out how many participants actually download an app they tested after the session is over. Regardless of what you choose to evaluate, it's important that you very clearly articulate at the beginning what you're evaluating, what data you're gathering, and what analysis you will use. These three things should match up to address your research questions.
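To make a couple of these metrics concrete, here's a minimal Python sketch of how you might compute an efficiency measure and an accuracy measure from session observations. The participant IDs, times, and error counts are hypothetical, and this is just one way you might operationalize these measures, not a prescribed one.

```python
from statistics import mean, stdev

# Hypothetical session records: (participant, seconds to complete task, errors committed)
sessions = [
    ("p1", 42.0, 1),
    ("p2", 55.5, 3),
    ("p3", 38.2, 0),
    ("p4", 61.0, 2),
]

times = [t for _, t, _ in sessions]
errors = [e for _, _, e in sessions]

# Efficiency: average time on task (lower is generally better).
print(f"Mean time on task: {mean(times):.1f}s (sd {stdev(times):.1f}s)")

# Accuracy: average number of errors per attempt (lower is generally better).
print(f"Mean errors per task: {mean(errors):.2f}")
```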
When we discussed prototyping, we talked about how over time the nature of our prototypes gets higher and higher in fidelity. Something similar happens with evaluation. Over time, the evaluation methods we use will change. Throughout most of our design process our evaluations are formative, meaning their primary purpose is to help us redesign and improve our interface. At the end, though, we might want to do something more summative to conclude the design process, especially if we want to demonstrate that the new interface is better. Formative evaluation is evaluation with the intention of improving the interface going forward. Summative evaluation is with the intention of conclusively saying at the end what the difference was. In reality, hopefully we never do summative evaluation. Hopefully our evaluations are always with the purpose of revising our interface and making it better over time. But in practice, there might come times when you need to demonstrate a very clear quantitative difference. And because of this difference, our early evaluations tend to be more qualitative. Qualitative evaluations tend to be more interpretative and informal. Their goal is to help us improve or understand the task. Our later evaluations are likely more empirical, controlled, and formal. Their goal is to demonstrate or assess change. So while formative evaluation and summative evaluation are the purposes of our evaluations, qualitative evaluations and empirical evaluations are ways to actually fulfill those purposes. Predictive evaluation is a little outside the spectrum, so we'll talk about that as well. As far as this is concerned, predictive evaluations tend to be very similar to qualitative evaluations. They inform how we revise and improve our interfaces over time. These three categories actually form the bulk of what we'll talk about in this lesson. Recall also that earlier we talked about the difference between qualitative and quantitative data. As you've probably realized, qualitative evaluation occurs early, and empirical evaluation occurs late. And chances are, we're using qualitative data more early, and quantitative data more late. In reality, qualitative data is really always useful to improve our interfaces, whereas quantitative data, while always useful, really can only arise when we have pretty rigorous evaluations. And then one last area we can look at is where the evaluation takes place: in a controlled lab setting or actually out in the field. Generally when we're testing our early low fidelity interfaces, we probably want to do it in a lab setting as opposed to out in the wild. We want to bring participants into our lab and actually describe what we're going for, the rationale behind certain decisions, and get their feedback. Later on we might want to do real field testing where we give users a somewhat working prototype, or something resembling a working prototype. And they can actually reflect on it as they go about their regular lives, participating in whatever task that interface is supposed to help with. This helps us focus exclusively on the interface early on, and transition to focusing on the interface in context later. But of course we want to also think about the context early on. We could, for example, develop a navigation app that works great when we test it in our lab, even though it demands a very high cognitive load, but doesn't work at all out in the field, because when participants are actually driving, they can't spare that cognitive load to focus on our app.
Now of course none of these are hard and fast rules. We’ll very likely often do qualitative evaluation late or maybe do some field testing early. But as general principles, this is probably the order in which we want to think about our different evaluation styles.
Regardless of the type of evaluation you're planning to perform, there's a series of steps to follow to ensure that the evaluation is actually useful. First, we want to clearly define the task that we're examining. Depending on your place in the design process this can be very large or very small. If we were designing Facebook, it can be as simple as posting a status update, or as complicated as navigating amongst and using several different pages. It could involve context and constraints like taking notes while running, or looking up a restaurant address without touching the screen. Whatever it is, we want to start by clearly identifying what task we're going to investigate. Second, we want to define our performance measures. How are we going to evaluate the user's performance? Qualitatively, it could be based on their spoken or written feedback about the experience. Quantitatively, we can measure efficiency in certain activities or count the number of mistakes. Defining performance measures helps us avoid confirmation bias. It makes sure we don't just pick out whatever observations or data confirm our hypotheses, or say that we have a good interface. It forces us to look at it objectively. Third, we develop the experiment. How will we find users' performance on the performance measures? If we're looking qualitatively, will we have them think out loud while they're using the tool? Or will we have them do a survey after they're done? If we're looking quantitatively, what will we measure, what will we control, and what will we vary? This is also where we ask questions about whether our assessment measures are reliable and valid, and whether the users we're testing are generalizable. Fourth, we recruit the participants. As part of the ethics process, we make sure we're recruiting participants who are aware of their rights and contributing willingly. Then fifth, we do the experiment. We have them walk through what we outlined when we developed the experiment. Sixth, we analyze the data. We focus on what the data tells us about our performance measures. It's important that we stay close to what we outlined initially. It can be tempting to just look for whatever supports our design, but we want to be impartial. If we find some evidence that suggests our interface is good in ways we didn't anticipate, we can always do a follow-up experiment to test if we're right. Seventh, we summarize the data in a way that informs our ongoing design process. What did our data say was working? What could be improved? How can we take the results of this experiment and use it to then revise our interface? The results of this experiment then become part of our design life cycle. We investigated user needs, developed alternatives, made a prototype, and put the prototype in front of users. To put the prototype in front of users, we walked through this experimental method. We defined the task, defined the performance measures, developed the experiment, recruited participants, did the experiment, analyzed our data, and summarized our data. Based on the experience, we now have the data necessary to develop a better understanding of the user's needs, to revisit our earlier design alternatives, and to either improve our prototypes by increasing their fidelity or revise them based on what we just learned. Regardless of whether we're doing qualitative, empirical, or predictive evaluation, these steps remain largely the same.
Those different types of evaluation just fill in the experiment that we develop, and they inform our performance measures, data analysis, and summaries.
Qualitative evaluation involves getting qualitative feedback from the user. There are a lot of qualitative questions we want to ask throughout the design process. What did you like? What did you dislike? What were you thinking while using this interface? What was your goal when you took that particular action? Now, if this sounds familiar, it’s because it should be. The methods we use for qualitative evaluation are very similar to the methods we used for need-finding: interviews, think-aloud protocols, focus groups, surveys, post-event protocols. We use those methods to get information about the task in the first place, and now, we can use these techniques to get feedback on how our prototype changes the task.
Let's run through some of the questions you'll have to answer in designing a qualitative evaluation. First, is this based on prior experience, or is it a live demonstration? If you're bringing in users to answer questions about some interface that they're already using regularly, I'd probably argue you're actually doing need-finding. Now, the distinction can be subtle because evaluation does lead to additional need-finding, but most of these questions are going to apply more to a live demonstration. This is where you're bringing users in to test out some new interface or new prototype. Second, is the session going to be synchronous or asynchronous? In a synchronous session, you're sitting and watching the user live. You're actually watching them use this interface or this prototype. If they're going to complete it on their own and just send you the results, then it's an asynchronous session. Synchronous is usually beneficial because we see a much greater amount of the interactions taking place. We might also be able to interrupt the user and get their thoughts live. Asynchronous, though, is often much easier to carry out, especially with larger populations. I generally recommend synchronous whenever possible, but asynchronous is certainly better than nothing. Third, how many prototypes or how many interfaces will they be evaluating? You might have users come in to evaluate only one interface, or you might have them look at multiple prototypes to compare between them. If you're having them look at more than one prototype, you want to make sure to vary the order in which you present them; otherwise, you might get consistently different feedback just because the user is already familiar with the problem domain when they get to the second interface. That can be particularly significant if you're trying out some new interface compared to an interface that they've used in the past. If you always present the old one first, then you'll probably get a lot of users saying the new one is much better, when in reality, they're just more familiar with the problem now. Fourth, when do you want to get feedback from the user? There are two main protocols for doing this. There's a Think Aloud Protocol and a Post-Event Protocol. In a Think Aloud Protocol, you ask the user to actually think out loud while they're using your interface or prototype. You ask them to explain what they're seeing, how they interpret it, and what they think the outcome of their actions will be. In a Post-Event Protocol, you have the user go through some session using the interface or testing out the prototype, and then only give you their thoughts at the end. A Post-Event Protocol has the drawback that you're only getting the user's feedback at the end. So, if they experience some difficulty early on, they may have forgotten it by the time you actually get feedback from them. But a Think Aloud Protocol has a problem as well: it might introduce some new biases. Research shows that when users are asked to think out loud while using an interface, the way they use the interface actually changes. They're more deliberative. They're more thoughtful. They're less intuitive about their interactions. In general, what that means is that when we ask users to think out loud about an interface, oftentimes they'll figure out how the interface works, but then, if users use the same interface without having to think out loud, they find it much more confusing.
Talking through their thought process helps them understand, but our real end-users aren't going to talk through their thought process, so it's often good to use a mix of these two. In fact, I'd usually suggest doing a Think Aloud Protocol earlier, and using a Post-Event Protocol more as a summative evaluation once you're already pretty confident in how good your interface is, but others' advice may differ. In either case, it's worth noting that users are not often very good at explaining why they like something or why they did something, so we should always take the feedback that we get with a grain of salt. Finally, do you want to get feedback from individuals or from groups? Focus groups are used when multiple users talk together about their experiences. This can actually lead to better explanations, because users build on each other and expand each other's ideas, but it can also strongly bias the group towards the opinions of the most powerful personalities. Individual interviews and surveys force the user to be the only source of knowledge, which can be bad, but it also means the user isn't biased by other outside views. As you'll notice, and as has probably become a trend, whenever we talk about multiple different options for doing evaluation or need-finding or whatever, different approaches have different strengths, and they address the weaknesses of other approaches. So, generally, if you're going through multiple iterations of the design life cycle, you'll probably use all of these different ideas at different times.
With qualitative research, we want to capture as much of the session as possible, because things could come up that we don't anticipate, and we'd like to look at them again later. So how do we do that? One way is to actually record the session. The pros of recording a session are that it's automated, it's comprehensive, and it's passive. Automated means that it runs automatically in the background. Comprehensive means that it captures everything that happens during the session. And passive means that it lets us focus on administering the session instead of capturing it. The cons, though, are that it's intrusive, it's difficult to analyze, and it's screenless. Intrusive means that many participants are uncomfortable being videotaped. It creates pressure knowing that every question or every mistake is going to be captured and analyzed by researchers later. Video is also very difficult to analyze. It requires a person to come later and watch every minute of video, usually several times, in order to code and pull out what was actually relevant in that session. And video recording often has difficulty capturing interactions on-screen. We can film what a person is doing on a keyboard or with a mouse, but it is difficult to then see how that translates to on-screen actions. Now some of these issues can be resolved, of course. We can do video capture on the screen and synchronize it with a video recording. But if we're dealing with children, or at-risk populations, or with some delicate subject matter, the intrusiveness can be overwhelming. And if we want to do a lot of complex sessions, the difficulty in analyzing that data can also be overwhelming. For my dissertation work I captured about 200 hours of video, and that's probably why it took me an extra year to graduate. It takes a lot of time to go through all that video. So instead we can also focus on note-taking. The benefits of note-taking are that it's very cheap, it's not intrusive, and it is analyzable. It's cheap because we don't have to buy expensive cameras or equipment; we just have our pens and paper or our laptops, and can do it using equipment we already have available to us. It's not intrusive, in that it only captures what we decide to capture. If a participant is uncomfortable asking questions or makes a silly mistake with the interface, we don't necessarily have to capture that, and that can make the participant feel a little bit more comfortable being themselves. And it's a lot easier to analyze notes. You can scroll through and read the notes on a one-hour session in only a few minutes. But analyzing that same session in video is certainly going to take at least an hour, if not more, to watch it more than once. But of course there are drawbacks too. Taking notes can be a very slow process, meaning that we can't keep up with the dynamic interactions that we're evaluating. It's also manual, which means that we actually have to focus on actively taking notes, which gets in the way of administering the session. If you're going to use note-taking, you probably want to actually have two people involved: one person running the session, and one person taking notes. And finally, it's limited in what it captures. It might not capture some of the movements or the motions that a person does when interacting with an interface. It doesn't capture how long they hesitate before deciding what to do next. We can write all that down of course, but we're going to run into the limitation of how fast we can take notes.
It would be nearly impossible to simultaneously take notes on what questions the user is asking, how long they're taking to do things, and what kind of mistakes they're making, especially if we're also responsible for administering the session at the same time. A third approach, if we're designing software, is to actually log the behavior inside the software. This is, in some ways, the best of both worlds. Like video capture, it's automatic and passive, but like note-taking, it's analyzable. Because it's run by the system itself, it automatically captures everything that it knows how to capture, and it does so without our active intervention. But it likely does so in a data or text format that we can then either analyze manually by reading through it, or even with some more complicated data analytics methods. So in some ways, it captures the pros from both note-taking and video capture. But it also has its drawbacks as well. In particular, it's very limited. We can only capture those things that are actually expressed inside the software. Things like the questions that a participant asks wouldn't naturally be captured by software logging. Similarly, it only captures a narrow slice of the interaction. It only captures what the user actually does on the screen. It doesn't capture how long they look at something. We might be able to infer that by looking at the time between interactions, but it's difficult to know if that hesitation was because they couldn't decide what to do, or because someone was making noise outside, or something else was going on. And finally, it's also very tech sensitive. We really have to have a working prototype in order to use software logging. But remember, many of our prototypes don't work yet. You can't do software logging on a paper prototype, or a card prototype, or a Wizard of Oz prototype. This only really works once we've reached a certain level of fidelity with our interfaces. So in selecting a way to capture your qualitative evaluation, ask yourself: will the subjects find being captured on camera intrusive? Do I need to capture what happens on screen? How difficult will this data be to analyze? It's tempting, especially for novices, to focus on just capturing as much as possible during the session. But during the session is when you can capture data in a way that's going to make your analysis easier. So think about the analysis that you want to do when deciding how to capture your sessions.
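If you do reach the point of having working software, instrumenting it can be fairly lightweight. Here's a minimal sketch of the idea in Python; the file name and event names are hypothetical, and a real study would log whatever events your research questions actually call for.

```python
import json
import time

LOG_PATH = "session_log.jsonl"  # hypothetical log file: one JSON event per line

def log_event(participant_id, event, **details):
    """Append a timestamped interaction event so it can be analyzed later."""
    record = {"t": time.time(), "participant": participant_id, "event": event, **details}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: events the interface itself can observe. Questions the participant asks
# out loud, or how long they merely look at something, still won't be captured.
log_event("p1", "button_click", target="take_note")
log_event("p1", "playback_paused")
log_event("p1", "note_saved", length_chars=42)
```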
Here are five quick tips for conducting successful evaluations. Number one, run pilot studies. Recruiting participants is hard. You want to make sure that once you start working with real users, you're ready to gather really useful data. So, try out your experiment with friends or family or co-workers before trying it out with real users to iron out the kinks in your design and your directions. Number two, focus on feedback. It's tempting in qualitative evaluations to spend too much time trying to teach this one user. If the user criticizes an element of the prototype, you don't need to explain to them the rationale. Your goal is to get feedback to design the next interface, not to just teach this one current user. Number three, use questions when users get stuck. That way, you get some information on why they're stuck and what they're thinking. Those questions can also be used to guide users toward how they should use it while making the session seem less instructional. Number four, tell users what to do, but not how to do it. This doesn't always apply, but most often we want to design interfaces that users can use without any real instruction whatsoever. So, in performing qualitative evaluation, give them instruction on what to accomplish, but let them try to figure out how to do it. If they try to do it differently than what you expect, then you know how to design the next interface. Number five, capture satisfaction. Sometimes, we can get so distracted by whether or not users can use our interface that we forget to ask them whether or not they like using our interface. So, make sure to capture user satisfaction in your qualitative evaluation.
In an empirical evaluation, we're trying to evaluate something formal, and most often that means something numeric. It could be something explicitly numeric like what layout of buttons leads to more purchases or what gestures are most efficient to use. There could also be some interpretation involved though, like counting errors or summarizing survey responses. The overall goal, though, is to come to something verifiable and conclusive. In industry, this is often useful in comparing designs or in demonstrating improvement. In research though, this is even more important, because this is how we build new theories of how people think when they're using interfaces. If we wanted to prove, for example, that gestural interaction has a tougher learning curve than voice interaction, or that an audio interface is just as usable as a visual one, we would need to do empirical evaluation between the interfaces. Now, most empirical evaluations are going to be comparisons. We can do quantitative analysis without doing comparisons, but it usually isn't necessary. The biggest benefit of quantitative analysis is the ability to help us perform objective comparisons. So, with empirical evaluation, our overall question is usually: how can we show that there is a difference between these designs?
When we do qualitative evaluations, we effectively just bring in participants one at a time or maybe in groups, go through a certain interview protocol or script, and then move them along. Empirical evaluation is different though. Here, we have multiple conditions which we call treatments. These treatments could be different interfaces, different designs, different colors, whatever we're interested in investigating. Our goal here is to investigate the comparison between the treatments and end up with a conclusion about how they are different. However, we have to be careful to make sure that the differences that we observe really are due to differences between the treatments and not due to other factors. For example, imagine we're testing the difference between two logos, and we want to know which works better, orange or teal. However, we also make one a circle and the other one a triangle. In the end, we wouldn't be able to comment on orange versus teal. We could only comment on an orange circle versus a teal triangle. To make a judgement about the color, we need to make sure that the color is the only thing that we're comparing. Of course, this example right here sounds a little bit silly. In practice though, differences can be more subtle. If you're testing between different layouts, you might miss that one loads a bit faster or that one uses prettier images, and that could actually account for any differences that you observe. Once we've designed the treatments, it's time to design the experiment itself. Our first question is: what do the participants do? Does each participant participate in one treatment or both? If each participant only participates in one treatment, then our next step is pretty easy. We split the participants randomly into two groups, and one by one, they each go through their treatment. At the end, we have the data from participants in one group to compare to the data from participants in the other group. This is a between-subjects design: we're comparing the data from one group to the other group. There's a second option though; we can also do a within-subjects experiment. With a within-subjects experiment, each participant experiences both treatments. However, a major lurking variable could be which treatment each participant sees first. So, we still have to randomly assign participants to treatment groups. But instead of assigning participants to which treatment they are receiving, we're randomly assigning them to the order they'll receive the treatments in. That way, if the order in which participants receive the treatments matters, we'll see it in our data. Within-subjects is beneficial because it allows us to gather twice as much data if our participant pool is limited. Here, each interface would be used by 16 participants instead of just eight. It also allows us to do within-subjects comparison, seeing how each individual participant was affected, instead of the groups as a whole. That can help us identify some more subtle effects, like if different people have different strengths. However, within-subjects requires more of our subjects' time, which can be a big problem if the treatments are themselves pretty long. Throughout this example, we've also glossed over an important detail: random assignment. Random assignment to treatment helps us control for other biases. Imagine if all the smartest participants or all the women were all placed in one group and all received the same treatment; that would clearly affect our results.
That's why we randomly assign people to groups. Now, that might sound obvious, but imagine if your treatment involves a physical setup. It would be tempting to run the first eight participants on one setup and the second eight participants on the other, so you don't have to switch back and forth between them. But what if that means that all the more punctual participants were in the first condition? Or what if you got better at administering the experiment during the first condition, so that participants in the second condition had a generally smoother experience? All of these lurking variables are controlled, in part, by random assignment to groups.
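Random assignment is easy to automate rather than eyeball. Here's a minimal Python sketch, with hypothetical participant IDs and treatment names: for a between-subjects design we shuffle and split participants into groups, and for a within-subjects design we instead randomize the order in which each participant sees the treatments.

```python
import random

participants = [f"p{i}" for i in range(1, 17)]  # hypothetical participant IDs
random.seed(42)  # fixed seed only so this example is reproducible

# Between-subjects: shuffle, then split into two treatment groups.
shuffled = participants[:]
random.shuffle(shuffled)
group_a, group_b = shuffled[:8], shuffled[8:]

# Within-subjects: every participant sees both treatments,
# but the presentation order is randomized to control for ordering effects.
orders = {p: random.sample(["treatment_A", "treatment_B"], k=2) for p in participants}

print(group_a, group_b)
print(orders["p1"])
```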
Let's pretend this is a reaction time study because that gives us some nice numeric data. We're curious: what color should be used to alert a driver in a car that they're starting to leave their lane? We run half of them with orange and half of them with green in a between-subjects design. As a result, they've generated some data. These are each participant's reaction times. Our goal is to compare this data and decide which is better. So, how do we do that? Well, we might just average it. So, it looks like the orange average is smaller than the green, right? So, orange is better? Not so fast, no pun intended. These numbers are very close. It's entirely possible this difference arose just by random chance. They're not likely going to be exactly equal in any trial. The question is, are they different enough to conclude that they're really different? In fact, in this case, they aren't if you run the numbers. The process for actually doing this rigorously is called hypothesis testing. Whenever we're trying to prove something, we initially hypothesize that the opposite is true. So, if we're trying to prove that one of these options is better than the other, we initially hypothesize that actually, they are equal. That's the null hypothesis. It's the hypothesis that we accept if we can't find sufficient data to support the alternative hypothesis. So, we want to see if this difference is big enough to accept the alternative hypothesis instead of the null hypothesis. We generally accept the alternative hypothesis if there's less than a five percent chance that the difference could have arisen by random chance. In that case, we say that the difference is statistically significant. So, here, there's probably a pretty good chance that this difference could arise by random chance, not because orange is actually any better than green. But imagine if the data changed. Imagine if these were our observations from green instead of the ones before. Our average has gone up considerably from 0.3 to 0.44. Here, it's unlikely for this difference to be based only on random chance. It's still possible. It's just far less likely. This is the general process of hypothesis testing: assuming that things are the same and seeing if the data is sufficient to prove that they're different.
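Here's roughly what that test looks like in practice, as a sketch using SciPy's independent-samples t-test. The reaction times below are made up for illustration, not the numbers from the example above.

```python
from scipy import stats

# Hypothetical reaction times (seconds) from a between-subjects study.
orange = [0.28, 0.31, 0.27, 0.33, 0.30, 0.29, 0.32, 0.30]
green  = [0.45, 0.41, 0.47, 0.43, 0.44, 0.46, 0.42, 0.44]

t_stat, p_value = stats.ttest_ind(orange, green)

# Null hypothesis: the mean reaction times are equal.
# We reject it (call the difference statistically significant) only if p < 0.05.
if p_value < 0.05:
    print(f"Significant difference (p = {p_value:.4f})")
else:
    print(f"Not enough evidence of a difference (p = {p_value:.4f})")
```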
Null and alternative hypotheses are common to all kinds of hypothesis tests. The specific kind of hypothesis test you conduct, however, depends on the kind of data that you have. Recall when we first started talking about quantitative data, we discussed four general kinds: nominal, ordinal, interval, and ratio. What kind of hypothesis test we use will depend on what kind of data we've collected. Note that my goal here isn't for you to know off the top of your head how to do each of these tests. Instead, I want you to know when to use each kind of test, so you can go and look it up when needed. I've used all these kinds of tests before and I can't remember off the top of my head how to do any of them. But I know when to go look up which test, and that's the most important thing for now. For nominal data, we generally want to do something called a Chi-squared test. A Chi-squared test checks to see if the distribution of values into a number of buckets is the same across two conditions. For all of these, I'm going to use a series of examples inspired by a comparison that I did recently between my online undergraduate course and my traditional undergraduate course. So for example, one of my questions was: does the distribution of majors differ between the online section and the traditional section? Here our independent variable would be a pair of categories, for me it's online and traditional, and our dependent variable would be the distribution among some other set of categories, here it's majors. We always assume that the distributions are equal, so our alternative hypothesis would be that the distributions are unequal. Now, obviously the distributions are unequal in some ways. There are a lot more students in the traditional section, and that ratio isn't perfectly mirrored across all these categories. Our question is: are these differences big enough to assume that there's an actual difference between online and traditional? Or are the differences just a product of random noise? If there are four times as many students in the traditional section as the online section, then we wouldn't expect exactly four times as many computer science majors and exactly four times as many math majors. But we also wouldn't expect the ratios to be way off either. So, a Chi-squared test lets us test to see whether distributions into these categories are comparable. For ordinal data, we actually go with a very similar process. Just like with nominal data, we have our different categories as our independent variable and our dependent variable is some distribution among a number of categories. The difference here is that these categories are ranked. What that means is we could actually use the exact same test. We could use the Chi-squared test. The weakness there is that the Chi-squared test isn't as sensitive to systematic changes across these categories. The Chi-squared test assumes all these different categories are independent. We know, though, that if we see a small shift in all the categories from left to right, that might actually be a significant difference between the two sections. So, it's generally better to use what's called the Kolmogorov-Smirnov test. It's very similar to the Chi-squared test but it's sensitive to the fact that those categories are ordered. Now that said, using the Chi-squared test isn't going to hurt you with this kind of data, and sometimes it can be easier to run. There's also an alternative that more simply just tests to see whether it's likely that the medians of the two are different.
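As a sketch of how this looks in code, here's a chi-squared test in SciPy on a hypothetical table of majors per section; the counts are invented for illustration, and the final comment notes one option for the ordinal case.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of majors in each section (rows: sections, columns: majors).
#               CS   Math  Other
observed = [[ 120,   30,   50 ],   # traditional section
            [  35,    5,   10 ]]   # online section

chi2, p_value, dof, expected = chi2_contingency(observed)

# Null hypothesis: the distribution of majors is the same in both sections.
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")

# For ordinal data (e.g., raw Likert responses from each section),
# scipy.stats.ks_2samp is one option that respects the ordering of the categories.
```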
It's also not terribly uncommon to make the assumption that these ordinal data categories are evenly spaced. Now, remember when we talked about ordinal data, that was exactly one of the weaknesses that we noted. We noted that we can't know that the difference between highly dissatisfied and dissatisfied is the same as the difference between dissatisfied and neutral. Because we can't know that, we're really not supposed to just average these, but in practice it's not uncommon for people to do so anyway. In that case, we might just assign these categories numbers like one, two, three, four, and five, multiply the number of observations by those numbers, average them over the number of actual respondents, and see if there's a statistically significant difference. It's not good to do that, but if you chose to do that you wouldn't be the first person ever to do so. When we get to interval and ratio data, the kinds of tests we use actually shift a good bit. First, we really never treat interval and ratio data as different. The fact that ratio data has a zero point and interval data doesn't, doesn't affect any of our statistical tests. The statistical tests that we do for this kind of data are always dependent on the average and the standard deviation. The most common test is called the Student's t-test. I'm not going to talk about it right now, but I do recommend looking up why it's called the Student's t-test. It's kind of an interesting story. For these, our independent variable is the same as it's been in the past. It's a category we're comparing between two different populations. But our dependent variable is actually just the average of some kind of outcome. We're not talking about distributions into subcategories but rather just looking at a certain outcome for all participants in each category. Our null hypothesis is that the outcomes are equal and our alternative hypothesis is that the outcomes are unequal. A t-test lets us pretty straightforwardly compare between the two and see if the difference is actually statistically significant. Again, we wouldn't expect the two numbers to have to be exactly identical to say that the populations are not any different. Here, an average of 10.2 compared to the traditional section's average isn't enough to conclude that the online section did any better than the traditional section, especially when the standard deviations are pretty significant fractions of those averages. Don't worry though if you don't fully understand the math behind this; you're really not supposed to right now. For now all you really need to know is that the difference has to be big enough to justify that one is actually different from the other, and how big "big enough" actually is, is dependent on how big that standard deviation is. The bigger the standard deviation, the bigger the difference needs to be. Now it's also worth noting that we're only supposed to use t-tests when the data distribution is normal. That means it mirrors a regular bell curve. If it's not normal, we're supposed to use things like the Mann-Whitney-Wilcoxon test or the Kruskal-Wallis test. But those topics are kind of outside our scope. Generally speaking, a Chi-squared test and a t-test will get you most of what you need, at least until we get to having three different levels of our independent variable.
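To make the "only use a t-test when the data is roughly normal" point concrete, here's one possible workflow sketch in SciPy: check each sample with a Shapiro-Wilk test and fall back to a Mann-Whitney U test if the normality check fails. The scores are hypothetical, and this is just one reasonable way to structure the decision, not the only one.

```python
from scipy import stats

# Hypothetical exam scores from two sections.
online      = [10.2, 9.8, 11.0, 8.7, 10.5, 9.9, 10.1, 9.4]
traditional = [9.6, 10.0, 9.1, 9.8, 10.4, 9.2, 9.7, 10.3]

def roughly_normal(sample, alpha=0.05):
    """Shapiro-Wilk: null hypothesis is that the sample came from a normal distribution."""
    return stats.shapiro(sample).pvalue > alpha

if roughly_normal(online) and roughly_normal(traditional):
    stat, p = stats.ttest_ind(online, traditional)      # parametric comparison of means
    print(f"t-test: p = {p:.4f}")
else:
    stat, p = stats.mannwhitneyu(online, traditional)    # non-parametric alternative
    print(f"Mann-Whitney U: p = {p:.4f}")
```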
Chi-squared tests and t-tests are probably the most commonly used tests in HCI. However, there are some assumptions embedded in the ones we just looked at. First, notice that we only ever had two levels to our independent variable. We were only ever comparing online and traditional students. For your work, that might mean only comparing two different interfaces. What if we wanted to test three? How can we do that? Imagine, for example, I wanted to test these two classes against a third class, a flipped class. Here we'd be testing the online section versus the traditional section versus the flipped section. How would we do that? You might be tempted to just test them in a pairwise fashion: test online versus traditional, traditional versus flipped, and online versus flipped. You use that to try to uncover any differences between pairs. That's called repeated testing, and the problem is that it raises the likelihood of a type one error. A type one error is also called a false positive, and it's where we falsely reject the null hypothesis. In other words, we falsely say that we have enough data to conclude the alternative hypothesis. Here that would mean falsely concluding that there is a difference when there isn't actually a difference. The reason repeated testing raises the likelihood of this is that, remember, we said that we reject the null hypothesis if there's generally less than a five percent chance it could have occurred by random chance. But if you do three different tests, you raise the likelihood of one of them turning up conclusive even though it really isn't. Think of it like playing the lottery. If I say you have a one in 20 chance of winning and you play 20 times, you'll probably win at least once. That's because your overall odds of winning increased. Performing multiple tests raises our overall likelihood of finding a false positive. So instead, what we need is a single test that can compare across all these different treatments at once. Now fortunately, for ordinal or nominal data, it's actually just the same test. A chi-squared test can handle more than just two levels of our independent variable. Our alternative tests change a little bit if we're dealing with more than two levels. The weakness here is that if we do a Chi-squared test on all three of these levels at once, all it will tell us is if there's any difference between any of the levels. It doesn't tell us where the difference is. So, if this chi-squared test shows that there is a difference, we don't have any way of knowing: is it a difference between online and traditional, online and flipped, traditional and flipped, or is it a case where the flipped is different from both the online and traditional, or something like that? So, generally what we do is an overall chi-squared test on all of the levels, and then we can follow up, if that first test was successful, with pairwise comparisons between the conditions. In that case, we're basically concluding that we know there's a difference before we actually do the repeated testing, so the overall odds of finding a false positive aren't changing. For interval and ratio data though, we need to use a different test altogether. This test is called an analysis of variance, or ANOVA. A one-way ANOVA test lets us compare between three or more groups simultaneously. Here, that means we could test between all three of our classes at the same time. For you, that can mean testing between three or four interfaces at the same time.
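Here's a sketch of that pattern in SciPy: run the overall one-way ANOVA first, and only follow up with pairwise t-tests if it comes back significant, so the false-positive rate isn't inflated by repeated testing. The three sets of scores are invented for illustration.

```python
from itertools import combinations
from scipy import stats

# Hypothetical scores from three course formats.
sections = {
    "online":      [10.2, 9.8, 11.0, 8.7, 10.5, 9.9],
    "traditional": [9.6, 10.0, 9.1, 9.8, 10.4, 9.2],
    "flipped":     [11.1, 10.8, 11.4, 10.6, 11.0, 10.9],
}

# One-way ANOVA: null hypothesis is that all three group means are equal.
f_stat, p_value = stats.f_oneway(*sections.values())
print(f"ANOVA: p = {p_value:.4f}")

# Follow up pairwise only if the overall test is significant,
# so we aren't inflating the false-positive rate with repeated testing.
if p_value < 0.05:
    for (name_a, a), (name_b, b) in combinations(sections.items(), 2):
        _, p = stats.ttest_ind(a, b)
        print(f"{name_a} vs {name_b}: p = {p:.4f}")
```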
With a two-way ANOVA, we could actually take this a step further. We could have two dimensions of independent variables: we could test online, traditional, and flipped against upperclassmen versus lowerclassmen. We could actually get at differences like, do freshmen do better online but sophomores do better in traditional classes? The weakness, though, is the same as the weakness with the chi-squared test: an analysis of variance will tell us if there are differences, but it won’t tell us where the differences are. Our approach to that is the same as it was with the chi-squared test as well. If the analysis of variance indicates there’s an overall difference, then we can follow up with pairwise t-tests. Notice, though, there’s still one assumption that’s been embedded in every single analysis we’ve talked about: our independent variables have always been categorical. That’s generally true for most of the tests we’re going to do. If we’re testing one interface against another, then those are our two categories. If we’re testing one body of people against another, then those are our two categories. So, this isn’t really a weakness or a challenge, but there are cases where we want our independent variable to be something non-categorical. Mostly that happens when we want our independent variable to be some interval or ratio data. So, imagine for example I wanted to see if GPA was a good predictor of course grade. GPA is generally considered interval data; we might consider it ratio data, but it’s usually discussed as interval data. We could do this by breaking GPA down into categories and averaging the course grades for anyone with a GPA from, say, 3.5 to 4.0. Or we could leave GPA as interval data and just do a direct analysis. Generally, here we’d be doing a regression, where we see how well one variable predicts another. Most of our regressions are linear, but we could also do a logistic regression, a polynomial regression, and lots more. Again, that’s getting outside the scope of our class. Here, our null hypothesis is that the variables are unrelated, and our alternative hypothesis is that they are related. So, we need evidence that they’re related before assuming that they are. Here things get a little bit more complex as well, because we’re not quite as emphatic about how we reject our null hypothesis and accept our alternative hypothesis. Usually with regressions, we describe how well the two fit: they might fit very well, somewhat well, not well at all, and so on. Before we move on, there’s one last type of data I’d like to talk about, and that’s binomial data. Binomial data is data with only two possible outcomes, like a coin flip. For us, we might have outcomes like success or failure in a class. In HCI, we might be curious which of multiple interfaces allows users to succeed at a task with greater frequency. Notice there that success and failure are binary, and that’s what makes this binomial data. What can be tricky here is that our data actually looks continuous; it looks just like straightforward continuous ratio data. Here we might say online students succeed 94.9 percent of the time and traditional students succeed 92.1 percent of the time, and we might be tempted just to do a straightforward t-test on that. But if you try to do the math, you’ll quickly find that it doesn’t work. A t-test requires a standard deviation, and if every single student is either a one or a zero, a success or a failure, then you don’t really have a standard deviation in the same way.
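As a concrete illustration of that regression idea, here is a minimal sketch using SciPy’s linregress on some invented GPA and course-grade pairs; the data and the library choice are assumptions for illustration only.

# A minimal sketch of a simple linear regression; the data below are invented.
from scipy import stats

gpa          = [2.1, 2.8, 3.0, 3.3, 3.5, 3.7, 3.9, 4.0]
course_grade = [71,  78,  80,  84,  86,  90,  93,  95]

result = stats.linregress(gpa, course_grade)

# The null hypothesis is that the variables are unrelated (a slope of zero);
# r-squared describes how well GPA predicts course grade.
print(result.slope, result.rvalue ** 2, result.pvalue)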
So instead, we have a specific test that we use for binomial data called a binomial test. With a two-sample binomial test, we compare two different sets of trials, each with a certain number of successes. So, we can answer questions like: does one lead to a greater ratio of successes than the other? Alternatively, we can also do a one-sample binomial test. That’s where we compare only one set of trials to some arbitrary number. So, for example, if we wanted to prove that a coin was unbalanced, we would use a one-sample binomial test comparing it to a ratio of 50 percent. You’ll know that you want to use a binomial test if the individual observations you’re getting out of users are binary. If you’re only concerned with whether they succeeded or failed on a particular task, and if your data is just a bunch of instances of successes and failures, then you’re using binomial data. If the data you’re getting out of your users is more complex, like multiple categories or continuous observations, then you’re probably looking at using a chi-squared test or a t-test or any of the ones we talked about before. Now, we’ve gone through a lot of tests and we’ve gone through them very quickly, but remember, our goal is just for you to know what test to use and when. Once you’ve identified the appropriate test, looking up how to do it and actually putting the data in is usually a much simpler task.
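For instance, here is a minimal sketch of the one-sample case in Python, assuming we flipped a coin 100 times and saw 59 heads; the counts are invented, and scipy.stats.binomtest requires a reasonably recent version of SciPy (older versions used binom_test instead).

# A minimal sketch of a one-sample binomial test; the counts are invented.
from scipy import stats

# 59 successes (heads) in 100 trials, compared against a ratio of 50 percent.
result = stats.binomtest(k=59, n=100, p=0.5)

# A small p-value would suggest the coin's true success ratio differs from 50%.
print(result.pvalue)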
Because the only goal here is for you to be able to identify what test to use and when, we’ve put together this little chart to serve as kind of a cheat sheet. Here, you can look up your independent variable, your dependent variable, and your number of treatments and see what test is recommended. So, if you’ve gathered some categorical ordinal data with three or more treatments, you’d probably want to use a chi-squared test. Now of course, you’ll also notice that this isn’t exhaustive. There are a lot of combinations of independent and dependent variables you could have that aren’t covered here. For example, we never talked about how you do a binomial test with three or more treatments. You might also have tests where the independent variable is interval or ratio but your dependent variable is still nominal or ordinal or something else. So, there are a lot of things out there in the statistical community that aren’t covered here. But these tests and these combinations of variables tend to be the ones that we encounter most often in HCI. Even there, this is probably overkill: a chi-squared test, a t-test, and the occasional ANOVA will probably last you your entire career, at least for the types of analyses that we usually care about.
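Since the chart itself isn’t reproduced in this transcript, one way to internalize it is as a tiny lookup function. This is just a hypothetical sketch covering only the combinations discussed in this lesson, with category names of my own choosing; it is not an exhaustive statistical guide.

# A hypothetical cheat-sheet helper covering only the tests from this lesson.
def recommend_test(dependent_type, num_treatments):
    if dependent_type in ("nominal", "ordinal"):
        return "Chi-squared test"        # handles two or more treatments
    if dependent_type in ("interval", "ratio"):
        return "Student's t-test" if num_treatments == 2 else "One-way ANOVA"
    if dependent_type == "binomial":
        return "Binomial test"           # one- or two-sample, as discussed
    return "Not covered here; consult a fuller statistics reference"

print(recommend_test("ordinal", 3))      # -> Chi-squared test
print(recommend_test("ratio", 3))        # -> One-way ANOVA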
Here are five quick tips for doing empirical evaluations. You can actually take entire classes on doing empirical evaluations, but these tips should get you started. Number one, control what you can, document what you can’t. Try to make your treatments as identical as possible. However, if there are systematic differences between them, document and report that. Number two, limit your variables. It can be tempting to try to vary lots of different things and monitor lots of other things, but that just leads to noisy, difficult data that will probably generate some false conclusions. Instead, focus on varying only one or two things and monitor only a handful of things in response. There’s nothing at all wrong with only modifying one variable and only monitoring one variable. Number three, work backwards in designing your experiment. A common mistake that I’ve seen is to just gather a bunch of data and figure out how to analyze it later. That’s messy, and it doesn’t lead to very reliable conclusions. Decide at the start what question you want to answer, then decide the analysis you need to use, and then decide the data that you need to gather. Number four, script your analyses in advance. Ronald Coase once said, “If you torture the data long enough, nature will always confess.” What the quote means is that if we analyze and reanalyze data enough times, we can always find conclusions, but that doesn’t mean that they’re actually there. So, decide in advance what analysis you’ll do, and do it. If it doesn’t give you the results that you want, don’t just keep reanalyzing that same data until it does. Number five, pay attention to power. Power refers to a test’s ability to detect a difference of a given size. Generally, it’s very dependent on how many participants you have. If you want to detect only a small effect, then you’ll need a lot of participants. If you only care about detecting a big effect, you can usually get by with fewer.
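To make tip five a bit more concrete, here is a minimal sketch of a power calculation, assuming the statsmodels library, a conventional medium effect size (Cohen’s d of 0.5), five percent alpha, and 80 percent power; all of those numbers are illustrative choices, not prescriptions.

# A minimal sketch of a power calculation; the effect size and thresholds are
# illustrative assumptions, not recommendations.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)

# A smaller effect size would push this number up considerably; a larger one
# would let us get by with fewer participants.
print(n_per_group)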
Predictive evaluation is evaluation we can do without actual users. Now, in user-centered design that’s not ideal, but predictive evaluation can be more efficient and accessible than actual user evaluation, so it’s all right to use it as part of a rapid feedback process. It lets us keep the user in mind, even when we’re not bringing users into the conversation. The important thing is to make sure we’re using it appropriately. Predictive evaluation shouldn’t be used where we could be doing qualitative or empirical evaluation. It should only be used where we wouldn’t otherwise be doing any evaluation. Effectively, it’s better than nothing.
When we talked about design principles, we covered several heuristics and guidelines we use in designing interfaces. The first method for predictive evaluation is simply to hand our interface and these guidelines to a few experts to evaluate. This is called heuristic evaluation. Each individual evaluator inspects the interface alone and identifies places where the interface violates some heuristic. We might sit with an expert while they perform the evaluation, or they might generate a report. Heuristics are useful because they give us small snapshots into the way people might think about our interfaces. If we take these heuristics to an extreme, though, we could go so far as to develop models of the way people think about our interfaces. During our need-finding exercises, we developed models of our users’ tasks. In model-based evaluation, we take those models and trace through them in the context of the interface that we designed. So, let’s use a GOMS model for example. Just as we computed a GOMS model for what users do in some current context, we can also compute a GOMS model for what they will do in our new interface. Then, we can compare these models side by side to see how our interface changes the task and evaluate whether that change is for the better. So here, the classical way of disabling an alarm was to use a keypad mounted near the door. We could use this GOMS model to evaluate whether or not the new keychain interface was actually more efficient than the keypad interface. We can also use the profiles of users that we developed to evaluate whether the new design meets their criteria. For example, imagine we identified this model as applying to users with low motivation to use this interface. Maybe it’s people making purchases that they have to make for work, as opposed to just shopping at their leisure. We can use that to inform our evaluation of whether or not the interface relies on high user motivation. If we find that the interface requires users to be more personally driven or to keep more in working memory, then we might find that users will fail if they don’t have high motivation to use the interface, and then we can revise it accordingly. If we take model-based evaluation to an extreme, though, we can actually get to the point of simulation-based evaluation. At that point, we might construct an artificially intelligent agent that interacts with our interface in the way that a human would. Melody Ivory and Marti Hearst actually did some research on this back in 2001, on The State of the Art in Automating Usability Evaluation of User Interfaces. That seems like an amazing undertaking given how flexible and varied user interfaces can actually be. Can we really evaluate them automatically? More recently, work has even been done to create even more human-like models of users, like some work done by the Human-Centered Design Group at the Institute for Information Technology in Germany. Developing that agent is an enormous task on its own, but if we’re working on a big long-term project like Facebook, or in a high-stakes environment like air traffic control, having a simulation of a human that we can run hundreds of thousands of times on different interface prototypes would be extremely useful.
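To give a rough feel for what a model-based comparison can look like, here is a sketch of a Keystroke-Level-Model-style estimate in Python. The KLM is a simplified member of the GOMS family; the operator times below are the classic published estimates (roughly 0.2 seconds per keystroke, 0.4 seconds to home the hand, 1.35 seconds for mental preparation), and the two action sequences for the keypad and keychain alarms are purely hypothetical.

# A rough KLM-style sketch; operator times are the classic published estimates,
# and the two action sequences are hypothetical, for illustration only.
KLM_TIMES = {"K": 0.2, "H": 0.4, "M": 1.35}   # seconds per operator

def estimate_time(operators):
    # Sum the estimated time for a sequence of KLM operators.
    return sum(KLM_TIMES[op] for op in operators)

# Keypad by the door: home hand to keypad, recall the code, press four keys.
keypad = ["H", "M", "K", "K", "K", "K"]
# Keychain remote: home hand to the remote, press a single disarm button.
keychain = ["H", "M", "K"]

print(estimate_time(keypad), estimate_time(keychain))

Comparing the two totals side by side is exactly the kind of judgment the model-based approach supports: does the new interface actually reduce the predicted effort for the task?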
The most common type of predictive evaluation you’ll encounter is most likely the cognitive walkthrough. In a cognitive walkthrough, we step through the process of interacting with an interface, mentally simulating at each stage what the user is seeing, thinking, and doing. To do this, we start by constructing specific tasks that can be completed within our prototype. So, I’m going to try this with the card prototype that I used with Morgan earlier. I start with some goal in mind. So right now, my goal might be to leave a note while listening. I look at the interface and I try to imagine myself as a novice user. Will they know what to do? Well, here’s a button that says View and Take Notes, and if they want to take a note, I think it’s reasonable to assume that they would know to do that. So I tap that button, and what response do I get? Well, that’s when this screen will come up. The system will pause playback first and then it will show me the note-taking screen, and I go through the entire system like this, predicting what actions the user will take and noting the response the system will give. At every stage of the process, I want to investigate this from the perspective of the gulfs of execution and evaluation. Is it reasonable to expect the user to cross that gulf of execution? Is the right action sufficiently obvious? Is the response to the action the one the user would expect? On the other side, is it reasonable to expect the feedback to cross the gulf of evaluation? Does the feedback show the user what happened? Does the feedback confirm the user chose the right action? Now, the weakness of cognitive walkthroughs is that we’re the designers, so it likely seems to us like the design is just fine. After all, that’s why we designed it that way. But if you can sufficiently put yourself in the user’s shoes, you can start to uncover some really useful takeaways. So here, for example, from this cognitive walkthrough, I’ve noticed that there isn’t sufficient feedback when the user has finished leaving a note. The system just stops recording and resumes playback, which doesn’t confirm that the note was received. Right now that might be a minor issue, since there’s implicit feedback: the only way that playback resumes is if the note is received. But I’m also now realizing that it’s quite likely that users might start to leave notes and then decide to cancel them, so they both need a cancel option, and they need feedback to indicate whether the note was completed and saved or canceled. I got all that feedback just out of this cognitive walkthrough of the interface as is. So if you can put yourself in the novice’s shoes well enough, you can find some really good feedback without the difficulties of involving real users.
When we discussed the prototypes for our design for an audiobook tool for exercisers, we briefly showed the evaluation stage with Morgan actually using them. Let’s look at that in a little bit more depth, though. What were we evaluating? At any stage of the process, we could have been performing qualitative evaluation. We asked Morgan how easy or hard things were to do, how much she enjoyed using the interface, and what her thought process was in interacting with certain prototypes. We could have also performed some quantitative analyses. When she used the card-based prototype, for example, we could have measured the amount of time it took her to decide what to do, or counted the number of errors she committed. We could do the same kind of thing with the Wizard of Oz prototype as well. We could call out commands to Morgan, like press play and place a bookmark, and see how long it takes her to execute the commands, or how many errors she commits along the way. Between opportunities to work with Morgan, though, we might also use some predictive evaluation to ensure that we keep her in mind while we’re designing. Our goal is to apply multiple evaluation techniques to constantly center our designs around the user. That’s why evaluation is a foundation of user-centered design. Just like we wanted to understand the user and the task before beginning to design, we also want to understand how the user relates to the design at every iteration of the design life cycle.
In this lesson, we’ve covered three different types of evaluation: qualitative, empirical, and predictive. Each method has its advantages and disadvantages. Let’s start to wrap this lesson up by exploring those advantages with an exercise. Here are the methods that we’ve covered, and here are some potential advantages. For each row, mark the column to which that advantage applies. Note that again, these might be somewhat relative, so your answer will probably differ a bit from ours. You can go ahead and skip to the exercise if you don’t want to hear me read these. Our advantages are: does not require any actual users, identifies provable advantages, informs ongoing design decisions, investigates the participant’s thought process, provides generalizable conclusions, and draws conclusions from actual participants.
Here would be my answers to this exercise. These are a little bit more objective than some of our exercises in the past. First, for not requiring any actual users: predictive evaluation is the only evaluation we can do without involving users in the evaluation process. That’s both its biggest strength and its biggest weakness. For identifying provable advantages, only empirical evaluation can reliably do that, because it’s the only one that works numerically. As far as informing ongoing design decisions is concerned, that’s definitely the case for qualitative and predictive evaluation. I’ve left it unmarked for empirical evaluation simply because we usually do this toward the end of our design life cycle, although we also know that the design life cycle never really ends. So eventually, empirical evaluation could be used to inform ongoing design decisions; it’s just not involved in the earlier cycles through the design life cycle. As far as investigating the participant’s thought process, again, empirical evaluation doesn’t really do that. It only accesses participants’ performance numerically. Qualitative evaluation definitely does this, because it actually asks users to think out loud and describe their thought process. And really, predictive evaluation tries to investigate the participant’s thought process, just in a lower-overhead, lower-cost kind of way. It does so by having experts in usability design simulate the participant’s thought process and comment on it from the perspective of some preset heuristics. Similar to how only empirical evaluation can identify provable advantages, it’s also the only one that can provide generalizable conclusions, again because it uses numbers. And finally, qualitative and empirical evaluations both draw conclusions from actual participants. This is the inverse of predictive evaluation’s lack of a requirement for actual users.
To succeed in HCI, you need a good evaluation plan. In industries like healthcare and education, that is initially going to involve getting some time with experts outside the real contexts of the task. That means bringing in doctors, bringing in nurses, bringing in patients, and exploring their thoughts on the prototypes that you’ve designed. In some places, like education, you might be able to evaluate with real users even before the interface is ready. But in others, like healthcare, the stakes are high enough that you’ll only want real users using the interface when you’re certain of its effectiveness and reliability. Some emerging areas raise additional questions for evaluation. Take virtual reality, for example: most people you encounter haven’t used virtual reality before, so there’s going to be a learning curve. How are you going to determine whether the learning curve is acceptable or not? If the user runs into difficulties, how can you tell if those come from your interface, or if they’re part of the fundamental VR learning experience? So, take a moment to brainstorm your evaluation approach for your chosen application area. What kinds of evaluations would you choose, and why?
In this lesson, we’ve discussed the basics of evaluation. Evaluation is a massive topic to cover though. You could take entire classes on evaluation. Heck, you could take entire classes only on specific types of evaluation. Our goal here has been to give you enough information to know what to look into further and when. We want you to understand when to use qualitative evaluation, when to use empirical evaluation, and when to use predictive evaluation. We want you to understand within those categories, what the different options are. That way, when you’re ready to begin evaluation, you know what you should look into doing.
The content we’ve covered so far was developed over the course of several decades of research in HCI and human factors, and it’s all still applicable today as well. At the same time, new technologies and new areas call for new principles and new workflows, and specifically, the advent of the Internet ushered in new methods for HCI. Many software developers now adopt an agile workflow, which emphasizes earlier delivery, more continuous improvement, and rapid feedback cycles. For those of us here in HCI, that’s actually really exciting. We love feedback cycles. We love building them for our users, and we love engaging in them ourselves. It’s also a scary prospect, though. We’ve discussed long prototyping processes that move from paper to wireframes to live demos, involving lots of users in slow qualitative methodologies, and those things are still very valuable. But nowadays, sometimes we just want to build something really fast and get it in front of real users. So, in this lesson, we’ll talk about how we might use agile development methods to engage in quicker feedback cycles.
Where did these changes come from? We can think of them in terms of some of the costs associated with elements of the design life cycle. Think back to before the age of the Internet: developing software was very expensive. It required a very specialized skill set. Software distribution was done the same way we sold coffee mugs or bananas; you’d go to the store and you’d physically buy the software. That distribution method was expensive as well. If you shipped software that was hard to use, the cost of fixing it was enormous. You had to mail each individual person an update disk, and really the only way to get user feedback, or even to find out if it was usable, was the same way you would do it before distribution: by having users come in for testing. All of this meant there was an enormous need to get it right the first time. If you didn’t, it would be difficult to fix the actual software, difficult to get the fix to users, and difficult to find out that a fix was even needed. Shigeru Miyamoto, the creator of some of Nintendo’s best video game franchises, described this in terms of video games by saying, “A delayed game is eventually good, but a rushed game is forever bad.” The same applied to software. Fast-forward to now, though: is that still true? Development isn’t cheap now, but it is cheaper than it used to be. A single person can develop in a day what would have taken a team of people months to do 20 years ago, thanks to advances in hardware, programming languages, and the available libraries. You can look at all the imitators of popular games on either the Android or the iPhone app store to quickly see how much development costs have come down. It’s certainly feasible to churn out a really quick imitator when something becomes popular. But more importantly, distribution for software is now essentially free, and updating software is essentially free as well. Every day you can download new apps and have them update automatically in the background. If you release something that has a bug in it, you can fix it and roll out the fix immediately. Miyamoto’s quote is no longer really accurate, because it is possible to fix games after they’re released. Tesla, for example, regularly pushes software updates to its cars via the Internet. In the video game industry, day-one patches that fix glitches on the very first day of release have pretty much become the standard. Perhaps most importantly, we can gather usage data from live users automatically and essentially for free as well. And it isn’t just usage data; it’s product reviews, error reports, buzz on the Internet. Lots of feedback about our applications now comes naturally, without us having to spend any money to gather it. What all this means is that there is now more incentive to build something fast and get it to users to start getting real feedback as early as possible. Now make no mistake, this isn’t justification to just throw out the entire design life cycle. The majority of HCI design and research still goes through a longer process. You need several iterations through the full design life cycle for big websites, complex apps, anything involving designing hardware, anything involving a high-profile first impression, and really anything even somewhat high in stakes. But that said, there exists a new niche for rapid development.
Maybe you came up with an idea for a simple Android game. In the time it would take you to go through this longer process, you could probably implement the game, get it in front of real users, and get a lot more feedback. That’s what we’re discussing here: how do you take the principles we’ve covered so far and apply them to a rapid, agile development process?
Before I describe the current ideas behind when to go for an agile development process, let’s see what you think. Here are six possible applications we might develop. Which of these would lend itself to an agile development process? Feel free to skip ahead to the quiz if you don’t want to listen to me read them out. A camera interface for aiding MOOC recording. A tool for helping doctors visualize patient info in surgery. A smartwatch game to play in short five-minute sessions. A wearable device for mobile keyboard entry. A mobile app for aggregating newsfeeds across networks. A navigation app for the console of an electric car.
Here would be my answers. The two areas that I think are good candidates for an agile development process are the two that use existing devices and don’t have high stakes associated with them. In both of these cases, rolling out updates wouldn’t be terribly difficult, and we haven’t lost a whole lot by initially having a product that has some bugs in it. A camera interface for aiding MOOC recording would be a good candidate if the camera environment were easier to program for. Programming for a camera isn’t like programming for an app store or for a desktop environment; I actually don’t even know how you go about it. So, for us, a camera interface for aiding MOOC recording probably wouldn’t be a great candidate because we don’t have access to that platform. Remember, our goal is to get products in front of real users as soon as possible. Now, of course, that all changes if we’re actually working for a camera company and we do have access to that platform. The second one is more fundamental, though: a tool for helping doctors visualize patient information in surgery. There are really high stakes behind that. If you visualize something in a way that’s a little bit misleading, someone could die. So, you probably don’t want to take an agile development process for that. For a wearable device for mobile keyboard entry, wearable devices are expensive to produce. When you’re actually producing the physical device, you want to be sure it’s going to work pretty well. Similarly, physical devices aren’t easy to update the way software is. So, a wearable device is probably not a good candidate for an agile development process. Finally, a navigation app for the console of an electric car, I said, isn’t a good candidate, although you might disagree. Personally, I would say that the stakes are high enough for a navigation app that you probably want to be pretty sure you’re going to have a good product before you roll it out to users. A user might take a wrong turn, end up in the wrong neighborhood, or miss an appointment based on some mistake that we made. I would consider that sufficiently high stakes to avoid a faster development process. Plus, not all electric cars are like Tesla. Some of them actually require you to bring the car to the factory or to the repair shop to get an update. So, the cost of rolling out updates can be more significant there as well.
So when should you consider using these more agile methodologies? Lots of software development theorists have explored this space. Boehm and Turner specifically suggest that agile development can only be used in certain circumstances. First, they say, it must be an environment with low criticality. By its nature, agile development means letting the users do some of the testing, so you don’t want to use it in environments where bugs or poor usability are going to lead to major repercussions. Healthcare or financial investing wouldn’t be great places for agile development, generally speaking, although there have been efforts to create standards that would allow the methodology to apply without compromising security and safety. But for things like smartphone games and social media apps, the criticality is sufficiently low. Second, it should really be a place where requirements change often. One of the benefits of agile processes is that they allow teams to adjust quickly to changing expectations or needs. A thermostat, for example, doesn’t change its requirements very often. A site like Udacity, though, is constantly adjusting to new student interests or student needs. Now, these two components apply to the types of problems we’re working on. If we’re working on an interface that would lend itself to a more agile process, we also must set up the team to work well within an agile process. That means small teams that are comfortable with change, as opposed to large teams that thrive on order. So generally, agile processes can be good in some cases with the right people, but poor in many others.
In 2006, Stephanie Chamberlain, Helen Sharp, and Neil Maiden investigated the conflicts and opportunities of applying agile development to user-centered design. They found, interestingly, that the two actually have significant overlap. Both agile development and user-centered design emphasize iterative development processes that build on feedback from previous rounds. That’s the entire design life cycle that we’ve talked about, and it’s at the core of both agile development and user-centered design. Both methodologies also place a heavy emphasis on the user’s role in the development process, and both emphasize the importance of team coherence. So it seems that agile methods and user-centered design agree on the most fundamental element: the importance of the user. By comparison, the conflicts are actually relatively light, at least in my opinion. User-centered design disagrees with agile development on the importance of documentation and the importance of doing research prior to the design work actually beginning. But clearly, the methodologies have the same objectives; they just disagree on how best to achieve them. As a result, the authors advocate five principles for integrating user-centered design and agile development. Two of these were shared between the methodologies in the first place: high user involvement and close team collaboration. User-centered design’s emphasis on prototyping and the design life cycle is addressed by proposing that design run a sprint ahead of development to perform the research necessary for user-centered design. To facilitate this, strong project management is necessary.
One application of agile development in HCI is the relatively new idea of live prototyping. Live prototyping is a bit of an oxymoron, and the fact that it’s an oxymoron speaks to how far prototyping tools have come. We’ve gotten to the point in some areas of development where constructing actual working interfaces is just as easy as constructing prototypes. So here’s one example of this: it’s a tool we use at Udacity called Optimizely. It allows for drag-and-drop creation of real working webpages. The interface is very similar to many of the wireframe tools out there, and yet this website is actually live. I can just click a button and this site goes public. Why bother constructing prototypes before constructing my final interface, when constructing the final interface is as easy as constructing prototypes? Of course, this only addresses one of the reasons we construct prototypes. We don’t just construct them because they’re usually easier; we also construct them to get feedback before we roll out a bad design to everyone. But when we get to the point of making small tweaks or small revisions, or if we have a lot of experience with designing interfaces in the first place, this might not be a bad place to start. That’s especially true if the cost of failure is relatively low and the possible benefit of success is particularly high. I would argue that’s definitely the case for any kind of e-commerce site. The cost of failure is maybe losing a few sales, but the possible benefit is gaining more sales for a much longer time period. I’m sure anyone would risk having fewer sales on one day for the possible reward of having more sales every subsequent day.
So in some contexts, it’s now no harder to construct an actual interface than it is to construct a prototype, so we might skip the prototyping phase altogether. However, prototypes also allow us to gather feedback from users. Even though we can now easily construct an interface, we don’t want to immediately roll out a completely untested interface to everyone who visits our site. We might be able to fix it quickly, but we’re still eroding user trust in us and wasting our users’ time. That’s where the second facet of this comes in: AB testing. AB testing is the name given to rapid software testing between, typically, two alternatives, A and B. Statistically, it’s not any different from t-tests. What makes AB testing unique is that we’re usually rapidly testing small changes with real users. We usually do it by rolling out the B version, the new version, to only a small number of users and ensuring that nothing goes terribly wrong and there’s not a dramatic dip in performance. That way we can make sure a change is positive, or at least neutral, before rolling it out to everyone. But look where the testing and feedback are coming in here: they’re coming automatically, from real users, during normal usage of our tool. There’s no added cost to recruiting participants, and the feedback is received instantly. So for a quick example, this is the overview page for one of Udacity’s programs, and it provides a timeline for how much time students should dedicate to the program in terms of number of hours. Is number of hours the best way to display this? I don’t know; we could find out. Instead of showing 420 hours, maybe I phrase this as 10 hours per week. In this interface, all I have to do is edit it and I immediately have a new version of this interface that I can try out. Now I can click Start Experiment and try this out. Does phrasing this as 10 hours per week increase the number? Does it decrease the number? If it decreases it, I can very quickly roll this back. If it increases it, I can very quickly roll this out to everybody. I’m going through the same design life cycle. I understand that the need is for the user to know what the timeline is. I’ve got a design in mind, which is to show the timeline in number of hours per week. I prototype it; it just happens that the prototype is live. And I immediately roll it out. I look at how users use it, I evaluate it, and I decide if I want to roll back that change or roll it out to everybody. I can go through a microcosm, a very rapid iteration of the full design life cycle, by using live prototyping and AB testing.
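Behind the scenes, evaluating an AB test like this usually comes down to the same statistics we covered earlier. Here is a minimal sketch, assuming invented visitor and conversion counts for the two versions; a chi-squared test on the counts tells us whether the difference could plausibly be random noise.

# A minimal sketch of analyzing an AB test; the visitor and conversion counts
# below are invented for illustration.
from scipy import stats

# Version A: 150 of 2000 visitors converted. Version B: 180 of 2000 converted.
observed = [[150, 2000 - 150],   # A: converted, did not convert
            [180, 2000 - 180]]   # B: converted, did not convert

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

# A small p-value suggests the difference in conversion rates is real, and we
# might roll the B version out to everyone; otherwise, we roll it back.
print(p_value)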
Agile development techniques don’t replace the design life cycle; they just caffeinate it. We’re still doing need-finding, but we’re probably doing it a little more tacitly, by reading user feedback or checking out interaction logs. We’re still brainstorming design alternatives, but we’re really just coming up with them in our head, because we then immediately move them on to prototyping. Our prototypes are still just prototypes; they just happen to work. We’re still doing evaluation, by rolling our changes out to only certain participants first to make sure the response is good. The results of that evaluation then feed the same process over and over again. So, taking an agile approach to the design life cycle really doesn’t change the cycle itself; it just changes the rate at which we go through it and the types of prototypes and evaluation that we actually do. Remember also that Chamberlain, Sharp, and Maiden advocated still doing the initial need-finding step. Rarely will we go from no interface at all to a working prototype as quickly as we go through revisions of those working prototypes. So, it’s useful to do an initial need-finding phase the way we normally would, and then proceed to a more agile revision process once we have a working prototype to actually tweak and modify.
Here are five quick tips for using HCI and agile development together, especially for mitigating the risks to the user experience presented by this more agile development process. Number one, start more traditional. Start with a more traditional need-finding and prototyping process, and shift to more agile development once you have something up and running. Jakob Nielsen describes this as doing some foundational user research. Once you have something up and running, you have a way of probing the user experience further, but you need something solid to begin with, and that comes from the more traditional process. Number two, focus on small changes. Notice that when I was doing live prototyping and AB testing, I was making a small change to an existing interface, not building an entire new site from scratch. Number three, adopt a parallel-track method. Agile development often uses short two-week sprints in development. Under that setup, have the HCI research run one sprint ahead of the implementation. The HCI team can do two-week sprints of need-finding, prototyping, and low-fidelity evaluation, then hand the results to the development team for their next sprint. Number four, be careful with consistency. One of our design principles was consistency, both within our interfaces and across interface design as a whole. If your interface caters to frequent visitors or users, you’ll want to be conservative in how often you mess with their expectations. If you’re designing for something like a museum kiosk, though, you can be more liberal with frequent changes. Number five, nest your design cycles. In agile development, you go through many small design cycles rapidly, and each cycle gives you a tiny bit of new information. Take all that new information you gather and use it in the context of a broader, more traditional design cycle aimed at long-term, substantive improvements instead of small optimizations.
Does the area of HCI on which you chose to focus lend itself naturally to agile development? There are a lot of questions to ask in that area. Are you working in a high-stakes area like healthcare or autonomous vehicles? What’s the cost of failure? If it’s high, you might want to avoid agile development. After all, it’s built in large part around learning from the real failures of real users. If that failure is a user unfairly failing to reach the next level of a game, that’s probably fine. If it’s a doctor entering the wrong dosage of a medication into a new interface, that’s not fine. You also need to think of development costs. Agile development relies on being able to get a product up and out the door quickly and change it frequently. If any part of your design is reliant on hardware, then agile development presents challenges. It might be easy to roll out a software update to improve a car’s screen interface, but you can’t download a car to fix a hardware problem. Now, take a moment and think about whether agile development would be right for the area of application that you chose.
In this lesson, we’ve covered a small glimpse of how HCI can work in a more agile development environment. In many ways, they’re a nice match. Both emphasize feedback cycles, both emphasize getting user feedback, and both emphasize rapid changes. But while HCI has traditionally done these behind the scenes before reaching real users, agile emphasizes doing these live. Now, it’s important to note I’ve only provided a narrow glimpse into what agile development is all about. I’ve discussed how HCI matches with the theory and the goals of agile development, but agile is a more complex suite of workflows and stakeholders. I really recommend reading more about it before you try to take an agile approach to HCI, or before you try to integrate interaction design into an existing agile team. As you do, though, I think you’ll notice that there can be a really nice compatibility between the two.
In this unit, we’ve discussed the HCI research methods that form the design life cycle, an iterative process between need-finding, brainstorming design alternatives, prototyping, and evaluating with real users. We’ve also discussed the ethics behind this kind of research and how it applies to some more modern agile software development methodologies. Now, in this wrap-up lesson, we want to explore a couple of examples of the full design life cycle in action. We also want to tie into the design principles unit and explore how we can use the design principles and research methods in conjunction with one another.
Throughout this unit, we’ve used the running example of designing an audiobook app that would let people who are exercising interact with books in all the ways you or I might while we’re sitting and reading. That means being able to leave bookmarks, take notes, and so on. We did our foundational need-finding by going to a park and observing people exercising. We also talked about doing interviews and surveys to find out more targeted information about what people wanted and needed. Then, based on that, we brainstormed a whole lot of alternatives. We thought about those alternatives in terms of different scenarios and personas to settle on those with the most potential. Then, we took those alternatives and we prototyped a few of them. Specifically, we constructed Wizard of Oz prototypes for voice and gesture interfaces, and paper prototypes for on-screen interfaces. Then, we put those in front of our users. Well, a user in my case, but you would use more users. We got some initial feedback. So, at the end of one iteration of the design life cycle, we have three different low-fidelity prototypes, each with some feedback on how effectively they work. But as you can tell, we’re not done yet. We don’t have an app. So, what’s next? Next, we go through another iteration of the design life cycle.
We take the results of our initial iteration through the design life cycle and use them to return to the need-finding process. Now, that’s not to say we need to redo everything from scratch, but our prototypes and evaluation have now increased our understanding of the problem. There are things we learned by prototyping and evaluating about the task itself. In this case, we could have learned that even for exercisers with their hands free, gestures are still tough because they’re moving around so much. The evaluation process may have also given us new questions we want to ask users to understand the task better. So, for example, Morgan could have mentioned that she needed the ability to rewind. We might want to know how common a problem that is. So, in many ways, synthesizing our experiences from the evaluation is our next need-finding process. Then, we move on to the design alternatives stage. Again, that doesn’t mean we start from scratch and come up with all new ideas. Here it means expanding on our current ideas, fleshing them out a bit, and brainstorming them in the context of those personas and scenarios that we used previously. We might also come up with whole new ideas here based on our first iteration. Then, more prototyping. At this point, we might discover that as we try to increase the fidelity of some of our prototypes, the technology or the resources just aren’t quite there yet. For example, while the gesture interface might have been promising in the Wizard of Oz prototype we did for it, we don’t yet have the technology to recognize gestures that way on the go. Or we might have found that the expense related to the prototype is infeasible, or we could have realized that the prototype would require violating some of our other user needs. So, for example, we could do gesture recognition if we had users hold a physical device that could recognize certain gestures, but that might be too expensive to produce, or it might conflict with our audience’s need for a hands-free system. So, we move on with the prototypes that we can build, with the goal of getting to the feedback stage as quickly as possible. For our voice recognition, instead of trying to build a full voice recognition system, maybe we just build a system that can recognize very simplistic voice commands. Instead of recognizing words, maybe it just recognizes the number of utterances, if that’s easier to build. For the screen, maybe we build a wireframe prototype that moves between different screens on a phone, but we don’t connect it to a real system. We still have someone run alongside the exerciser and play the book according to their commands. That way, we focus on usability instead of things like integration with audiobook apps or voice-to-text transcription, things that take a lot of work to get right and might end up unnecessary if we find that the prototype isn’t actually useful. And then we evaluate again. This time, we probably get a little bit more objective. We still want data on the qualitative user experience, but we also want data on things like how long it takes the user to perform the desired actions in the interface, or what prevents them from working with the interface. Imagine that we found, for instance, that many exercisers go through places that are too loud for voice commands to work, or we could have found that the time it takes to pull out the interface and interact with it is just too distracting and limiting. That information is once again useful to our ongoing iteration.
At the end of that process, we again have some higher fidelity prototypes, but no product yet. So, we go again.
At the end of the last iteration through the design cycle, we had two interface prototypes, each with significant weaknesses. Our voice command interface struggled in loud areas, where exercisers are often exercising. Our screen-based interface presented too high a gulf of execution. But notice how far we’ve come at this point. We now have a pretty complete and nuanced view of the task and our possible solutions. So, now let’s go through one more iteration to get something we can actually implement and deploy. Our need-finding has come along to the point of understanding that completely hands-free interfaces are more usable, but we also know that gesture-based interaction is still not totally technologically feasible, and voice-based interaction isn’t perfectly reliable. So, now we might come up with a new alternative: a hybrid system. Voice interaction and on-screen touch interaction aren’t incompatible with one another. So, our new alternative is to develop a system that supports both, allowing users to use voice commands most of the time and default to touch commands in situations where the voice commands won’t work. That way, they always have full functionality, but usability is still maximized. So, we create a new prototype, basically merging our two prototypes from the previous iteration. It’s still pretty low fidelity, because we haven’t tested this particular combination yet, and the next stage of sophistication is going to be pretty expensive to develop, so we want to make sure that it’s worth pursuing. Then, we evaluate that, and we find it’s good enough to go ahead and move forward with producing it.
So, that’s the end, right? We went through a few iterations of the design life cycle, we got iteratively more high-fidelity and rigorous with our evaluation, and finally we have a design we like. So, we implement it, we submit it to the App Store, and we sit back while the money rolls in. Not exactly. Now, instead of having a handful of users we bring in to use our interface, we have hundreds of users using it in ways we never expected, and now the cycle begins again. We have data we’re automatically collecting through either usage tracking or error logs. We have user reviews and the feedback that they submit. So we jump back into need-finding using the data we now have available to us. We might find more subtle needs, like the need for more control over rewinding and fast-forwarding. We might move on and prototype things like a command for back five seconds or back 15 seconds. We might uncover more novel needs as well. We might find that there’s a significant contingent of people using the interface while driving. It’s similar in that it’s another place where people’s hands and eyes are occupied, but it has its own unique needs as well, like the ability to run alongside a navigation app. So, the process starts again, this time with live user data, and in general, it never really ends. Nowadays, you very rarely see interfaces or apps or programs or websites that are intentionally put up once and never changed. That might happen because designers got busy or the company went out of business, but it’s rarely a one-off deployment by design. As the design evolves over time with real data, you’ll start to see some nested feedback cycles as well. Week-to-week small additions give way to month-to-month updates and year-to-year reinventions. In many ways, your interface becomes like a child: you watch it grow up and take on a life of its own.
The design principles we describe in our other unit are deeply integrated throughout the design life cycle. They don’t supplant it; you won’t be making any great designs just by applying those principles, but they streamline things. In many ways, design principles capture the takeaways and conclusions found through this design life cycle in the past, in ways that can be transferred to new tasks and new interfaces. In uncovering needs, many of our needs are driven by our current understanding of user abilities. Task analysis allows us to describe those needs and tasks in formal ways to equip the interface design process, and cognitive load lets us keep in mind how much users are asked to do at a time. Direct manipulation gives us a family of techniques that we want to emphasize in coming up with our design alternatives. Mental models provide us an understanding of how the design alternatives might mesh with the user’s understanding of the task. Distributed cognition gives us a view on interface design that lends itself to design at a larger level of granularity; here we’re designing systems, not just interfaces. Design principles in general give us some great rules of thumb to use when creating our initial prototypes and designs. Our understanding of representations ensures that the prototypes we create match the users’ mental models that we uncovered before. Invisible interfaces help us remember that the interface should be the conduit between the user and the task, not the focus of attention itself. Then the vocabulary of feedback cycles and the gulfs of execution and evaluation gives us ways to describe our evaluations of the interfaces that we design. The notion of politics and values in interfaces allows us to evaluate the interface not just in terms of its usable interactions, but in the types of society it creates or preserves. The usability heuristics that we apply to our prototyping are also a way of evaluating our interface and mentally simulating what a user will be thinking while using our creations. These principles of HCI were all found through many years of going through the design life cycle, creating different interfaces, and exploring and evaluating their impact. By leveraging those lessons, we can get to usable interfaces much faster. But applying those lessons doesn’t remove the need to go talk to real users.
Over the past several lessons, you’ve been exploring how the design life cycle applies to the area of HCI that you chose to explore. Now that we’ve reached the end of the unit, take a moment and reflect on the life cycle that you’ve developed. How feasible would it be to actually execute? What would you need? What users do you need? How many? When do you need them? There are right answers here, of course: ideally, you’ll need users early and often. That’s what user-centered design is all about. In educational technology, that might mean having some teachers, students, and parents that you can contact frequently. In computer-supported cooperative work, that might mean having a community you can visit often to see new developments. In ubiquitous computing, that might mean going as far as having someone who specializes in low-fidelity 3D prototypes to quickly spin up new ideas for testing. Now that you understand the various phases of the design life cycle, take a moment and reflect on how you would use them, iteratively and as a whole, in your chosen area of HCI.
At a minimum, user-centered design advocates involving users throughout the process through surveys, interviews, evaluations, and the other things that we’ve already talked about. However, user-centered design can be taken to even greater extremes through a number of approaches beyond what we’ve covered. One is called participatory design. In participatory design, all the stakeholders, including the users themselves, are involved as part of the design team. They aren’t just a source of data; they’re actually members working on the design team, working on the problem. That allows the user perspective to be pretty omnipresent throughout the design process. Now, of course, there’s still a danger there. Generally, we are not our user. In participatory design, one of the designers is the user, but they’re not the only user. So, it’s a great way to get a user’s perspective, but we must also be careful not to overrepresent that one person’s view. A second approach is action research. Action research is a methodology that addresses an immediate problem and researches it by trying to simultaneously solve it. Data is gathered on the success of those approaches and used to inform the understanding of the problem and future approaches. Most importantly, like participatory design, action research is undertaken by the actual users. For example, a teacher might engage in action research by trying a new activity in his classroom and reflecting on the results, or a manager might use action research by trying a new evaluation system with her employees and noting the changes. A third approach is design-based research. Design-based research is similar to action research, but it can be done by outside practitioners as well. It’s especially common in learning sciences research. In design-based research, designers create interventions based on their current understanding of the theory and the problem, and they use the success of those interventions to improve our understanding of the theory or the problem. For example, if we believed a certain intersection had a lot of jaywalkers because the signs have poor visibility, we might interview people at the intersection for their thoughts, or we could create a solution that assumes we’re correct and then use it to evaluate whether or not we actually were. If we create a more clearly visible sign and it fixes the problem, then that suggests our initial theory was correct. Now, in all of these approaches, notice that iteration still plays a strong role. We never try out one design and stop. We run through the process, create a design, try it out, iterate, and improve on it. Interface design is never really done. It just gets better and better as time goes on, while also adjusting to new trends and new technologies.
This wraps up our conversation on research methods and the design life cycle. The purpose of these methods is to put a strong focus on user-centered design throughout the process. We want to start our designs by understanding user needs and then get user feedback throughout the design process. As we do, our understanding of the user and the task improves, and our designs improve along with it. Even after we’ve released our designs, modern technology allows us to continue that feedback cycle, continually improving our interfaces and further enhancing the user experience.