On the Importance of Environments in Human-Robot Coordination

We demonstrate the effect of the environment on human-robot coordination and introduce a framework for finding environments that elicit different coordination behaviors.


In human-robot interaction (HRI), there has been much work on modeling human states and actions and then integrating such models into robot decision making. Researchers assess these models based on how well the human-robot team coordinates. However, few prior works have explored the effect of the environment on this coordination. Our thesis is that changing the environment can result in significant differences between coordination behaviors.

Consider the following example of a human and a robot coordinating in the video game Overcooked, where a human-robot team prepares soups by gathering onions, cooking the onions into a soup on a stove, collecting the soup in a bowl, and finally delivering the soup to a counter (grey tile). In the video below, a robot (in green) with a QMDP policy [1] collaborates with a rule-based simulated human (in blue) to prepare two soups as quickly as possible.

A human (in blue) and robot with a QMDP policy (in green) collaborate to serve two soups. In this environment, the human and robot divide the work equally.

Observe the above environment layout during the coordination. The broadly open workspace allows both the human and robot to easily move around and, as such, the team can distribute workload equally between the human and robot. Next, contrast the layout above with the layout below, where the human does all the work. Keep in mind that the human and the robot follow identical policies in both scenarios — only the environment changes.

A human and robot collaborate to serve two soups in a different environment, but this time, the human does all of the work.

Manually designing environments that induce specific coordination behaviors requires substantial human effort. Furthermore, as robotic systems increase in complexity, it becomes challenging to predict how these systems will act in different situations, and even more challenging to design, before deployment, environments that elicit the diverse range of behaviors possible in the real world. Therefore, we propose a framework for automatic environment generation, drawing upon insights from the field of procedural content generation in games.

Generating Solvable Environments with GAN+MIP

Starting with this section, we will construct our proposed framework piece by piece, beginning with generating environments via a generative adversarial network (GAN). In Overcooked, we define an environment as a grid of tiles, where we represent each chosen tile as a one-hot vector indicating one of eight tile types. Examples of tile types include the starting agent locations, floor tiles, countertops, and stoves. To train our GAN, we manually authored a set of environments based on the environments from the Overcooked video game [2]. Example environments from our human-authored training data are shown below.

Human-authored Overcooked environments.
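To make the one-hot representation concrete, here is a minimal sketch in Python. The eight tile symbols are an assumption made for illustration (the article names only some of the tile types), and `encode` and `decode` are hypothetical helper names, not functions from our codebase.

```python
# Hypothetical alphabet of eight tile types (symbols chosen for illustration):
# floor, countertop, onion dispenser, bowl dispenser, stove, serving counter,
# human start, robot start.
TILE_TYPES = [" ", "X", "O", "D", "P", "S", "H", "R"]


def encode(grid):
    """Turn rows of tile symbols into a grid of one-hot vectors."""
    index = {tile: i for i, tile in enumerate(TILE_TYPES)}
    return [
        [[1 if index[ch] == k else 0 for k in range(len(TILE_TYPES))] for ch in row]
        for row in grid
    ]


def decode(scores):
    """Pick the highest-scoring tile type in every cell (an argmax per cell)."""
    return [
        "".join(
            TILE_TYPES[max(range(len(cell)), key=cell.__getitem__)] for cell in row
        )
        for row in scores
    ]


grid = ["XOX", "H R"]
encoded = encode(grid)
assert decode(encoded) == grid  # encoding followed by decoding is lossless
```

The same per-cell argmax in `decode` is one plausible way to turn a generator's real-valued output back into a tile grid.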

Because we seek to create environments that are realistic and match the human authors’ style, we train a GAN to mimic the human-authored environments. Below we visualize our GAN architecture. The generator on the left-hand side takes as input a latent vector and outputs a one-hot encoded environment. Meanwhile, the discriminator on the right-hand side takes in the environment and outputs a score measuring the realism of a proposed environment.

GAN architecture for generating Overcooked environments.

To generate original environments, we feed latent vectors into our fully-trained GAN. The figure below contains sample environments generated by our GAN model.

Overcooked environments generated directly by a GAN. Each environment contains flaws that prevent any human-robot team from solving the environment. For instance, none of them have a robot (which would be a character with a green hat) and several of the environments contain key items not reachable by either player.

However, as shown above, many environments generated by the GAN are unsolvable. In particular:

  1. There may be too many of a given tile type (e.g. more than one human or robot).
  2. Key objects like the stove may be unreachable.
  3. The agents may be able to step outside of the environment (because there is no countertop at the edge).
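The three failure modes above can be checked mechanically. The sketch below is an illustration only, not the checker used in our work: the tile symbols, the `find_problems` name, and the rule that an agent reaches an object by standing on a neighboring walkable tile are all assumptions, and each agent's start tile is treated as walkable for the other agent.

```python
from collections import deque

# Hypothetical tile symbols (assumptions made for this sketch).
FLOOR, HUMAN, ROBOT = " ", "H", "R"
KEY_OBJECTS = {"O", "D", "P", "S"}  # onions, bowls, stove, serving counter
WALKABLE = {FLOOR, HUMAN, ROBOT}


def find_problems(grid):
    """Return a list of human-readable reasons why the grid is unsolvable."""
    problems = []
    rows, cols = len(grid), len(grid[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]

    # Problem 1: there must be exactly one human and one robot.
    starts = {}
    for agent in (HUMAN, ROBOT):
        spots = [(r, c) for r, c in cells if grid[r][c] == agent]
        if len(spots) != 1:
            problems.append(f"expected one {agent!r}, found {len(spots)}")
        else:
            starts[agent] = spots[0]

    # Problem 3: a walkable tile on the border lets agents leave the grid.
    for r, c in cells:
        on_edge = r in (0, rows - 1) or c in (0, cols - 1)
        if on_edge and grid[r][c] in WALKABLE:
            problems.append(f"walkable tile on the border at {(r, c)}")

    # Problem 2: breadth-first search over walkable tiles from each agent;
    # an object counts as reachable if it neighbors a visited tile.
    for agent, start in starts.items():
        seen, touched, queue = {start}, set(), deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                tile = grid[nr][nc]
                if tile in KEY_OBJECTS:
                    touched.add(tile)
                elif tile in WALKABLE and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        missing = KEY_OBJECTS - touched
        if missing:
            problems.append(f"{agent!r} cannot reach {sorted(missing)}")
    return problems


solvable = ["XXOXX", "XH RX", "D   P", "XXSXX"]
assert find_problems(solvable) == []
```

A check like this only detects problems. Fixing them while staying close to the generated layout is a separate optimization problem.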

To make the environments solvable, we adapt ideas from Zhang et al. [3] and repair each environment via a mixed-integer linear program (MIP). The MIP modifies each environment to satisfy solvability constraints while minimizing the edit distance between GAN-generated environments and repaired environments. In other words, the MIP finds a new environment similar to the GAN-authored one that also satisfies all solvability constraints for the human-robot team. The images below show the environments before (top row) and after (bottom row) the MIP repair.


Top row: Overcooked environments generated by a GAN.
Bottom row: The same environments after the MIP repair. Each environment maintains similarity to its top row counterpart (i.e. the edit distance is small), but the repaired environment is now solvable by a human-robot team. For instance, each environment has both one robot and one human who can both reach the stove, onions, bowls, and counter.
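As a rough illustration of the repair objective, the program below writes the edit distance as a plain Hamming distance. This is a simplified sketch under stated assumptions, not the exact formulation from the paper: the symbols are introduced here for illustration, with $o_{i,t} \in \{0, 1\}$ the one-hot tile type of cell $i$ in the GAN-generated environment and $x_{i,t}$ the decision variable for the repaired environment. The reachability and border constraints are omitted.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{i \in \text{cells}} \Bigl( 1 - \sum_{t \in \text{types}} o_{i,t}\, x_{i,t} \Bigr) \\
\text{s.t.} \quad & \sum_{t \in \text{types}} x_{i,t} = 1 \quad \text{for every cell } i, \\
& \sum_{i \in \text{cells}} x_{i,\text{human}} = 1, \qquad \sum_{i \in \text{cells}} x_{i,\text{robot}} = 1, \\
& x_{i,t} \in \{0, 1\}.
\end{aligned}
```

Because each cell holds exactly one tile type, the term in parentheses is 0 when a cell keeps its original tile and 1 when the repair changes it, so the objective counts the edited cells.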

Eliciting Diverse Behaviors with Latent Space Illumination

The GAN+MIP pipeline creates a generative space of environments guaranteed to be solvable by our human-robot team. We can sample the generative space of environments by passing latent vectors to our GAN+MIP pipeline to generate each new environment. However, recall that our goal is to discover multiple environments that are diverse with respect to the emergent coordination behaviors. For this reason, we explore the latent space [4] of our GAN+MIP pipeline directly with CMA‑ME [5], a state-of-the-art quality diversity [6] algorithm.

CMA‑ME fills an archive of solutions that are diverse with respect to a set of behavioral characteristics (BCs), which are specified coordination behaviors measurable during simulation. For example, in the workload distribution experiment in the next section, we select three BCs related to how the workload is distributed: the differences in the numbers of 1) bowls, 2) ingredients, and 3) orders handled by the robot and human. Running CMA‑ME results in an archive where each cell contains an environment that elicits specific behavior from the human-robot team. The following figure visualizes our complete framework:

The framework for generating environments that elicit diverse behaviors from human-robot teams.
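To illustrate how the three workload BCs index the archive, here is a small sketch. The event-log format, the sign convention (robot count minus human count), and the names `workload_bcs` and `insert` are assumptions made for this example.

```python
from collections import Counter

SUBTASKS = ("bowl", "ingredient", "order")


def workload_bcs(events):
    """Compute the three workload BCs from a simulation log.

    `events` is an iterable of (agent, subtask) pairs, with agent either
    "robot" or "human". Each BC is the robot's count minus the human's count
    for one subtask, so (0, 0, 0) means a perfectly even split.
    """
    counts = Counter(events)
    return tuple(counts[("robot", s)] - counts[("human", s)] for s in SUBTASKS)


def insert(archive, bcs, env, quality):
    """Keep only the best environment seen so far in each BC cell."""
    best = archive.get(bcs)
    if best is None or quality > best[0]:
        archive[bcs] = (quality, env)
        return True
    return False


# Illustrative log in which the human handles every subtask.
log = [("human", "ingredient")] * 6 + [("human", "bowl")] * 2 + [("human", "order")] * 2
archive = {}
insert(archive, workload_bcs(log), env="some environment", quality=1.0)
print(archive)
```

Because the BCs here are integer differences, the BC tuple can serve directly as the archive cell key. Continuous BCs would need to be discretized first.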

In short, CMA‑ME samples latent vectors that we feed to our GAN to generate new candidate environments. Our MIP repairs each environment so it is solvable, and we simulate a human-robot team in each environment to determine the BCs (workload distribution of the three measured subtasks). In simulation, the robot executes a QMDP policy that reasons about the human’s next goal and we model the simulated human with a rule-based model. We assess the workload distribution of the team in each generated environment and update the archive. Finally, CMA‑ME observes how each new environment changes the archive of environments and updates its sampling distribution of latent vectors from this information. CMA‑ME adapts its sampling distribution of latent vectors to fill in the archive with diverse, high-quality environments.
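The loop described above can be sketched end to end with stand-in components. In this illustration every component is a toy placeholder and an assumption: `generate`, `repair`, and `simulate` are hypothetical stubs for the GAN, the MIP, and the human-robot simulation, and the sampler is a simple mutate-an-elite strategy in the style of MAP-Elites rather than CMA‑ME, which additionally adapts a full sampling distribution.

```python
import random


def generate(latent):
    """Stand-in for the trained GAN generator (toy: round the latent)."""
    return tuple(round(z, 2) for z in latent)


def repair(env):
    """Stand-in for the MIP repair step (toy: leave the input unchanged)."""
    return env


def simulate(env):
    """Stand-in for simulating the robot and the simulated human.

    Returns (quality, bcs), where bcs is a tuple of three integers.
    """
    bcs = tuple(max(-2, min(2, int(z))) for z in env[:3])
    quality = -sum(z * z for z in env)
    return quality, bcs


def illuminate(iterations=200, dim=4, sigma=0.5, seed=0):
    rng = random.Random(seed)
    archive = {}  # maps bcs -> (quality, latent, env)
    for _ in range(iterations):
        if archive:
            # Sample near an existing elite (simplified; not CMA-ME).
            _, parent, _ = rng.choice(list(archive.values()))
            latent = [z + rng.gauss(0, sigma) for z in parent]
        else:
            latent = [rng.gauss(0, 1) for _ in range(dim)]
        env = repair(generate(latent))
        quality, bcs = simulate(env)
        # Keep the candidate if its cell is empty or it beats the occupant.
        if bcs not in archive or quality > archive[bcs][0]:
            archive[bcs] = (quality, latent, env)
    return archive


archive = illuminate()
print(f"filled {len(archive)} archive cells")
```

The structure is the point of this sketch: sample a latent vector, generate, repair, simulate, then update the archive. Swapping in the real components changes each function body but not the shape of the loop.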

Experimenting with Workload Distribution

In our paper, we run several experiments demonstrating different use cases for our proposed framework. In this section, we explore the workload distribution experiment, where our framework discovers environments that elicit different distributions of the subtasks defined in the previous section (see footnote 1). Previous work has shown that the robot's perceived contribution to the team is a crucial metric of fluency [7], and human-robot teaming experiments found that the degree to which participants were occupied affected their subjective assessment of the robot as a teammate [8].

Below we visualize several environments that our framework discovered during the workload distribution experiment. CMA‑ME found environments 1-3, which have even workload distributions (the human and robot do roughly the same amount of work). We note that the large open area in each environment provides the robot and human ample space to move between key object locations. Moreover, the robot and human each have their own “workspace” that facilitates working separately in parallel, making the task of delivering soups easier. Now contrast environments 1-3 with environments 4-6, which each have uneven workload distributions. These environments force the human to take all subtasks. Each one consists of a single “cramped” workspace of key items (bowl dispensers, onion dispensers, the stove, and the serving station), and the robot struggles to enter the workspace to help the human. We validate these results from a simulated human by running a user study (described in the next section), where the robot collaborates with a real human instead of our simulated human.

(1)
(2)
(3)
(4)
(5)
(6)
Environments found by our framework in the workload distribution experiments. Environments 1-3 have even workload distributions (i.e. human and robot do roughly the same amount of work) while environments 4-6 have uneven workload distributions (i.e. either the human or the robot does almost all of the work).

Evaluating Environments with Real Users

While our proposed framework finds environments that cause coordination problems between our simulated human and our QMDP robot policy, we still need to verify that the discovered environments cause the same issues for real humans. To evaluate our generated environments, we ran an online user study in which participants solved Overcooked environments with a robot collaborator. We evaluated how well participants coordinated with their robot partner in environments that produced even workloads with the simulated human and in environments that produced uneven workloads. The results of the study confirmed that the generated environments elicit collaboration behavior between a real human and the robot similar to the behavior observed when our simulated human collaborated with the same robot. Below we show sample interactions between a robot (always running the same QMDP policy) and a human user in environments discovered by our proposed framework.

(1)
(2)
(3)
(4)
(5)
(6)
Videos of a real user interacting with the robot (always with the QMDP policy) in the environments discovered by our framework. The interactions show that the human and robot share the work evenly in environments 1-3, while the human does nearly all the work in environments 4-6.

Conclusion

We presented a framework for generating environments that affect coordination behavior between a simulated human and a robot, even though the simulated human and robot do not change their interaction policies between environments. We then confirmed, through an online user study, that real humans behave similarly to their simulated counterparts in each discovered environment. Our framework helps confirm our thesis that environments can have a large impact on coordination behavior in human-robot collaborative settings. We envision our framework as a method to help evaluate human-robot coordination in the future, as well as a reliable tool to help practitioners debug or tune their coordination algorithms. Finally, we hope our work will guide future human-robot coordination research to consider the environment as a significant factor in coordination problems.

This article is based on work that will be presented at RSS 2021. To learn more, read our paper on arXiv. For questions and comments, please visit our Github Discussions page.

Footnotes

  1. See the “Minimizing Performance” experiment for “Workload Distributions with Human-Aware Planning” from our paper.

References

  1. Shared autonomy via hindsight optimization
    Javdani, S., Srinivasa, S., Bagnell, J., 2015. Robotics science and systems: online proceedings.
  2. Overcooked
    Ghost Town Games, 2016.
  3. Video Game Level Repair via Mixed Integer Linear Programming
    Zhang*, H., Fontaine*, M., Hoover, A., Togelius, J., Dilkina, B., Nikolaidis, S., 2020. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment.
  4. Illuminating Mario scenes in the latent space of a generative adversarial network
    Fontaine, M., Liu, R., Togelius, J., Hoover, A., Nikolaidis, S., 2021. Proceedings of the AAAI Conference on Artificial Intelligence.
  5. Covariance matrix adaptation for the rapid illumination of behavior space
    Fontaine, M., Togelius, J., Nikolaidis, S., Hoover, A., 2020. Proceedings of the 2020 genetic and evolutionary computation conference.
  6. Quality-Diversity Optimization: a novel branch of stochastic optimization
    Chatzilygeroudis, K., Cully, A., Vassiliades, V., Mouret, J., 2020. arXiv preprint arXiv:2012.04322.
  7. Evaluating fluency in human–robot collaboration
    Hoffman, G., 2019. IEEE Transactions on Human-Machine Systems.
  8. Computational design of mixed-initiative human–robot teaming that considers human factors: situational awareness, workload, and workflow preferences
    Gombolay, M., Bair, A., Huang, C., Shah, J., 2017. The International journal of robotics research.