Principal Investigators: Edmund H. Durfee and Satinder Singh Baveja
Student Investigator: Rob Cohn
Project sponsor: TARDEC/TACOM
Fatigue and distraction are among the reasons why human operators cannot be expected to supervise unmanned robotic vehicles continuously and closely enough to ensure that the vehicles' actions are safe for nearby people and coordinated with mission objectives. The overall goals of this project are to develop theoretical underpinnings and implemented prototypes for technologies that allow an autonomous vehicle to decide for itself whether and how to seek operator input into real-time control and mission-adjustment decisions. In making that decision, the vehicle weighs mission-critical and safety-critical considerations along with a model of the operator's availability and preferences, so that it performs reliably despite unexpected interactions with civilians and other military personnel.
In its first phase, this project developed a set of algorithms, based on Markov Decision Process (MDP) models, that a semi-autonomous vehicle can use to decide when to act autonomously and when to request help from the operator. We tested these algorithms in an abstract simulation framework in which a vehicle moves through a grid world and may encounter roving groups of pedestrians. We showed that, given a model of the operator's relative preferences (for example, how important reaching the destination promptly is compared with the safety of pedestrians) and of the probabilities of events (for example, the likelihood of colliding with a pedestrian if the vehicle moves nearby), our algorithm can formulate an optimal policy specifying when the vehicle should continue on its path, wait for crowds to clear, seek a new route, or ask the operator for help. An example scenario is shown in Figure 1 (left), and an example optimal policy is shown in Figure 1 (right).
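To make the idea concrete, the following Python sketch solves a tiny MDP by value iteration. It is not the project's simulator: the states (`crowd_ahead`, `clear_ahead`, `detour`, `goal`), the transition probabilities, the 0.2 collision risk, and the hypothetical parameters `safety_weight` and `ask_cost` are all invented for illustration, and treating "ask the operator" as just another action with a cost is an assumption of this sketch. It shows only how one model of event probabilities, combined with different operator preferences, can yield different optimal choices at the same state.

```python
# Toy MDP loosely in the spirit of the first-phase grid world. Every state,
# probability, and weight here is a hypothetical illustration.
GAMMA = 0.95  # discount factor (assumed)

def build_mdp(safety_weight, ask_cost):
    """Return {state: {action: (reward, [(next_state, prob), ...])}}."""
    return {
        "crowd_ahead": {
            # Pass the crowd now: one time step plus expected collision cost (risk 0.2).
            "continue": (-1.0 - safety_weight * 0.2, [("goal", 1.0)]),
            # Wait one step: the crowd clears with probability 0.5.
            "wait": (-1.0, [("clear_ahead", 0.5), ("crowd_ahead", 0.5)]),
            # Take a longer detour route.
            "reroute": (-1.0, [("detour", 1.0)]),
            # Ask the operator, who (by assumption) guides the vehicle past safely.
            "ask": (-1.0 - ask_cost, [("goal", 1.0)]),
        },
        "clear_ahead": {"continue": (-1.0, [("goal", 1.0)])},
        "detour": {"continue": (-2.0, [("goal", 1.0)])},
        "goal": {"stay": (0.0, [("goal", 1.0)])},  # absorbing, no further reward
    }

def optimal_policy(mdp, iters=500):
    """Value iteration; returns (greedy policy, state values)."""
    V = {s: 0.0 for s in mdp}

    def q(s, a):
        reward, nexts = mdp[s][a]
        return reward + GAMMA * sum(p * V[s2] for s2, p in nexts)

    for _ in range(iters):
        V = {s: max(q(s, a) for a in mdp[s]) for s in mdp}
    policy = {s: max(mdp[s], key=lambda a: q(s, a)) for s in mdp}
    return policy, V

# Same dynamics, three hypothetical preference settings.
for safety, cost in [(2.0, 5.0), (20.0, 0.5), (20.0, 5.0)]:
    policy, _ = optimal_policy(build_mdp(safety, cost))
    print(safety, cost, policy["crowd_ahead"])
```

With these made-up numbers, the best action at `crowd_ahead` is `continue` when safety carries little weight, `ask` when safety matters a lot and asking is cheap, and `wait` when asking is expensive.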
By explicitly modeling the potential cost to the operator of intervening in vehicle operations (an intervention might distract the operator from other critical tasks), the vehicle can dynamically adjust its degree of autonomy to the risks and rewards it currently faces. We developed equations that compute the Expected Myopic Gain (EMG) of a possible question using formal measures of information gain. EMG provably asks optimal questions under certain strict conditions, and we have shown empirically that it performs well across a wider range of conditions.
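Written out from the definitions in the next paragraph, the two EMG equations take roughly the following form. The value notation is our reconstruction rather than the project's own: $V^{\pi}_{\mu,o}(s_c)$ denotes the expected value of following policy $\pi$ from $s_c$ when expectations are computed with model $\mu$ updated by answer $o$.

```latex
\begin{aligned}
\mathrm{EMG}(q \mid s_c, \mu) &= \sum_{o} P(o \mid q, \mu)\,\mathrm{Gain}(o \mid q, s_c, \mu) \\
\mathrm{Gain}(o \mid q, s_c, \mu) &= V^{\pi^*_{\mu,o}}_{\mu,o}(s_c) - V^{\pi^*_{\mu}}_{\mu,o}(s_c)
\end{aligned}
```

Because $\pi^*_{\mu,o}$ is optimal for the updated model, each gain is nonnegative, so EMG is zero exactly when, for every possible answer, the policy the vehicle would follow anyway is already as good as the best policy for the updated model.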
In these equations, the vehicle's EMG for asking query $q$ in current state $s_c$, given its current model $\mu$ of the operator's knowledge, is the probability-weighted average of the gains of the possible answers. The gain for receiving answer $o$ to query $q$ is the expected value of pursuing the optimal policy $\pi^*_{\mu,o}$, which takes $o$ into account, minus the expected value of following the policy $\pi^*_{\mu}$ that the vehicle would otherwise have followed. Using EMG, the vehicle can compute the value of each query it is considering, then compare the expected gain of the best query against its model of the cost of asking the operator to decide whether to pose that query. We have devised, implemented, and empirically evaluated preliminary versions of EMG for asking questions about the operator's goals and about the operator's knowledge of world dynamics.
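The following Python sketch computes EMG for a toy version of this setting, covering only questions about the operator's goals. It assumes that the belief $\mu$ ranges over a few hypothetical reward-weight vectors (`HYPS`) on two invented features (time used and collision risk) and that transitions are known. Under that assumption a policy's expected value under a belief equals its value under the belief-weighted mean reward, so $\pi^*_{\mu}$ and $\pi^*_{\mu,o}$ can be found by ordinary value iteration. The functions `emg`, `solve`, and `evaluate`, the encoding of a query as a table of $P(o \mid h)$, and `QUERY_COST` are all hypothetical names for illustration.

```python
# Minimal EMG sketch (hypothetical, not the project's code). Assumes the belief
# mu is over reward-weight vectors for the operator's goals, transitions are
# known, and reward is linear in made-up features, so a policy's expected
# value under mu equals its value under mu's mean weights.
GAMMA = 0.95
STATES = ["crowd", "clear", "goal"]
ACTIONS = ["go", "wait"]
N_FEATURES = 2

# Known dynamics: (state, action) -> [(next_state, probability), ...]
T = {
    ("crowd", "go"): [("goal", 1.0)],
    ("crowd", "wait"): [("clear", 0.5), ("crowd", 0.5)],
    ("clear", "go"): [("goal", 1.0)],
    ("clear", "wait"): [("clear", 1.0)],
    ("goal", "go"): [("goal", 1.0)],
    ("goal", "wait"): [("goal", 1.0)],
}
# Hypothetical features per (state, action): (time step used, collision risk).
PHI = {
    ("crowd", "go"): (1.0, 0.3),
    ("crowd", "wait"): (1.0, 0.0),
    ("clear", "go"): (1.0, 0.0),
    ("clear", "wait"): (1.0, 0.0),
    ("goal", "go"): (0.0, 0.0),
    ("goal", "wait"): (0.0, 0.0),
}

def mean_weights(mu, hyps):
    """Belief-weighted average of the reward weights."""
    return tuple(sum(mu[h] * hyps[h][i] for h in mu) for i in range(N_FEATURES))

def q_value(w, V, s, a):
    r = sum(wi * fi for wi, fi in zip(w, PHI[(s, a)]))
    return r + GAMMA * sum(p * V[s2] for s2, p in T[(s, a)])

def solve(w, iters=500):
    """Value iteration: optimal policy for reward weights w."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: max(q_value(w, V, s, a) for a in ACTIONS) for s in STATES}
    return {s: max(ACTIONS, key=lambda a: q_value(w, V, s, a)) for s in STATES}

def evaluate(policy, w, iters=500):
    """Iterative policy evaluation of a fixed policy under weights w."""
    V = {s: 0.0 for s in STATES}
    for _ in range(iters):
        V = {s: q_value(w, V, s, policy[s]) for s in STATES}
    return V

def emg(query, s_c, mu, hyps):
    """query maps each possible answer o to {h: P(o | h)}."""
    pi_mu = solve(mean_weights(mu, hyps))             # policy if we do not ask
    total = 0.0
    for lik in query.values():
        p_o = sum(mu[h] * lik[h] for h in mu)         # P(o | q, mu)
        if p_o == 0.0:
            continue
        mu_o = {h: mu[h] * lik[h] / p_o for h in mu}  # Bayes update on answer o
        w_o = mean_weights(mu_o, hyps)
        gain = evaluate(solve(w_o), w_o)[s_c] - evaluate(pi_mu, w_o)[s_c]
        total += p_o * gain
    return total

HYPS = {"fast": (-1.0, -2.0), "safe": (-1.0, -20.0)}  # hypothetical operator goals
MU = {"fast": 0.5, "safe": 0.5}
WHICH_PRIORITY = {"fast": {"fast": 1.0, "safe": 0.0},  # noise-free answers
                  "safe": {"fast": 0.0, "safe": 1.0}}
QUERY_COST = 0.5  # hypothetical cost of interrupting the operator

value = emg(WHICH_PRIORITY, "crowd", MU, HYPS)
print("ask" if value > QUERY_COST else "act autonomously")
```

With an even prior, the vehicle would wait at `crowd`; learning that the operator prioritizes speed would switch it to `go`, so EMG here is about 0.60 and the sketch decides to ask. A query whose answer is equally likely under both hypotheses cannot change the belief and has EMG 0. Because EMG is myopic, the sketch scores one question at a time and ignores the value of follow-up questions.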