There are several major approaches currently being researched and implemented to align advanced artificial intelligence with human values and priorities. Ensuring that AI systems do not act in ways that are harmful, deceptive, or contrary to human ethics and preferences is an important open challenge in the field. Researchers are actively exploring technical methods, best practices for model development, and governance strategies to navigate this challenge responsibly as AI capabilities continue to grow.
On the technical side, a major approach is building robustness and transparency into machine learning models. Researchers are working to make models more interpretable so that the reasoning behind their decisions and outputs can be understood and evaluated by people. This includes techniques such as model attribution and influence functions, which help pinpoint how specific parts of the training data or model architecture influence predictions. Greater interpretability helps human overseers assess whether a system is making unfair, biased, or otherwise undesirable inferences and intervene if needed.
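As a rough illustration of the attribution idea, the sketch below computes a simple gradient-times-input attribution for one prediction of a small tabular classifier in PyTorch. The model, feature names, and example values are hypothetical placeholders for illustration, not any particular lab's tooling.

```python
# Minimal sketch of gradient-times-input attribution (hypothetical model and data).
import torch
import torch.nn as nn

# Hypothetical tabular classifier: 4 input features -> 2 classes (untrained, for illustration).
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

def gradient_times_input(model, x, target_class):
    """Attribute one prediction to its input features via gradient * input."""
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]   # logit of the class of interest
    score.backward()                    # populates x.grad with d(score)/d(input)
    return (x.grad * x).abs().squeeze(0)

# One hypothetical candidate record (placeholder values and feature names).
example = torch.tensor([[0.3, -1.2, 0.7, 0.05]])
attributions = gradient_times_input(model, example, target_class=1)
for name, value in zip(["feat_a", "feat_b", "feat_c", "feat_d"], attributions):
    print(f"{name}: {value.item():.4f}")
```

Attribution scores like these are only a starting point; influence functions, which trace a prediction back to individual training examples rather than input features, require additional machinery such as Hessian-vector products.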
Relatedly, researchers are exploring formal specification techniques that allow important values, constraints, or priorities to be explicitly defined and then verified or enforced through the model's architecture and training process. For example, a specification might state that a system trained for hiring must not use attributes such as gender, age, or ethnicity when evaluating candidates. Techniques from control theory and robust decision making are also being adapted to keep AI goals and behaviors well-aligned even as environments or circumstances change in unforeseen ways.
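As a minimal sketch of what such a machine-checkable specification could look like for the hiring example, the snippet below declares a set of protected attributes, verifies that none of them reaches the model's feature list, and checks that a model's score is invariant to perturbing one of them. The attribute names, feature names, and functions are assumptions for illustration, not a real specification framework.

```python
# Minimal sketch of a checkable specification for a hypothetical hiring model.
PROTECTED_ATTRIBUTES = {"gender", "age", "ethnicity"}

def check_feature_spec(feature_columns):
    """Fail loudly if any protected attribute would be fed to the model."""
    violations = PROTECTED_ATTRIBUTES & set(feature_columns)
    if violations:
        raise ValueError(f"Specification violated; protected attributes used: {sorted(violations)}")
    return True

def check_invariance(score_fn, candidate, attribute, values):
    """Verify the model's score does not change when a protected attribute is varied."""
    baseline = score_fn({**candidate, attribute: values[0]})
    return all(score_fn({**candidate, attribute: v}) == baseline for v in values)

# Example: the training pipeline declares its feature list up front and checks it.
features = ["years_experience", "num_publications", "interview_score"]
check_feature_spec(features)                  # passes
# check_feature_spec(features + ["age"])      # would raise ValueError
```

A check like this covers only direct use of protected attributes; in practice, specifications also need to account for proxy features that correlate with them.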
On the process side, many organizations are establishing principles and best practices for AI development to help guide efforts toward beneficial outcomes. For example, researchers at major AI labs such as Google, Microsoft, Amazon, and DeepMind have published reports outlining research priorities and development strategies aimed at creating socially beneficial AI. These include calls for greater transparency into research activities and models, a focus on issues such as fairness, privacy, and the well-being of people affected by AI applications, and the establishment of review boards to provide oversight.
Most major tech companies have also established internal review processes for new AI projects to evaluate potential societal impacts, uses of sensitive data, and how training objectives may inadvertently undermine important values if not carefully specified and monitored. Some also draw on outside review from multi-stakeholder groups to provide additional perspectives on responsible development and deployment. However, continued development of best-practice frameworks that can guide the work of many different organizations and research groups is still needed.
On the governance side, there are ongoing discussions about how oversight and accountability for advanced AI could be structured as capabilities increase. These include debates around detailed regulatory frameworks, new third-party review boards or certification processes for high-risk applications, and industry cooperation to set technical and process standards. While governance approaches are still emerging, most experts argue that some combination of self-regulation by technology companies and reasonable public oversight will likely be needed. Continued multi-stakeholder conversations exploring these complex issues responsibly are also important.
Ensuring that humanity's priorities, values, and preferences remain properly reflected in advanced AI as capabilities grow is a difficult open challenge without easy or definitive answers. Through technical progress, the establishment of best practices, and prudent governance discussions, the field is actively working to develop solutions. Ongoing and increased cooperation between researchers, companies, policymakers, and other stakeholders will be important for addressing this challenge in a way that maximizes the benefits of AI while mitigating its risks. The goal of many working in this area is to guide the development of technologies that are robustly beneficial to humanity.
Current major approaches to aligning AI with human values include developing interpretability and formal specification techniques within machine learning models, establishing principles and review processes for AI development within organizations, and exploring governance structures for appropriate oversight and accountability. While considerable work remains, continued progress and multi-stakeholder collaboration are helping to advance this important challenge. Ensuring that advanced AI capabilities are developed and applied responsibly, and for the benefit of humanity, will likely require ongoing cooperation across technical, process, and policy dimensions.