Exposing biases, moods, personalities, and abstract concepts hidden in large language models
news.mit.edu
A new method can test whether a large language model contains hidden biases, personalities, moods, or other abstract concepts. MIT researchers can zero in on connections within a model that encode for...
