Okay, but if the myth was false back then, and legit user interaction with the voice assistant behaves differently from idle... why would they wait until they have on-device processing to implement it that way? Why would they implement expensive server-side recognition for intended use but sneak in on-device processing JUST for advertising purposes they're already mining you for, well within the EULA's terms? On top of that, you'd definitely see it in battery consumption, if not in data throughput. NPUs/TPUs are hungry bois, so it wouldn't be a particularly smart workaround for quiet detection.
It's not that I think they wouldn't spy on your conversations, it's that I think it'd be bad business to do it that way.
This is always shocking to me. I mean, the researchers in this example are out there going "no, seriously, these third-party apps are taking screenshots of your phone whenever you give screenshot permissions, and sometimes sending video of what you do, and they track you to the smallest detail, and it's messed up" and everybody brushes that off and goes BUT SIRI IS LISTENING THO!!! You just can't convince them to care about the real bad thing, or to stop caring about the probably false, less bad thing.
It's very confusing to me.