Facebook’s ‘Red Team’ Hacks Its Own AI Programs

Instagram encourages its billion or so users to add filters to their photos to make them more shareable. In February 2019, some Instagram users began editing their photos with a different audience in mind: Facebook’s automated porn filters.

Facebook depends heavily on moderation powered by artificial intelligence, and it says the tech is particularly good at spotting explicit content. But some users found they could sneak past Instagram’s filters by overlaying patterns such as grids or dots on rule-breaking displays of skin. That meant more work for Facebook’s human content reviewers.

Facebook’s AI engineers responded by training their system to recognize banned images with such patterns, but the fix was short-lived. Users “started adapting by going with different patterns,” says Manohar Paluri, who leads work on computer vision at Facebook. His team eventually tamed the problem of AI-evading nudity by adding another machine-learning system that checks for patterns such as grids on photos and tries to edit them out by emulating nearby pixels. The process doesn’t perfectly recreate the original, but it allows the porn classifier to do its work without getting tripped up.
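To make the "emulating nearby pixels" idea concrete, here is a minimal sketch in Python with NumPy. It is not Facebook's system, which the article describes as a machine-learning model that both finds and removes the patterns. This toy version assumes the overlay mask is already known and simply fills each masked pixel with the average of its known neighbors. The function name `fill_from_neighbors` and the synthetic test image are hypothetical, chosen only for illustration.

```python
import numpy as np

def fill_from_neighbors(image, mask, max_iters=100):
    """Replace masked pixels with the mean of their known 4-neighbors.

    image: 2-D array of pixel intensities.
    mask:  2-D boolean array, True where an overlay pattern covers the pixel.
    Pixels are filled from the edges of each masked region inward, one
    layer per iteration.
    """
    img = image.astype(float).copy()
    known = ~mask
    for _ in range(max_iters):
        if known.all():
            break
        # Zero-pad so border pixels have four neighbors; padding counts as unknown.
        vals = np.pad(np.where(known, img, 0.0), 1)
        cnt = np.pad(known.astype(float), 1)
        # Sum of known neighbor values and number of known neighbors (up, down, left, right).
        nsum = vals[:-2, 1:-1] + vals[2:, 1:-1] + vals[1:-1, :-2] + vals[1:-1, 2:]
        ncnt = cnt[:-2, 1:-1] + cnt[2:, 1:-1] + cnt[1:-1, :-2] + cnt[1:-1, 2:]
        fillable = ~known & (ncnt > 0)
        if not fillable.any():
            break
        img[fillable] = nsum[fillable] / ncnt[fillable]
        known = known | fillable
    return img

# Synthetic example: a smooth horizontal gradient with a bright grid drawn over it.
rows, cols = np.indices((8, 8))
original = (cols * 10).astype(float)
mask = (rows % 4 == 0) | (cols % 4 == 0)   # assumed-known grid overlay
corrupted = original.copy()
corrupted[mask] = 255.0

restored = fill_from_neighbors(corrupted, mask)
error_before = np.abs(corrupted - original).max()
error_after = np.abs(restored - original).max()
print(error_before, error_after)
```

As in the article's description, the result does not perfectly recreate the original image. The point is only that the restored image sits much closer to the original than the overlaid one, which is what lets a downstream classifier work without being thrown off by the pattern.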

A few months later, that cat-and-mouse incident helped prompt Facebook to create an “AI red team” to better understand the vulnerabilities and blind spots of its AI systems. Other large companies and organizations, including Microsoft and government contractors, are assembling similar teams.

Those companies spent heavily in recent years to deploy AI systems for tasks such as understanding the content of images or text. Now some early adopters are asking how those systems can be fooled and how to protect them. “We went from ‘Huh? Is this stuff useful?’ to now it’s production-critical,” says Mike Schroepfer, Facebook’s chief technology officer. “If our automated system fails, or can be subverted at large scale, that’s a big problem.”

Article Credit: Wired