Researchers have found a way to use the web’s basic functions to identify who visits a site – without the user detecting the hack.
Everyone, from advertisers to government-backed hackers and spyware makers, wants to identify and track users across the web. And while a staggering amount of infrastructure is already in place to do exactly that, the appetite for data and new tools to collect it has proved insatiable. With that reality in mind, researchers from the New Jersey Institute of Technology are warning this week about a novel technique that attackers could use to de-anonymize website visitors and potentially connect the dots on many components of targets’ digital lives.
The findings, which NJIT researchers will present at the Usenix Security Symposium in Boston next month, show how an attacker who tricks someone into loading a malicious website can determine whether that visitor controls a particular public identifier, like an email address or social media account, thus linking the visitor to a piece of potentially personal data.
When you visit a website, the page can capture your IP address, but this doesn’t necessarily give the site owner enough information to individually identify you. Instead, the hack analyzes subtle features of a potential target’s browser activity to determine whether they are logged into an account for an array of services, from YouTube and Dropbox to Twitter, Facebook, TikTok, and more. The attack also works against every major browser, including the anonymity-focused Tor Browser.
“If you’re an average internet user, you may not think too much about your privacy when you visit a random website,” says Reza Curtmola, one of the study authors and a computer science professor at NJIT. But certain categories of internet users may be more significantly impacted by this, like people who organize and participate in political protests, journalists, and people who network with fellow members of their minority group. And what makes these types of attacks dangerous is that they’re very stealthy. “You just visit the website and you have no idea that you’ve been exposed.”
The risk that government-backed hackers and cyber-arms dealers will attempt to de-anonymize web users isn’t just theoretical. Researchers have documented several techniques used in the wild and have witnessed situations in which attackers identified individual users, though it wasn’t clear how.
Other theoretical work has looked at an attack similar to the one NJIT researchers developed, but much of this past investigation has focused on grabbing revealing data that’s leaked between websites when one service requests another. As a result of this prior work, browsers and website developers have improved how data is isolated and restricted when content loads, making these potential attack paths less feasible. However, knowing that attackers are motivated to seek out techniques for identifying users, the researchers wanted to explore additional approaches.
“Let’s say you have a forum for underground extremists or activists, and a law enforcement agency has covertly taken control of it,” Curtmola says. “They want to identify the users of this forum but can’t do this directly because the users use pseudonyms. But let’s say that the agency was also able to gather a list of Facebook accounts that are suspected to be users of this forum. They would now be able to correlate whoever visits the forum with a specific Facebook identity.”
How this de-anonymization attack works is difficult to explain but relatively easy to grasp once you have the gist. Someone carrying out the attack needs a few things to get started: a website they control, a list of accounts tied to people they want to identify as having visited that site, and content posted to the platforms that host those accounts, configured either to allow the targeted accounts to view it or to block them from viewing it. The attack works both ways.
Next, the attacker embeds the aforementioned content on the malicious website. Then they wait to see who clicks. If anyone on the targeted list visits the site, the attackers will know who they are by analyzing which users can (or cannot) view the embedded content.
The attack takes advantage of several factors most people likely take for granted: Many major services—from YouTube to Dropbox—allow users to host media and embed it on a third-party website. Regular users typically have an account with these ubiquitous services and, crucially, they often stay logged into these platforms on their phones or computers. Finally, these services allow users to restrict access to content uploaded to them. For example, you can set your Dropbox account to privately share a video with one or a handful of other users. Or you can upload a video to Facebook publicly but block certain accounts from viewing it.
These “block” or “allow” relationships are the crux of how the researchers found that they can reveal identities. In the “allow” version of the attack, for instance, hackers might quietly share a photo on Google Drive with a Gmail address of potential interest. Then they embed the photo on their malicious website and lure the target there. When visitors’ browsers attempt to load the photo via Google Drive, the attackers can accurately infer whether a visitor is allowed to access the content, and therefore whether they control the email address in question.
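The inference behind the “allow” and “block” variants can be sketched as a toy model in Python. This is only an illustration of the logic described above, not the researchers’ code: the `SharedResource` class, the `infer_identity` function, and the example addresses are all hypothetical, and the sketch simply takes the load outcome as a given input rather than showing how an attacker would learn it. It also assumes, as a simplification, that a logged-out visitor can see public content but not privately shared content.

```python
from dataclasses import dataclass, field


@dataclass
class SharedResource:
    """Toy model of one piece of hosted content and its access rule (hypothetical)."""
    mode: str  # "allow": only listed accounts may view; "block": everyone except listed accounts
    accounts: set = field(default_factory=set)

    def can_view(self, logged_in_as):
        """Return True if a visitor logged in as this account (or None) could load the content."""
        if logged_in_as is None:
            # Simplifying assumption: logged-out visitors see public content only.
            return self.mode == "block"
        listed = logged_in_as in self.accounts
        return listed if self.mode == "allow" else not listed


def infer_identity(resource, target, loaded):
    """Map a load outcome to a conclusion about the visitor.

    Returns the target account if the outcome implies the visitor controls it,
    otherwise None (the visitor is someone else or is logged out).
    """
    if resource.mode == "allow":
        # Only the target was granted access, so a successful load points to the target.
        return target if loaded else None
    # Only the target was blocked, so a failed load points to the target.
    return None if loaded else target


target = "target@example.com"
allow_photo = SharedResource("allow", {target})
block_video = SharedResource("block", {target})

for visitor in (target, "someone-else@example.com", None):
    via_allow = infer_identity(allow_photo, target, allow_photo.can_view(visitor))
    via_block = infer_identity(block_video, target, block_video.can_view(visitor))
    print(visitor, "->", via_allow, via_block)
```

In this model, both variants single out the target account and nobody else, which is why one shared or blocked item per account of interest is enough to test a visitor against a list.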
Thanks to the major platforms’ existing privacy protections, the attacker can’t check directly whether the site visitor was able to load the content. But the NJIT researchers realized they could analyze accessible information about the target’s browser and the behavior of their processor as the request was happening to make an inference about whether the content request was allowed or denied.
The technique is known as a “side channel attack” because the researchers found that they could accurately and reliably make this determination by training machine learning algorithms to parse seemingly unrelated data about how the victim’s browser and device process the request. Once the attacker knows that the one user they allowed to view the content has done so (or that the one user they blocked has been blocked), they have de-anonymized the site visitor.
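To give a feel for the classification step, here is a minimal sketch in Python that uses entirely synthetic data. The article does not say which measurements or which machine learning model the researchers used, so everything here is an assumption: a single made-up numeric feature, invented distributions in which “allowed” requests produce larger values than “denied” ones, and a plain midpoint threshold standing in for a trained model. The function names are hypothetical.

```python
import random
import statistics


def simulate_measurements(allowed, n, rng):
    """Draw n synthetic side-channel readings (made-up numbers, arbitrary units)."""
    mean = 12.0 if allowed else 8.0  # assumption: allowed loads look different from denied ones
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def fit_threshold(allowed_samples, denied_samples):
    """'Train' by placing a cutoff halfway between the two class means."""
    return (statistics.mean(allowed_samples) + statistics.mean(denied_samples)) / 2


def classify(measurement, threshold):
    """Predict True ('allowed') when the reading is above the learned cutoff."""
    return measurement > threshold


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Training phase: the attacker records readings for outcomes they already know.
threshold = fit_threshold(
    simulate_measurements(True, 200, rng),
    simulate_measurements(False, 200, rng),
)

# Evaluation on fresh synthetic readings.
test_allowed = simulate_measurements(True, 200, rng)
test_denied = simulate_measurements(False, 200, rng)
correct = sum(classify(m, threshold) for m in test_allowed)
correct += sum(not classify(m, threshold) for m in test_denied)
accuracy = correct / (len(test_allowed) + len(test_denied))
print(f"threshold={threshold:.2f} accuracy={accuracy:.2%}")
```

The point of the sketch is only that a measurable difference between the two cases, however indirect, is enough for a classifier to recover the hidden yes-or-no answer. Real measurements would be noisier and would likely need more features than a single number.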
As complicated as it may sound, the researchers warn that once attackers have done their homework, the attack would be simple to carry out. It would take only a couple of seconds to potentially unmask each visitor to the malicious site, and it would be virtually impossible for an unsuspecting user to detect the hack. The researchers developed a browser extension that can thwart such attacks, and it is available for Chrome and Firefox. But they note that it may impact performance and isn’t available for all browsers.
Through a major disclosure process to numerous web services, browsers, and web standards bodies, the researchers say they have started a larger discussion about how to comprehensively address the issue. At the moment, Chrome and Firefox have not publicly released responses. And Curtmola says fundamental and likely infeasible changes to the way processors are designed would be needed to address the issue at the chip level. Still, he says that collaborative discussions through the World Wide Web Consortium or other forums could ultimately produce a broad solution.
“Vendors are trying to see if it’s worth the effort to resolve this,” he says. “They need to be convinced that it’s a serious enough issue to invest in fixing it.”