Marketing and Data Science in the Post-Tracking Age

We look at the alternative marketing frameworks rooted in data science that are poised to counteract Apple’s cross-domain tracking restrictions.

In the wake of the Cambridge Analytica scandal, the volume of organized public complaint over third-party tracking has outgrown governmental complacence on the topic, with new measures set to all but eviscerate user tracking. Let's take a look at the alternative frameworks currently in development, how FAANG has been drawn into both sides of the debate regarding big data privacy, and what available user data may be left on the table in a post-tracking era.

User-Profiling Across Domains

Cross-domain tracking is the practice of collating information about a specific user from a variety of different websites, apps and interfaces, in order to build up a profile of the user that's more complete than any one site or app would be likely to yield.

Historically, this has occurred through the use of 'third-party' cookies — long-term hidden browser preferences that follow the user across a number of websites and report back interaction data from each domain to a central processing framework that also receives information from other sources.

The method is arguably too effective: headlines over the last ten years have decried the intrusiveness of ads that exploited gender issues [1], disclosed pregnancies [2], advertised funeral services to the just-bereaved [3], apparently eavesdropped on users' conversations in order to serve them apposite ads [4] (against denials from the software creators [5]), and even tracked their offline activity [6].

The Fall of Cross-Domain Tracking

In June of 2020, Apple announced that from 'early spring' of 2021, iOS-installed apps would need to explicitly seek permission to track user activity across other apps and websites via its Identifier for Advertisers (IDFA) functionality [7].

Until now, apps have enjoyed access to a far wider range of user activity data than is indicated in the new default iOS app-tracking permissions dialogue, and the change has led to an aggressive and sustained campaign of protest and resistance from Facebook [8], among other industry leaders in online advertising.

In March 2020, the WebKit browser engine that powers Apple's Safari browser announced that it would finalize its long campaign against third-party cookies by blocking them outright [9], stealing a march on Google's Chrome browser, which will not block third-party cookies until 'some time' in 2022 [10].

Meanwhile, in February 2021 Firefox implemented 'Total Cookie Protection' [11], preventing any level of cookies (not just third-party cookies) from being tracked across domains, following up on earlier changes in its network architecture designed to completely prevent cross-domain tracking by other methods [12].

With its advertising business model threatened, Google has proposed an alternative system called FLoC (Federated Learning of Cohorts, see below) [13], which aggregates users into demographic categories without targeting them directly.

TikTok, among many other data-hungry social media platforms, is also looking into subverting the iOS app-tracking blockade, though details are scarce [14].

Besides urging demographically valuable iOS users to opt in to tracking [15], and attempting to frame these innovations as an attack on small businesses [16], it is not yet clear whether Facebook will likewise respond with alternative tracking approaches.

SKAdNetwork: Apple's Cross-Platform Ad Click Tracking

As Apple tightened its limits on third-party cookies through Intelligent Tracking Prevention, WebKit, the upstream open-source browser engine for Apple's Safari, began in 2019 to implement an alternative measurement method, Ad Click Attribution [17], the web counterpart to the SKAdNetwork framework that Apple provides for apps.

Ad Click Attribution is effectively a 'firewall' for user data, wherein a limited amount of data reporting on the user is allowed, and semi-anonymized cross-tracking permitted so long as the nodes in the event path are sites that the user actually visits.

SKAdNetwork Architecture

A rudimentary conversion funnel can be delivered to advertisers, without specifying the user:

Ad Click Attribution

The above image visualizes the three processes of WebKit's Ad Click Attribution.  

First (top), the user's ad click sends general information to the store about the origin of the click, together with a campaign ID restricted to 64 possible values, so that complex ID-based tracking is theoretically impossible.

Secondly (middle), the browser uses a common campaign ID linked to the domain mapping to identify one of four 'conversion events':

  • The addition of an item to a shopping cart.
  • Subscription to a service.
  • Entering of shipping and payment information.
  • Purchase of an item. 

Finally (bottom of image), as the user converts, the browser schedules a POST request, timed to trigger anywhere between 24 and 48 hours later, that notifies the ad-bearing site that a conversion has occurred. This programmed delay prevents the originating site/s from identifying users that click on ads and then convert rapidly.
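The three steps above can be sketched in a few lines of Python. This is only an illustration of the described flow, not WebKit's implementation: the class and function names are hypothetical, and the campaign ID range of 0 to 63 is an assumption based on the 64-value limit mentioned above.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: campaign IDs are limited to 64 values (0-63).
MAX_CAMPAIGN_IDS = 64


@dataclass
class AdClick:
    source_site: str       # site where the ad was clicked
    destination_site: str  # site where a conversion may later happen
    campaign_id: int


def record_click(source_site: str, destination_site: str, campaign_id: int) -> AdClick:
    """Store only coarse click data; reject IDs large enough to encode a user."""
    if not 0 <= campaign_id < MAX_CAMPAIGN_IDS:
        raise ValueError("campaign_id must be in the range 0-63")
    return AdClick(source_site, destination_site, campaign_id)


def schedule_report(converted_at: datetime, rng: random.Random) -> datetime:
    """Pick a send time between 24 and 48 hours after the conversion."""
    delay_seconds = rng.uniform(24 * 3600, 48 * 3600)
    return converted_at + timedelta(seconds=delay_seconds)


click = record_click("news.example", "shop.example", campaign_id=12)
converted_at = datetime(2021, 4, 1, 12, 0)
send_at = schedule_report(converted_at, random.Random(0))
print(click.campaign_id, send_at - converted_at)
```

Because the delay is drawn at random from a 24-hour window, the report's arrival time says little about when the conversion actually took place.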

Limit on User Data for SKAdNetwork

The domains involved are subject to a 'privacy budget' (though this term was coined by Google's later adaptation, FLoC, as we'll see), which limits the number of data points about the user that can be passed in requests. Thus the advertiser may not have enough 'budget' left to request, for instance, both the user's geolocation and the time of day that the purchase was made.
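As a rough illustration of how such a budget constrains an advertiser, the sketch below models it as a simple counter of data points. The class name, the one-point allowance and the data point names are all hypothetical; neither Apple nor Google documents the budget in this form.

```python
class PrivacyBudget:
    """Tracks how many data points a domain may still request about a user."""

    def __init__(self, max_points: int) -> None:
        self.remaining = max_points

    def request(self, data_point: str) -> bool:
        """Grant the data point if budget remains; otherwise refuse it."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


budget = PrivacyBudget(max_points=1)  # hypothetical allowance
granted = {name: budget.request(name) for name in ("geolocation", "time_of_day")}
print(granted)  # {'geolocation': True, 'time_of_day': False}
```

With an allowance of one, the first request succeeds and the second is refused, which is the kind of trade-off between geolocation and time of day described above.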

There are many more caveats to this process than we have space to list here, but they include:

  • The information is sent in Private Browsing Mode, avoiding persistent storage, even where the user is not browsing in Private Mode.
  • No cookies, client certificates, or potentially semi-permanent data exchange are supported.
  • The initial ad click data is only stored for a week, so that no long-term monitoring or 're-identification' through campaign entropy is theoretically possible. 
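The one-week retention rule in the last item can be sketched as a simple purge over stored clicks. The dictionary layout and function name here are hypothetical, chosen only to show the idea.

```python
from datetime import datetime, timedelta
from typing import Dict

# Ad click data is kept for one week, per the list above.
CLICK_RETENTION = timedelta(days=7)


def purge_expired(clicks: Dict[str, datetime], now: datetime) -> Dict[str, datetime]:
    """Keep only the clicks recorded within the retention window."""
    return {key: ts for key, ts in clicks.items() if now - ts <= CLICK_RETENTION}


stored = {
    "news.example->shop.example": datetime(2021, 4, 1),   # 11 days old
    "blog.example->shop.example": datetime(2021, 4, 10),  # 2 days old
}
fresh = purge_expired(stored, now=datetime(2021, 4, 12))
print(list(fresh))  # ['blog.example->shop.example']
```

Any conversion that happens after the stored click has been purged simply cannot be attributed to it.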

Neither the website where the ad click event occurs nor the site where the conversion occurs has any knowledge of the interstitial storage or processes. Instead, the originating ad site receives conversion information as a kind of 'blind event', allowing some basic monitoring of the efficacy of ad campaigns.

Emerging Platforms for SKAdNetwork

Though Apple provides documentation for setting up Ad Click Attribution, it does not provide any platform equivalent to DoubleClick or existing simplified frameworks. It has been argued in marketing forums that small publishers are unlikely to have the development acumen to set up campaign workflows under this new regime [18].

However, this does leave a potential gap in the market for service providers who wish to develop new platforms for omnichannel marketing to streamline the process affordably and at scale. A slowly growing number of providers offer dedicated support for the WebKit/Safari SKAdNetwork system, including Singular [19] and Kochava [20].

With Safari occupying approximately 20% of the general browser market [21], and not quite 25% of mobile browser share [22], heavy investment in Ad Click Attribution would currently need to be justified with campaigns that can effectively target the superior spending power of Apple users [23].

However, if SKAdNetwork ultimately receives better public support than FLoC over the next two years, this may strengthen its long-term viability as a platform for data science consulting.

Is SKAdNetwork Fit for Purpose?

Both Ad Click Attribution and App Tracking Transparency (ATT, which will require explicit user consent to enable or re-enable app-based tracking) are currently set to go live 'sometime in spring' of 2021, though the launch of iOS 14.5, which is able to fully activate these systems, has been confirmed for April 2021 [24].

SKAdNetwork can be implemented in Facebook, and the social giant provides some information for businesses about integrating it into Facebook events management, with the caveat that a maximum of 63 events can be programmed per app [25].

Uptake of SKAdNetwork

A report from mobile ad platform Moloco in March 2021 declared that the percentage of bid requests over SKAdNetwork rose from 14.5% to 20% in the last two weeks of February 2021 [26] (though this is likely due to mounting pressure over the vague 'spring' deadline). Anurag Agrawal, Moloco's VP of Product, also observed that in some countries bid requests tripled in a matter of days in that period, and that usage will likely rise as the launch nears.

SKAdNetwork's privacy budget is forcing developers to choose between salient data points for conversion funnels. This is a trickier decision to make than it appears, since not every app that supports the system is obliged to make all the possible SKAdNetwork signals available.

It seems likely that the entire summer and fall of 2021 will prove an arduous testing ground for the framework, and that clarification on practical usage will be added to the documentation eventually, based on user feedback.

Google's Privacy Sandbox and FLoC

With a 64% share of the browser market [27] and the largest single share (31%) of the online advertising sector [28], Google's response to the anti-tracking lobby will likely define the near-future of digital marketing.

To date, the measures that Google has taken, or plans to take, include:

  • The removal of third-party cookies (tracking cookies) in Chrome by 2022 [29], at least a year behind competing browsers, which will break tracking, targeting and profiling systems that are currently dependent on third-party cookies.
  • The wide deployment of a 'Privacy Sandbox' in Chrome — an architecture released as 'Google Signals' in 2018, and which effectively offers even more cross-tracking functionality than third-party cookies, given the right circumstances, and transfers greater control of ad data infrastructure to Google's own platform.
  • The implementation of FLoC within the Privacy Sandbox architecture over the course of 2021, with trials currently taking place in countries not covered by the GDPR [30].

How FLoC Works

FLoC operates as an embedded client-side technology inside Chrome 89 and later. It has a more complicated architecture than SKAdNetwork, though it adheres to many of the same concepts [31].

Chrome users are assigned 'cohort IDs' based on domain visits (and possibly other factors, see below), and are then included in a group (cohort) with other users with similar interests.

FLoC Cohort Architecture

Interactions from a user's cohort ID will later be correlated to ad campaigns, and eventually reported to the ad-originating site. In theory, advertisers will be able to 'target' the interests of the cohort group, but not the specific profile of anyone within that group.

If the cohort group is too small (i.e., the defining topics that generated it are marginal), the users inside it risk exposure. Therefore, additional algorithms are run on the cohort to ensure a minimum size:

Minimum-size Cohorts in FLoC

Since FLoC groups need to be a minimum size in order to semi-anonymize users, smaller groups will be combined with others in order to preserve an adequate 'crowd' to obscure any one user ID; but there is no guarantee that FLoC's final implementation will limit itself in this way.
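To make the idea of a minimum cohort size concrete, here is a minimal sketch that pools undersized groups until each one reaches a threshold. It is a simple greedy merge under our own assumptions, not the algorithm that Chrome uses; the function name, cohort labels and threshold are hypothetical.

```python
from typing import Dict, List


def enforce_min_size(cohorts: Dict[str, List[str]], min_size: int) -> List[List[str]]:
    """Merge undersized cohorts until every group has at least min_size users."""
    merged: List[List[str]] = []
    pending: List[str] = []
    # Visit the smallest cohorts first so that they are pooled together.
    for _, users in sorted(cohorts.items(), key=lambda item: len(item[1])):
        if len(users) >= min_size:
            merged.append(list(users))
        else:
            pending.extend(users)
            if len(pending) >= min_size:
                merged.append(pending)
                pending = []
    # Fold any leftover users into the smallest finished group, if one exists.
    if pending:
        if merged:
            min(merged, key=len).extend(pending)
        else:
            merged.append(pending)  # too few users overall to reach min_size
    return merged


cohorts = {
    "a": ["u1", "u2", "u3"],
    "b": ["u4"],
    "c": ["u5"],
    "d": ["u6", "u7", "u8", "u9"],
}
groups = enforce_min_size(cohorts, min_size=3)
print([len(group) for group in groups])  # [5, 4]
```

The two single-user cohorts are too small even when combined, so they are absorbed into the smallest group that already meets the threshold, at the cost of making that group's shared interests less precise.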

Contextual feature generation occurs for each cohort ID via clustering, and affinity clustering eventually siphons matched cohorts into appropriate groups:

SimHash Clusters

Re-enabling Cross-Tracking Under FLoC

FLoC uses the SimHash algorithm [32] to create cohorts, with an individual user's cohort ID initially generated based on their visits to FLoC-enabled domains.
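For readers unfamiliar with SimHash, the sketch below shows the textbook form of the algorithm applied to a list of visited domains. It is not Chrome's implementation: the SHA-256 feature hash, the 8-bit default and the example domains are assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def simhash(domains: Iterable[str], bits: int = 8) -> int:
    """Textbook SimHash: each domain votes +1 or -1 on every output bit."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    totals = [0] * bits
    for domain in domains:
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")  # 64 bits of the hash
        for i in range(bits):
            totals[i] += 1 if (value >> i) & 1 else -1
    # An output bit is set when the votes for it are net positive.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two IDs differ."""
    return bin(a ^ b).count("1")


history_a = ["news.example", "sports.example", "recipes.example", "travel.example"]
history_b = ["news.example", "sports.example", "recipes.example", "cars.example"]
id_a, id_b = simhash(history_a), simhash(history_b)
print(id_a, id_b, hamming(id_a, id_b))
```

The property that matters for cohorts is that browsing histories with many domains in common tend to produce IDs that differ in few bits, so similar users are likely to land in the same or neighboring groups.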

However, the Electronic Frontier Foundation (EFF) has noted that there is nothing to stop Google widening these criteria to page content and other facets, which might add granular detail to a user's profile and group placement, presenting a wider 'attack surface' for companies seeking to circumvent anti-tracking [33].

There is no documented finalized bit-length yet for a cohort ID, and if 16-bit IDs are used instead of 8-bit ones, this extra length will tend to make a user more identifiable as time passes. Since IDs are calculated weekly, this potentially gives advertising frameworks a week at a time to exploit a rapidly growing level of detail about a user.
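The effect of ID length is easy to quantify. The sketch below compares 8-bit and 16-bit IDs for a hypothetical population of one million users, assuming (unrealistically) that users are spread evenly across cohorts.

```python
from typing import Tuple


def cohort_stats(bits: int, population: int) -> Tuple[int, float]:
    """Return (number of possible cohorts, average users per cohort)."""
    cohorts = 2 ** bits
    return cohorts, population / cohorts


POPULATION = 1_000_000  # hypothetical user base

for bits in (8, 16):
    cohorts, average = cohort_stats(bits, POPULATION)
    print(f"{bits}-bit IDs: {cohorts} cohorts, about {average:.0f} users each")
```

Under these assumptions an 8-bit ID hides each user among roughly 3,900 others, whereas a 16-bit ID leaves only about 15 users per cohort, which is a far thinner crowd.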

Piecing Together Consistent User Profiles from Transient Cohort IDs

In April 2021, web security engineer John Wilander, one of the engineers behind Apple's Intelligent Tracking Prevention, commented on FLoC's GitHub issues thread that since multiple sites can monitor the cohort ID over time, a hash of observed cohorts will become increasingly unique, effectively re-enabling cross-tracking [34].
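A small simulation illustrates the concern. It assumes, purely for illustration, that each user receives an independent, uniformly random cohort ID every week; real IDs would be correlated from week to week, so the numbers here are not a prediction. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def unique_fraction(users: int, cohorts: int, weeks: int, seed: int = 0) -> float:
    """Share of users whose sequence of weekly cohort IDs is shared with nobody."""
    rng = random.Random(seed)
    histories = [
        tuple(rng.randrange(cohorts) for _ in range(weeks)) for _ in range(users)
    ]
    counts = Counter(histories)
    return sum(1 for history in histories if counts[history] == 1) / users


one_week = unique_fraction(users=1000, cohorts=256, weeks=1)
three_weeks = unique_fraction(users=1000, cohorts=256, weeks=3)
print(f"unique after 1 week: {one_week:.1%}, after 3 weeks: {three_weeks:.1%}")
```

With 1,000 users and 256 cohorts, a single week's ID singles out very few people, but a site that records three consecutive IDs can distinguish almost everyone in this toy population.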

The EFF has characterised FLoC tracking as a 'behavioral credit score', and contends that the semi-anonymous nature of the scheme does not prevent third-party companies from adding a user's available FLoC data to a more complete, long-term internal profile [35]. In this way it could become possible to quickly and systematically 're-acquire' a specific user via the limited and transient information that FLoC offers.

Potential Availability of FLoC Data

Before anticipating a new era of FLoC-based analytics, it's important to consider how available the technology is actually going to be, and the extent to which Google is likely to make any concessions against complaints that FLoC is invasive and little different from the regime it replaces.

A number of downstream Chromium-based browsers have already committed to actively blocking FLoC functionality, offering Chrome addicts an alternative to Google's brand of the open-source project. The Vivaldi browser has announced that it will modify the Chromium engine to remove FLoC [36], while Brave has also promised to block the technology in its releases [37]. It has also been reported that Microsoft has little interest in implementing FLoC technology in its Chromium-based Edge browser [38].

FLoC requires client-side code and cannot be implemented in Firefox or in any other browser that does not use the Chromium engine, including Safari — which has, perhaps, the least motivation to support FLoC.

Will Popular Sites Support FLoC?

Whether FLoC prospers or falls may depend as much on industry adoption as consumer acceptance. If the tech community fails to convince the average Chrome user of the evils of FLoC, and ad-supported media sees FLoC as the only route out of an existential crisis, the current furor may not achieve its goals.

According to an HTML code search in NerdyData, as of mid-April 2021, only six .com websites in the US are disabling FLoC, and two of those are DuckDuckGo and Brave, each with a stake in the issue [39]. In the same period, the PublicWWW source code search engine identifies just 20 web pages with any domain suffix (out of 522 million indexed pages) that actively block FLoC.
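For context, the opt-out that sites can use during the trial is an HTTP response header, `Permissions-Policy: interest-cohort=()`. The sketch below shows one way a site might add it to its outgoing headers without disturbing other policies; the helper function and the plain-dictionary representation of headers are our own assumptions, not part of any particular web framework.

```python
from typing import Dict

HEADER = "Permissions-Policy"
FLOC_OPT_OUT = "interest-cohort=()"


def with_floc_opt_out(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the response headers that opts the page out of FLoC."""
    updated = dict(headers)
    existing = updated.get(HEADER, "").strip()
    if not existing:
        updated[HEADER] = FLOC_OPT_OUT
    elif "interest-cohort" not in existing:
        # Permissions-Policy directives are comma-separated.
        updated[HEADER] = f"{existing}, {FLOC_OPT_OUT}"
    return updated


print(with_floc_opt_out({"Content-Type": "text/html"}))
print(with_floc_opt_out({HEADER: "geolocation=()"}))
```

A page served with this header asks the browser not to include visits to it when calculating the user's cohort.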

The media is likely awaiting some kind of public consensus around FLoC as the trial progresses into the summer, and in the meantime can enjoy a neutral stance.

At this time, the architecture of FLoC is subject to change as trial results emerge, and headlines continue to harangue the technology. Therefore, FLoC lacks the same platform support that is emerging for SKAdNetwork, not least because all of Apple's privacy initiatives are being finalized and implemented in the spring and summer of 2021, whereas Chrome will not be disabling third-party cookies until 2022.

No Road Back to The 'Golden Age' of Cross-Tracking

Though rumors of various solutions to Apple's pro-privacy initiatives have circulated over the last year, there are no obvious 'drop-in' technologies to replace the full capabilities of cross-domain tracking. In the meantime, the future of FLoC, currently a volatile and under-documented technology, is in the balance.

Even if cross-tracking could be re-established, scandals of recent years have set the public will hard against it: the re-appearance of the 'psychic' ad campaigns that characterised the Cambridge Analytica years would simply reveal a new 'zero day' cross-tracking architecture, which would then come under popular attack in almost the same way as a computer virus.

If the consumer climate is so hostile to targeted ads that an advertiser cannot leverage detailed user data to create them without revealing their hand, there is less benefit in having the information in the first place.

A Level Playing Field?

This leaves marketing companies with SKAdNetwork in the immediate future, and whatever FLoC transitions into in the longer term. Though Apple's system offers fewer users, those users can be more valuable, and the system itself is nearer usable deployment than FLoC.

If nothing else, SKAdNetwork may become a proving ground in 2021-22 for the central concept of cohort-based advertising, generating insights that could inform the development of FLoC, which has already drawn heavily on Apple's initiative.

Besides this downgraded version of cross-tracking, marketing companies may revive the long-abandoned practice of contextual advertising, where ad categories are defined not by the user but by the content that they're consuming. The same goes for demographic advertising, which was long relegated to second position during the cross-tracking years but has nonetheless produced results for centuries.

Finally, the revived importance of first-party information may lead to the re-emergence of domains and consumer environments where the user, paid or free, must log in via local systems of authentication, instead of OAuth, Facebook, and Google tokens, since those methods co-opt a great deal of the user's data for themselves. However, this would likely lead to a resurgence of data breaches, as the domains would need to implement their own security frameworks once again.
