How many cookies do top EU domains set? Are they in compliance with PECR Section 6?
As of May of this year, the "EU Cookie Law" (referred to here as PECR Section 6) requires sites to obtain consent before setting UA cookies, localStorage (HTML5) objects or Local Shared Objects (Flash, AS2+). My initial research on the landscape turned up widely varying results.
Interactive Map
Large publishers and sophisticated marketers who target content and sponsorship based on user behavior will struggle to measure an anonymous audience. Privacy-sensitive PR messaging and required opt-ins may appear friendly, but they consume page real estate (and are kind of annoying).
Implied Consent:
Required Opt-In:
Putting aside, for now, the question of how to track users who have opted out of tracking, I wanted to know how EU sites are implementing this regulation. Browsing with Developer Tools enabled let me see the cookies being set and, unsurprisingly, top global publishers are setting cookies: a lot of them.
Additionally, JavaScript utilities to "like" and "tweet" page content take advantage of their inclusion on the page to set cookies of their own. Outside of any PECR implications, the number of third-party cookies being set is interesting in its own right, particularly to a digital marketer.
How can we form a POV on cookie implementations for EU audiences when the regulation leaves some grey areas and interpretations vary?
Approach & Solution
Manually auditing popular sites for the volume of cookies they set was effective, and yielded enough data to plug numbers into a PPT by the end of the day. But after re-auditing by hand for third-party cookies, and then again for cookies set to expire before or after 30 days, I realized I needed a way to scale the audit and build a view of the cookies set by popular sites in general. Alexa's lists of the top 500 sites by country defined the sample, but auditing that many sites manually was out of the question.
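The two re-audits (third-party cookies, and cookies expiring before or after 30 days) boil down to a simple classification of each cookie. Below is a minimal Python sketch, assuming each observed cookie has already been reduced to a name, a domain and an expiry timestamp; the `AuditCookie` record and the `classify` and `domain_matches` helpers are hypothetical, not part of the original audit. A cookie counts as third-party when its domain does not cover the page's host, which is a simplification: it ignores the Public Suffix List and would treat a cookie scoped to a sibling subdomain (e.g. static.example.co.uk on www.example.co.uk) as third-party.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


@dataclass
class AuditCookie:
    """Hypothetical minimal record of one cookie observed on a page."""
    name: str
    domain: str               # e.g. ".example.co.uk" or "doubleclick.net"
    expires: Optional[float]  # Unix timestamp; None means a session cookie


def domain_matches(host: str, cookie_domain: str) -> bool:
    """True if host equals cookie_domain or is a subdomain of it.

    Simplification: no Public Suffix List handling.
    """
    d = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == d or host.endswith("." + d)


def classify(cookies: Iterable[AuditCookie], page_host: str, now: float) -> Dict[str, int]:
    """Count third-party, session, and short/long-lived cookies for one page."""
    counts = {"total": 0, "third_party": 0, "session": 0,
              "expires_within_30d": 0, "expires_after_30d": 0}
    for c in cookies:
        counts["total"] += 1
        if not domain_matches(page_host, c.domain):
            counts["third_party"] += 1
        if c.expires is None:
            counts["session"] += 1
        elif c.expires - now <= THIRTY_DAYS:
            counts["expires_within_30d"] += 1
        else:
            counts["expires_after_30d"] += 1
    return counts


if __name__ == "__main__":
    now = 1_000_000.0  # placeholder "current time"
    sample = [
        AuditCookie("s", ".example.co.uk", None),
        AuditCookie("pref", "www.example.co.uk", now + 7 * 86400),
        AuditCookie("id", ".doubleclick.net", now + 365 * 86400),
    ]
    print(classify(sample, "www.example.co.uk", now))
```

Passing `now` in explicitly, rather than reading the clock inside `classify`, keeps the 30-day cut-off reproducible when an audit is re-run later.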
Loading URLs in Python and reading cookies with cookielib was quick, but because pages aren't actually rendered when fetched this way (just as with cURL), no JavaScript runs and not all cookies get set. Fortunately, PhantomJS can spin up a headless browser in just a few lines of code; scripting it into an automation queue that collects the cookies set for a range of URLs resulted in this script, which hits each URL and returns its session, server-side and client-side cookies.
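To show why the cookielib approach undercounts, here is a minimal Python 3 sketch (cookielib was renamed `http.cookiejar` in Python 3). The `fetch_cookies` helper is hypothetical: it only sees cookies delivered in `Set-Cookie` response headers, because nothing in the downloaded HTML, including `document.cookie` assignments and third-party tags, is ever executed or fetched.

```python
import http.cookiejar
import urllib.request
from typing import List, Optional, Tuple


def fetch_cookies(url: str, timeout: float = 10.0) -> List[Tuple[str, str, Optional[int]]]:
    """Fetch `url` without rendering it; return (domain, name, expires) tuples.

    Only cookies sent in Set-Cookie response headers (including on redirects)
    are captured. The HTML body is downloaded but never executed, so cookies
    written by JavaScript or by third-party tags on the page are missed.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=timeout) as resp:
        resp.read()  # download the body; nothing in it runs
    # expires is a Unix timestamp, or None for session cookies
    return [(c.domain, c.name, c.expires) for c in jar]
```

Pointed at a page whose response carries `Set-Cookie: server_cookie=abc` and whose HTML contains `<script>document.cookie="js_cookie=1"</script>`, this returns only `server_cookie`; a headless browser that executes the script would also see `js_cookie`.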
Screen-scraping alexa.com with Beautiful Soup worked well, although collecting cookies from thousands of sites is pretty time-consuming.
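Since each page load spends most of its time waiting on the network, one way to shorten a run over thousands of sites is to audit several URLs in parallel. This is a minimal sketch, assuming a PhantomJS script (hypothetically named `cookies.js`) that takes a URL argument, opens the page and prints its cookies as a JSON array on stdout; the `phantomjs_collector` and `audit` helpers are illustrative and not the original script.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def phantomjs_collector(url: str, script: str = "cookies.js",
                        timeout: float = 60.0) -> List[dict]:
    """Run a hypothetical PhantomJS script that prints a page's cookies as JSON."""
    result = subprocess.run(
        ["phantomjs", script, url],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return json.loads(result.stdout)


def audit(urls: Iterable[str], collector: Callable[[str], List[dict]],
          workers: int = 8) -> Dict[str, object]:
    """Collect cookies for many URLs concurrently.

    Each value is either the collector's list of cookies or the exception it
    raised, so one slow or broken site does not stop the whole run.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(collector, url): url for url in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # timeout, non-zero exit, bad JSON, ...
                results[url] = exc
    return results
```

The collector is a parameter so the queue can be tested with a stub, or pointed at a different headless browser, without touching `audit`. Threads are sufficient here because the heavy lifting happens in separate `phantomjs` processes.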
The resulting data can be visualized: top EU domains and the number of cookies each sets.
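Before charting, the per-domain counts need to be ranked. This is a minimal, hypothetical sketch that sorts a `{domain: cookie_count}` mapping and renders a plain-text bar chart. The input format is an assumption, and the domains and counts in the usage lines are placeholders, not results from the audit.

```python
from typing import Dict, List, Tuple


def top_domains(counts: Dict[str, int], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n domains with the most cookies, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def text_chart(rows: List[Tuple[str, int]], width: int = 40) -> str:
    """Render (domain, count) rows as a horizontal bar chart scaled to `width`."""
    if not rows:
        return ""
    biggest = max(count for _, count in rows) or 1  # avoid dividing by zero
    label_w = max(len(domain) for domain, _ in rows)
    lines = []
    for domain, count in rows:
        bar = "#" * round(width * count / biggest)
        lines.append(f"{domain:<{label_w}} {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    placeholder = {"site-a.example": 12, "site-b.example": 30, "site-c.example": 5}
    print(text_chart(top_domains(placeholder), width=30))
```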