Wednesday, November 7, 2012

How Many Cookies do Top EU Domains Set

How many cookies do top EU domains set? Are they in compliance with PECR Section 6?

As of May of this year, the "EU Cookie Law", referred to here as PECR Section 6, requires sites to gain consent before setting UA cookies, localStorage (HTML5) objects or Local Shared Objects (AS2+). My initial research on the landscape yielded a very wide range of results.

Click for Interactive Map

Large publishers and sophisticated marketers who target content and sponsorship to behavior will struggle to measure an anonymous audience. Privacy-sensitive PR messaging and required opt-ins may appear friendly but consume page real-estate (and are kind of annoying).

Implied Consent:

Required Opt-In:

Putting aside for now the question of how to track users who have opted out of tracking, I wanted to know how EU sites are implementing this regulation. Clicking around the web with Developer Tools enabled allowed me to view cookies being set and, unsurprisingly, top global publishers are setting cookies -- a lot of them.

Additionally, JavaScript utilities to "like" and "tweet" page content are obviously taking advantage of their page inclusion. The number of 3rd party cookies being set is interesting outside of any PECR implications particularly as a digital marketer.

How can we form a POV on cookie implementations for EU audiences when the regulation is written with some grey area and the interpretations vary?

Approach & Solution

Manual auditing of popular sites to see the volume of cookies set was effective and would yield enough data to plug numbers into a PPT by the end of the day. After personally re-auditing for 3rd party cookies, and then again for cookies that were set to expire before/after 30 days, I realized I needed a way to scale the audit process and create a view of cookies set by popular sites in general. Using Alexa's lists of the top 500 sites by country helped get there, but a manual audit was out of the question.
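The two audit passes described above (3rd party vs. 1st party, and expiry before/after 30 days) can be sketched as a small classifier. This is a minimal illustration, not the audit that was actually run: the record fields (`name`, `domain`, `expires`), the labels and the helper names are all assumptions, and the domain match is a plain suffix check with no public suffix list.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def classify_cookie(cookie, page_domain, now, threshold_days=30):
    """Return (party, lifetime) labels for one cookie record (hypothetical schema)."""
    cookie_domain = cookie["domain"].lstrip(".").lower()
    page_domain = page_domain.lower()
    # First party: same host, or one host is a dot-separated suffix of the other
    # (e.g. a cookie for example.com seen on www.example.com).
    first_party = (
        cookie_domain == page_domain
        or page_domain.endswith("." + cookie_domain)
        or cookie_domain.endswith("." + page_domain)
    )
    party = "first_party" if first_party else "third_party"

    expires = cookie.get("expires")  # epoch seconds, or None for a session cookie
    if expires is None:
        lifetime = "session"
    else:
        expiry = datetime.fromtimestamp(expires, tz=timezone.utc)
        is_long = expiry - now > timedelta(days=threshold_days)
        lifetime = "long" if is_long else "short"
    return party, lifetime


def audit(cookies, page_domain, now, threshold_days=30):
    """Count cookies per (party, lifetime) bucket for one page."""
    return Counter(
        classify_cookie(c, page_domain, now, threshold_days) for c in cookies
    )
```

Calling `audit(cookies, "www.example.co.uk", datetime.now(timezone.utc))` returns a `Counter` keyed by pairs such as `("third_party", "long")`, which is enough to plug into a per-site table.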

Loading URLs in Python and reading cookies with cookielib was quick, but as pages aren't actually rendered when cURLed, not all cookies were being set. Fortunately, PhantomJS provides the ability to spin up a headless browser in just a few lines of code. Scripting this into an automation queue to collect the cookies set for a range of URLs resulted in this script, which will hit a URL and return session, server and client-side cookies.
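For reference, the quick cookielib approach looks roughly like the sketch below. It uses `http.cookiejar`, the Python 3 name for cookielib, and is not the PhantomJS script linked above. The function names are made up for illustration. Because nothing is rendered, it only sees cookies sent in `Set-Cookie` response headers and misses anything set by JavaScript, which is exactly the limitation described.

```python
import urllib.request
from http.cookiejar import CookieJar


def fetch_cookie_jar(url, timeout=10):
    """Request url once and return a CookieJar filled from Set-Cookie headers."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url, timeout=timeout):
        pass  # the body is not needed; cookies come from the response headers
    return jar


def summarize(jar):
    """Return sorted (domain, name, is_session) tuples for every cookie in the jar."""
    return sorted((c.domain, c.name, c.expires is None) for c in jar)
```

A call such as `summarize(fetch_cookie_jar("http://example.com/"))` lists the server-set cookies for one URL; looping it over a list of domains gives a lower bound on what a real browser would receive.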

Screen-scraping with Beautiful Soup was great, although collecting cookies from thousands of sites is pretty time consuming.

Resulting data can be visualized: top EU domains and number of cookies set.

Wednesday, June 13, 2012

Who's Practicing Responsive Design

A catalog of responsive design sites has been circulating for a while now. One of the best parts of the survey content they've curated there is the analysis -- which is misleadingly referred to as tag frequency -- outlining what technologies are being used to satisfy responsive design features.

This is a great catalog but hopefully one with a limited life span, assuming that device-agnostic web design will eventually be a requirement for all digital publication. Given that, knowing what types of sites are actually employing this technology is also a good measure of the maturity of the Responsive Design concept -- is it just web design professionals, or are eCommerce sites, publishers and other classifications of web properties seeing the value of this approach to web design? The catalog lists over 130 sites, which is a bit much to classify by hand. Also, assuming that the adoption of Responsive Design techniques will increase exponentially in the near future, managing the categorization of these sites manually is not really an option. To solve my immediate need for classification with a solution that can scale with the amount of data (sites cataloged), I needed a way to classify each site dynamically.

The Analysis:

A quick and dirty script was developed to scrape the site. It pulled a list of URLs for each domain cataloged. This was very easy thanks to the site's clean markup. A bit of scrubbing removed URLs to their sponsors (also easy, as they are ad served, not hard-coded) and other non-content destinations such as Twitter.

A secondary script loads and parses the list of URLs and queries an entity extraction API to generate "sentiment tags" -- a computed summary of what a chunk of content is about.

Once we have a bag of words, any old tag-cloud can be used to provide an illustration of what responsive design web sites are actually about.
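As a rough sketch of the step from bag of words to tag cloud, the tags returned for each URL can be reduced to a frequency table. The input shape (one list of tag strings per site) and the function name are assumptions made for illustration, not the output format of any particular tagging API.

```python
from collections import Counter


def tag_frequencies(tags_per_site, top_n=10):
    """Return the top_n (tag, site_count) pairs across all sites."""
    counts = Counter()
    for tags in tags_per_site:
        # Normalize case and whitespace, and count each tag once per site
        # so a single verbose page cannot dominate the cloud.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return counts.most_common(top_n)
```

The resulting counts map directly onto font sizes in whatever tag-cloud renderer is used.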

After testing a few different services including my old standby OpenCalais, I decided on a service called TagThe.Net, which claims to do more than create a frequency table based on word count, but doesn't seem to mention NLP anywhere. This service also takes a destination URL as an argument and saved me the trouble of scraping and organizing page content myself. Here's a pastebin of the API script.

There are several auto-tag APIs listed on Programmable Web and if I can find the time to test (and combine the output of) more of them, I might end up with a better product.

The Result:

As you can see from the tag cloud above, the Responsive Design sites in the catalog, as processed by TagThe.Net, seem to be about Responsive Design, or at least web design in general. It will be interesting to run this again in a month and chart the difference.

Q: Who's practicing Responsive Design?

A: Web Designers

Friday, January 6, 2012

Banner Ad Analysis

Ad agencies make a lot of banner ads.


They design them, build prototypes and then release them to a localization/translation agency, which modifies these master ads and goes on to release them for consumption globally. These teams -- like Enfatico -- usually have a presence in local markets across the world and tweak and revise these banners before they are trafficked to publishers and appear out on the net.

Figuring out how banner ads are changed and what creative has actually run can be a challenge for a global organization. It seems simple enough, but consolidating digital media-plans requires a pretty significant degree of organization, which planners usually lack.

Knowing where ads actually run, and which other agencies might be slipping in and creating campaign-specific messaging, is impossible.

Fortunately, there is a great service called Moat that scrapes publishers, and categorizes banner ads by brand. From an analytic point of view, the upside is in the meta-data that defines the ads (not necessarily the creative). Moat's database contains lots of format information including size and file type, but exposes it only on an ad-by-ad basis, and doesn't do any roll-ups.

For a top-line view of formats and sizes (specifically without manually rolling over every item in the resulting page), we use a custom tool called (obviously enough) MoatScrape to scrape the result set and create a roll-up of the results.
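The roll-up step itself is simple once the per-ad metadata has been collected. The sketch below assumes each scraped ad has already been reduced to a record with `width`, `height` and `file_type` keys. Those field names are hypothetical and are not Moat's schema or MoatScrape's actual implementation.

```python
from collections import Counter


def roll_up(ads):
    """Summarize scraped ad records by size and by file type."""
    by_size = Counter((ad["width"], ad["height"]) for ad in ads)
    # Lower-case the type so "SWF" and "swf" land in the same bucket.
    by_type = Counter(ad["file_type"].lower() for ad in ads)
    return {"by_size": by_size, "by_type": by_type}
```

`roll_up(ads)["by_size"].most_common()` then gives the top-line view of which formats a brand is actually running.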

Pretty handy until the fine fellas at Moat add this in as a feature...

Monday, May 23, 2011

A Confession: You Are What You Read

'They're poisoning you' from the May '68-era Atelier Populaire. The idea has roots in Antonio Gramsci's concept of a population under siege by mass media's definition of the status quo.

Largely embraced by the contemporary left, what is considered a healthy 'media cynicism' must include an analysis of what Galloway refers to as protocol or the actual (and entire) hardware/software apparatus on which the discourse is run.

Can digital channels be considered truly "open" with regard to freedom of speech as opposed to freedom to profit? Where on the spectrum of Facebook to Gather to Encyclopedia Dramatica and 4Chan can one presume not to be data-mined into a commodity?

Even if we go one step down the protocol chain to the service providers (net or cell), we have evidence of user commodification based on their location(s) alone (e.g., iPhone tracker).

However, I choose not to wear a tin-foil hat. As someone who works with the very tools that are used to create One Dimensional Men, I expect the same tools can enable a diversity and a tail so long that individuals can slip in and out of influenced "publics" and define a sense of freedom from mass-culture through the sheer complexity of selections available.

Thursday, March 24, 2011

Cross Channel Digital : Social Niche-works

Digital, cross-channel -- everyone is a publisher. The social web is fragmented into:
  • communities based around specific functions (sharing, connecting, research) and
  • types of relationships (segments within the professional, personal).

Channels that address relationships and functionality with specificity shouldn't be considered broadcast channels, but social niche-works.

While it's common to think that these channels are converging, the reality is that they are still in the early stages and each addresses a specific purpose. From this perspective the niche-works are actually more useful at addressing user segments when selected and integrated around a social (or marketing/communications) strategy. Integration is an essential part of the mix because what we have so cleanly defined as the "consideration" stage of the sales funnel has evolved into a resonant process/dialog which can be described as a Conversion Prism where consumers shed/gain brand loyalty.

Wide scope digital channels:

  • Twitter
  • Facebook
  • YouTube
  • SlideShare
  • Blogger/Tumblr/Wordpress etc
  • Flickr
  • Ning

Some to consider:

  • FourSquare/Scavenger
  • LinkedIn/
  • Quora
  • Craigslist
  • Delicious
  • Myspace (it's not dead, it's for music)
  • SalesForce
  • Webinar (live/archived)

Emerging, social niche-works

  • Kuler
  • StackOverflow
  • Pandora/Spotify
  • urbandictionary

A challenge for measurement, but a pick-list for transmedia initiatives.

Sunday, March 20, 2011

First-Responders of Media

Condition One, a new lens system for DSLRs that captures the human field of view and can be projected in a convex dome, aims to provide a more realistic viewing experience for war footage.

It makes sense, given the increasing realism demonstrated by games such as Call of Duty, that actual news footage needs to make more of an impact. The tipping point for this kind of technology will be when it arrives in the hands of people who are already on the ground in situations like Tunisia, Egypt and Libya, and can shift the value of what we in the west (particularly those of us in the media business) often consider The Cult of the Amateur into a kind of First-Responders of the Media.

The iPad experience is draggable, as shown below.

My Freedom Or Death - Condition ONE Beta from Danfung Dennis on Vimeo.

Monday, March 7, 2011

Narrative thumbnails

Movies as bar codes - compressed into a single frame.

While this leans a bit too much towards the concept of "corporate dashboard" by distilling the resonant experience of film into a colored graph, it is interesting that an overall color "tone" is present in nearly all of the examples. I'd love to see patterns emerge à la EVP, or to see if the generated thumbnail could somehow describe additional value (beyond compression/quantification).

These are a lot more aesthetically pleasing than what I'm doing with netGHOST (link forthcoming), but netGHOST doesn't quantify the resonant experience into such a terse glyph.