Ads.txt


The IAB recently unveiled its new ads.txt project. The ads.txt project aims to solve the problem of SSPs or ad networks selling publisher inventory without the authorization to do so. This is important for publishers for a few reasons, including: 
1) it may help prevent fraud by stopping fraudulent sources of traffic (e.g. Methbot) from selling publisher inventory through unauthorized channels (meaning a fraudulent ad network couldn't set up a seat somewhere, sell nytimes.com, and make money off those impressions);
2) it would help prevent domain spoofing by nefarious sellers, who misrepresent one site's ad inventory as another's on ad exchanges - meaning buyers think they're buying nytimes.com at impression time but are really buying some junky site; and
3) it reduces the arbitrage opportunities for publisher inventory, meaning that nytimes.com cannot be bought and resold over the exchanges - with the idea that reducing arbitrage means more buyers will deal directly with the publisher.

These are all genuine problems in the ecosystem, and solving them is definitely a worthwhile goal. The ads.txt program aims to address them with a particularly simple solution. The idea is that a publisher would place a file at a well-known path (for the New York Times, nytimes.com/ads.txt) that specifies the authorized sellers of its inventory, and how they're authorized to sell it (to some degree). The specification for the ads.txt file is very simple. The publisher includes a list that looks something like:

triplelift.com, 12345, DIRECT, d75815a79
pubmatic.com, 23456, RESELLER, f496211
rubiconproject.com, 34567, DIRECT
rubiconproject.com, 45678, RESELLER
rubiconproject.com, 56789, RESELLER

In the example above, the publisher has authorized three different partners - TripleLift (TL), PubMatic, and Rubicon - to sell their inventory. The first item in each line is the domain of the advertising system. The second element is the publisher's ID on that system; so on TL, the only eligible publisher ID would be 12345. In the example above, there are three eligible publisher IDs for Rubicon. This likely means that there are different networks or representations of this publisher's inventory, all of which would be permitted. The third element is the type of relationship - whether the publisher manages the sale of inventory on the platform (DIRECT) or it's resold on the publisher's behalf (RESELLER) - some buyers may choose to only buy direct. Finally, the fourth, optional element is an ID that uniquely identifies the advertising system within a certification authority, such as TAG.
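
To make the record format concrete, here is a minimal parsing sketch in Python. None of this is part of the specification - the class and field names are just illustrative labels for the four comma-separated values described above.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AdsTxtRecord:
    # One authorization line from an ads.txt file (field names are illustrative).
    ad_system_domain: str                    # e.g. "triplelift.com"
    publisher_id: str                        # the publisher's account ID on that system
    relationship: str                        # "DIRECT" or "RESELLER"
    certification_id: Optional[str] = None   # optional certification authority ID (e.g. TAG)

def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()       # drop comments and surrounding whitespace
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue                                # skip malformed lines
        records.append(AdsTxtRecord(
            ad_system_domain=fields[0].lower(),
            publisher_id=fields[1],
            relationship=fields[2].upper(),
            certification_id=fields[3] if len(fields) > 3 else None,
        ))
    return records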

Through all this, the publisher would be saying that no one should buy their inventory on, for example, AppNexus (or anyone else not listed), because it isn't authorized, and no one should buy their inventory if it comes through a listed advertising system but under an unauthorized publisher ID. This is all pretty straightforward, to a degree, but it misses some genuine complexity in the ecosystem that isn't going to disappear overnight.
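
Continuing the hypothetical sketch above, a buyer's check then reduces to a simple membership test against the parsed records (again, illustrative code, not part of the spec):

def is_authorized(records: List[AdsTxtRecord],
                  ad_system_domain: str,
                  publisher_id: str,
                  direct_only: bool = False) -> bool:
    # True if this (ad system, publisher ID) pair appears in the publisher's ads.txt.
    # direct_only models a buyer that chooses to ignore RESELLER entries.
    for r in records:
        if r.ad_system_domain != ad_system_domain.lower():
            continue
        if r.publisher_id != publisher_id:
            continue
        if direct_only and r.relationship != "DIRECT":
            continue
        return True
    return False

# Given the records parsed from the example file above:
# is_authorized(records, "triplelift.com", "12345")  -> True
# is_authorized(records, "appnexus.com", "99999")    -> False (system not listed at all)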

Ads.txt is modeled after robots.txt. The latter file tells web crawlers from Google, Bing, etc. which parts of a website they should and should not crawl. Robots.txt is directly related to the actual content of a website, and thus is designed in conjunction with the actual operation of that site - and can be made available by the same system that manages that content. Robots.txt is also relatively static. The ads.txt file, meanwhile, would be operated by an entirely different part of a publisher's team - the ad operations team. This group rarely has access to the actual content on a publisher's site, and instead tends to operate through DFP, TripleLift's console, etc. - so managing the ads.txt file could be a relatively challenging proposition, since they often don't actually have access to the server itself. They would also need to continually update this file with every new partner they begin or stop working with - so someone would need to be familiar with every partner, at any time, for all sorts of inventory. Further, nowhere does the specification handle the nuances that are actually relevant to the complex operations of a publisher - different sorts of inventory (video, native, etc.) are handled by different partners, different geos are handled differently, and so on. Finally, entities like BidSwitch, AppNexus, PulsePoint and others are intermediaries that have a genuine place in the ecosystem - providing more access to a given piece of inventory. Buying through a many-to-many partner like these is not necessarily the sort of behavior that ads.txt is looking to stop, but it can inadvertently undermine the business models of these companies when they make inventory from other SSPs available, as SSPs themselves, to DSPs that have not necessarily integrated with the originating SSP.

The robots.txt file was also designed to be downloaded by companies whose products were in the business of crawling the web. Adding an additional file to be crawled was a relatively trivial proposition. DSPs, meanwhile, are generally not in the business of crawling the web. For the ads.txt initiative to be successful, every DSP will need to independently build a crawler, then crawl every single domain from which it receives impressions. It will need to cache the results and build a system that can check, in real time for every single bid request, whether the request matches the eligible set of publisher IDs. This is a substantial undertaking and, given that DSPs already have huge backlogs of products to develop, not necessarily one that will be immediately prioritized.
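
As a rough sketch of what that DSP-side work involves, using only Python's standard library and reusing the hypothetical helpers above (in a real system the fetch would happen in an offline crawler rather than in the bid path, and the refresh interval and fallback behavior here are arbitrary choices for the sketch):

import time
import urllib.request
from urllib.error import URLError

_CACHE = {}                      # domain -> (fetched_at, records)
_TTL_SECONDS = 24 * 60 * 60      # arbitrary refresh interval for this sketch

def get_ads_txt_records(domain):
    # Fetch and cache the ads.txt for a domain, reusing parse_ads_txt from above.
    now = time.time()
    cached = _CACHE.get(domain)
    if cached and now - cached[0] < _TTL_SECONDS:
        return cached[1]
    try:
        with urllib.request.urlopen("https://%s/ads.txt" % domain, timeout=5) as resp:
            records = parse_ads_txt(resp.read().decode("utf-8", errors="replace"))
    except (URLError, ValueError):
        records = []             # policy question: how to treat missing or unreachable files
    _CACHE[domain] = (now, records)
    return records

def should_bid(page_domain, exchange_domain, seller_publisher_id):
    # Real-time check: bid only if the exchange / publisher-ID pair is listed for this domain.
    records = get_ads_txt_records(page_domain)
    if not records:
        return True              # hypothetical fallback: no ads.txt file means no restriction
    return is_authorized(records, exchange_domain, seller_publisher_id)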

This, in turn, gets to one of the primary problems of ads.txt - to be successful, it has to already be successful. For a DSP to choose to implement it, it has to have some confidence that it will actually be impactful. For a publisher to choose to implement it, the publisher has to have some confidence that DSPs will respect it - and that it won't hurt its monetization. Neither side is frictionless, and neither side adds value without the other.