Information for Publishers

Just like any search engine, Attribyte operates agents that automatically visit and index pages, feeds and images. We try very hard to follow the Robots Exclusion Protocol, so you can control what content, if any, is accessed by Attribyte through your robots.txt file. Attribyte's agents are described below, along with details about how each may be excluded.

User-Agent: Attribot-Feeds

This agent visits feeds at most once per hour. If it is excluded by robots.txt, entries will not appear on public pages unless they are reviewed or annotated. Even when excluded from public pages, feeds are still checked by this agent, and their entries are displayed privately to users of the Attribyte Console (a private feed reader). The Attribyte Console allows our users to read your posts and easily share them with others through Twitter, Facebook and other channels. If you don't want to allow this, you can explicitly exclude our agent; feeds will then not be checked at all, preventing their entries from being read (or promoted) by anyone using the Attribyte Console. The robots.txt file will still be checked once per day for changes, even if Attribot-Feeds is excluded.

If you want Attribyte to completely ignore your content, add the following to your robots.txt file:

User-Agent: Attribot-Feeds
Disallow: /

User-Agent: Attribot-Images

This agent downloads favicons and, unless excluded by robots.txt, creates thumbnail images from images discovered in feed entries that meet specific size (dimension) criteria. Small image thumbnails are saved for recent entries, then automatically deleted as the entries age.

The following example shows how to explicitly exclude Attribyte's image agent. It assumes images are served from the same web server as your content.

User-Agent: Attribot-Images
Disallow: /
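
If your images are served from a dedicated path rather than from the site root, you can limit the exclusion to that path instead of the whole site. The /images/ prefix below is only a placeholder; substitute whatever path your server actually uses:

User-Agent: Attribot-Images
Disallow: /images/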

Note that you don't have to explicitly exclude Attribyte; if you have wildcard agent entries, those will be respected.
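
For example, a standard wildcard record like the following excludes all of Attribyte's agents, along with any other crawler that honors robots.txt:

User-Agent: *
Disallow: /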

User-Agent: Attribot-Pages

This agent extracts information from HTML pages.

The following example shows how to explicitly exclude Attribyte's page agent.

User-Agent: Attribot-Pages
Disallow: /

Note that you don't have to explicitly exclude Attribyte; if you have wildcard agent entries, those will be respected.

Bandwidth Minimization

Attribyte's software agents try to minimize consumed bandwidth through the following mechanisms:

  • All requests pass through a caching proxy.
  • The If-Modified-Since header is used.
  • HEAD requests are used to dereference links when the response body is not needed.
  • Gzip compression (Content-Encoding: gzip) is used for downloads when the server supports it; an illustrative request is sketched after this list.
  • The robots.txt is cached for 24 hours.
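
As a rough sketch, a conditional feed request from one of these agents might look like the following. The host, path, date and exact User-Agent string are hypothetical; the point is the use of If-Modified-Since and Accept-Encoding: gzip:

GET /feed.xml HTTP/1.1
Host: example.com
User-Agent: Attribot-Feeds
Accept-Encoding: gzip
If-Modified-Since: Mon, 06 Jan 2014 08:00:00 GMT

If the feed has not changed, the server can answer 304 Not Modified with no body; otherwise the body is sent gzip-compressed (Content-Encoding: gzip) when the server supports that encoding.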