DNS fun and games – an Aussie site on wordpress.com VIP

The average wordpress.com user probably does not notice it, but there is a complex dance underneath every delivery of a page from wordpress.com. Basically, for a site x.wordpress.com, it goes like this:

  1. the browser's DNS lookup resolves x.wordpress.com to lb.wordpress.com
  2. lb.wordpress.com resolves to multiple IPs, and the browser's computer picks one (this is a slightly complex area: either the first address is chosen or some “closest” logic is used)
  3. an HTTP GET is sent to the IP selected for lb.wordpress.com
  4. the returned page is searched for links to images, CSS, JS etc., and those files are fetched. Scripts are run, and they may also fetch files.
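
Steps 1 and 2 can be sketched as a toy resolver. This is a minimal illustration under stated assumptions, not how a real stub resolver or wordpress.com's DNS actually behaves: the alias and address records are hard-coded, the IPs come from the reserved documentation ranges rather than real wordpress.com addresses, and the optional RTT table is a hypothetical stand-in for whatever “closest” logic a client might use.

```python
from typing import Dict, List, Optional

# Hypothetical, hard-coded records standing in for real DNS answers.
# The IPs come from documentation ranges (RFC 5737), not real wordpress.com addresses.
ALIASES: Dict[str, str] = {"x.wordpress.com": "lb.wordpress.com"}
ADDRESSES: Dict[str, List[str]] = {
    "lb.wordpress.com": ["192.0.2.10", "198.51.100.20", "203.0.113.30"],
}


def resolve(name: str, max_hops: int = 8) -> List[str]:
    """Follow aliases (step 1) until reaching a name with address records (step 2)."""
    for _ in range(max_hops):
        if name in ADDRESSES:
            return ADDRESSES[name]
        if name not in ALIASES:
            raise LookupError(f"no records for {name}")
        name = ALIASES[name]
    raise LookupError("alias chain too long")


def pick_address(addrs: List[str], rtt_ms: Optional[Dict[str, float]] = None) -> str:
    """Take the first answer, or, if RTT measurements are supplied, the lowest-RTT one."""
    if not rtt_ms:
        return addrs[0]
    return min(addrs, key=lambda a: rtt_ms.get(a, float("inf")))


addrs = resolve("x.wordpress.com")
print(pick_address(addrs))  # 192.0.2.10
print(pick_address(addrs, {"198.51.100.20": 40.0, "203.0.113.30": 15.0}))  # 203.0.113.30
```

Taking the first answer is the simple case; the RTT variant shows why two visitors asking for the same name can end up talking to different front ends.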

wordpress.com does not put all its files in a single place: a quick look at the network view in your browser's developer tools will reveal:

  • stuff coming right from wordpress.com
  • subdomains of wp.com like s1.wp.com, selected seemingly at random, as well as more specific ones like widgets.wp.com
  • requests to stats.wp.com and pixel.wp.com
  • requests to public-api.wordpress.com
  • your subdomain, but on files.wordpress.com, e.g. x.files.wordpress.com
  • subdomains of gravatar.com like 0.gravatar.com selected seemingly at random
  • calls to third party domains like gstatic.com
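
Step 4 of the walkthrough, combined with the host list above, can be approximated with Python's standard html.parser: collect the src/href of tags that pull in sub-resources and group them by hostname. A sketch under simplifying assumptions: it only sees tags present in the initial HTML (not anything scripts fetch later), and the sample page and asset paths are made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collect URLs of sub-resources referenced by common tags in the initial HTML."""

    ASSET_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ASSET_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page URL, as a browser would.
                self.urls.append(urljoin(self.base_url, value))


def assets_by_host(html: str, base_url: str) -> Dict[str, List[str]]:
    parser = AssetCollector(base_url)
    parser.feed(html)
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in parser.urls:
        groups[urlparse(url).hostname].append(url)
    return dict(groups)


# A made-up page fragment using hosts from the list above.
page = """
<link rel="stylesheet" href="https://s1.wp.com/style.css">
<script src="https://stats.wp.com/e.js"></script>
<img src="https://x.files.wordpress.com/photo.jpg">
<img src="https://0.gravatar.com/avatar/abc">
<img src="/local.png">
"""
for host, urls in assets_by_host(page, "https://x.wordpress.com/").items():
    print(host, len(urls))
```

The browser's network view remains the ground truth; a static parse like this undercounts because it misses everything loaded by scripts.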

Some of this is cached close to you via a CDN (EdgeCast), and some “goes to origin” at one of the three datacenters wordpress.com uses. It is the latter that concerns me as a webscale guy; more on this later.
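
The cached-versus-origin split can be modelled as a toy edge cache: serve a stored copy while it is younger than a TTL, otherwise fetch from origin. This is a minimal sketch assuming a single fixed TTL and an in-memory dict; real CDNs such as EdgeCast honour HTTP caching headers and do much more. The `fetch_origin` callable and the injectable clock are hypothetical hooks so the behaviour can be shown without a network.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy CDN edge node: serve a stored copy while fresh, otherwise go to origin."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_origin = fetch_origin  # hypothetical hook for the trip to the datacenter
        self.ttl_s = ttl_s
        self.clock = clock
        self.store: Dict[str, Tuple[float, bytes]] = {}
        self.origin_fetches = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and now - cached[0] < self.ttl_s:
            return cached[1]  # hit: answered close to the user
        body = self.fetch_origin(url)  # miss or stale: "goes to origin"
        self.origin_fetches += 1
        self.store[url] = (now, body)
        return body


# Usage with a fake clock so expiry can be shown without waiting.
now = [0.0]
edge = EdgeCache(lambda url: b"body{}", ttl_s=60, clock=lambda: now[0])
edge.get("https://s1.wp.com/style.css")
edge.get("https://s1.wp.com/style.css")  # served from cache
now[0] = 61.0
edge.get("https://s1.wp.com/style.css")  # expired, refetched from origin
print(edge.origin_fetches)  # 2
```

Static theme files suit this model well because they change rarely; per-user PHP pages are the part that keeps going back to origin.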

So what are all these domains for?

  1. your domain (via lb.wordpress.com) is where PHP “lives”, fronted by good old nginx and batcache goodness
  2. wp.com is where theme things like CSS and JS live, cached close to you by EdgeCast. As to why all the subdomains? Good question. There are two main reasons this is done: one is to share load across source servers (not needed in this case thanks to EdgeCast); the other is to “trick” browsers into opening more connections at once (some browsers have a rule to the effect of “only 4 connections per hostname”).
  3. stats.wp.com and pixel.wp.com are where some internal reporting is driven from. A “pixel” in webscale speak is a tiny image (generally made invisible via CSS) on a page that tracks the progress of a page load: when the pixel is requested, the web server on the other end (nginx again in this case) logs the request along with all its parameters and cookies and registers a “hit”. The log is then mined using big data techniques or a real-time pipeline. The WordPress ones are tiny smileys, nice.
  4. public-api is interesting: it serves REST services that scripts on your pages call continually to update the page on the fly without a reload; /notifications is the one I noted.
  5. files.wordpress.com is where your media assets like images live, care of nginx, and this includes cool toys like the ability to rescale images, e.g. this vs this
  6. good old gravatar.com: a very cool service from Automattic that stores not just your profile picture but also a profile page, plus a whole API that can give you structured data in many forms (JSON, PHP, QR code etc.). Check me out.
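
The subdomain trick in item 2 (often called domain sharding) can be sketched as below. wordpress.com's actual selection rule is not described here, so treat this as an assumption: it hashes the asset path with CRC32 so a given file always maps to the same hostname, which keeps browser caches effective in a way a truly random pick would not. The shard count, the `s{n}.wp.com` template and the per-host limit are illustrative.

```python
import zlib


def shard_host(path: str, shards: int = 3, template: str = "s{}.wp.com") -> str:
    """Map an asset path to one of `shards` hostnames, stably across page views."""
    # CRC32 is cheap and deterministic; Python's built-in hash() is salted per process.
    return template.format(zlib.crc32(path.encode("utf-8")) % shards)


def max_parallel(hosts: int, per_host_limit: int) -> int:
    """Upper bound on simultaneous HTTP/1.x connections when a browser caps them per host."""
    return hosts * per_host_limit


print(shard_host("/wp-content/themes/example/style.css"))
print(max_parallel(1, 4), max_parallel(3, 4))  # 4 12
```

Spreading files over three hostnames under a four-per-host cap triples the connections the browser is willing to open, which is the whole point of the trick.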
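
The pixel in item 3 boils down to: accept a GET for a tiny image, log the query parameters and cookies, and return the image. Below is a sketch of just the request-handling logic (no real server), assuming a hypothetical `/g.gif` path and made-up parameter names; the 43-byte transparent 1x1 GIF is the common payload for this purpose. In the setup described above, the logging is nginx's job rather than application code.

```python
from http.cookies import SimpleCookie
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit

# A transparent 1x1 GIF (43 bytes), the classic payload for a tracking pixel.
PIXEL_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00"
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;"
)


def handle_pixel(target: str, cookie_header: str,
                 log: List[dict]) -> Tuple[int, Dict[str, str], bytes]:
    """Record one hit (path, query parameters, cookies) and return the response to send."""
    parts = urlsplit(target)
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    log.append({
        "path": parts.path,
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
        "cookies": {k: morsel.value for k, morsel in cookies.items()},
    })
    # no-store so every page view requests the pixel again instead of reusing a cached copy
    headers = {"Content-Type": "image/gif", "Cache-Control": "no-store"}
    return 200, headers, PIXEL_GIF


hits: List[dict] = []
status, headers, body = handle_pixel("/g.gif?blog=123&post=45", "uid=abc; session=xyz", hits)
print(status, len(body), hits[0]["params"])  # 200 43 {'blog': '123', 'post': '45'}
```

The list of dicts plays the role of the access log that later gets mined in batch or fed to a real-time pipeline.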
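
Item 4 describes pages that keep calling a REST endpoint and patch themselves when the answer changes. In the browser that is JavaScript; the Python sketch below only shows the loop logic, with a hypothetical `fetch` callable standing in for the HTTP call to an endpoint such as /notifications and a made-up `unread` field in the sample payloads.

```python
import json
import time
from typing import Callable, Optional


def poll(fetch: Callable[[], str], on_change: Callable[[dict], None],
         interval_s: float, rounds: int,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Fetch a JSON payload `rounds` times; call on_change only when it differs from the last one."""
    last: Optional[dict] = None
    updates = 0
    for i in range(rounds):
        data = json.loads(fetch())
        if data != last:
            on_change(data)  # e.g. update a notifications badge in place, no page reload
            last = data
            updates += 1
        if i < rounds - 1:
            sleep(interval_s)
    return updates


# Canned responses stand in for successive calls to the REST endpoint.
responses = iter(['{"unread": 0}', '{"unread": 0}', '{"unread": 2}'])
seen = []
n = poll(lambda: next(responses), seen.append, interval_s=30, rounds=3, sleep=lambda s: None)
print(n, seen)  # 2 [{'unread': 0}, {'unread': 2}]
```

Each poll is a request that goes to origin, which is why these calls matter to anyone far from the datacenters.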
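
The rescaling in item 5 amounts to computing a new size that keeps the aspect ratio and asking for it in the URL. A sketch assuming a `w` (width) query parameter; that parameter name is an assumption for illustration, and the server side that actually resizes the image is not shown.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def scaled_size(width: int, height: int, max_width: int) -> Tuple[int, int]:
    """Shrink to max_width while keeping the aspect ratio; never upscale."""
    if width <= max_width:
        return width, height
    return max_width, round(height * max_width / width)


def resize_url(url: str, max_width: int) -> str:
    """Add (or overwrite) a width hint in the query string; the `w` name is an assumption."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["w"] = str(max_width)
    return urlunsplit(parts._replace(query=urlencode(query)))


print(scaled_size(1200, 800, 600))  # (600, 400)
print(resize_url("https://x.files.wordpress.com/photo.jpg", 600))
```

Resizing on the server means a phone downloads a 600-pixel image instead of the full-size original, which matters most on long, slow links.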
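
Gravatar (item 6) identifies you by a hash of your email address rather than the address itself. The sketch follows Gravatar's long-documented scheme (an MD5 hex digest of the trimmed, lower-cased address, `s` for image size, a `.json` suffix for the profile); check Gravatar's current developer docs, as the accepted hash and hosts may have changed. Using a numbered host like 0.gravatar.com mirrors what shows up in the network view; the `shard` argument is a hypothetical convenience.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased address, as Gravatar has long documented."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def gravatar_urls(email: str, size: int = 80, shard: int = 0) -> dict:
    """Build avatar and JSON-profile URLs; `shard` picks a numbered host (hypothetical helper)."""
    h = gravatar_hash(email)
    return {
        "avatar": f"https://{shard}.gravatar.com/avatar/{h}?s={size}",
        "profile_json": f"https://en.gravatar.com/{h}.json",
    }


print(gravatar_urls("  Someone@Example.com "))
```

Normalising before hashing is what lets “Someone@Example.com” and “someone@example.com” map to the same picture.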

The goal of all this is to make your page load quickly. If you are an aspiring PHP hoster or webscale architect, you could do worse than to look at what wordpress.com are up to, because they are an exemplar of “tier 1” internet design (the other candidates off the top of my head being Google, Facebook, eBay, PayPal and Amazon).

So why am I trying to improve on perfection? In a word, distance. My employer's customers in Australia are a long way from the core of the internet in the USA, and while wordpress.com do an awesome job of hiding this, it does not fully work for the complex pages used by news sites. So I need to add another layer in front of those parts of wordpress.com that are not on EdgeCast. The full complexity can wait for a future post, but basically we route some non-EdgeCast traffic (particularly the PHP pages on our main domains) via a CDN called Akamai.

Akamai is the unsung hero of the internet: a big chunk of your internet traffic probably passes through it, particularly images, audio and video, and in countries like Australia it is critical. Akamai “stacks”, as the servers are called, live out on the edge of the network, right near where your ADSL or cable connection is aggregated by your ISP for connection to the internet. The stacks intercept the traffic they are told to manage and perform complex mapping and caching, even inserting parts of pages on the fly. This allows us to respond as if a highly customized page were locally hosted in Australia rather than on remote servers in Chicago, San Antonio, or Dallas.
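
The distance problem can be put in numbers with a back-of-the-envelope lower bound. These are assumptions, not measurements: light in optical fibre covers roughly 200 km per millisecond, the Australia-to-US fibre path is taken as a round 15,000 km, the edge node is taken as 50 km away, and a fetch is assumed to need four sequential round trips. Real paths are longer than the straight-line distance and add routing and queuing delay, so real numbers are worse.

```python
# Assumption: light in fibre travels at roughly 200,000 km/s, i.e. about 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def min_rtt_ms(path_km: float) -> float:
    """Best-case round trip: out and back at fibre speed, ignoring routing and queuing."""
    return 2 * path_km / FIBRE_KM_PER_MS


def fetch_floor_ms(path_km: float, round_trips: int) -> float:
    """Lower bound for a fetch that needs `round_trips` sequential round trips."""
    return round_trips * min_rtt_ms(path_km)


# Hypothetical distances and round-trip count, chosen only to illustrate the scale.
origin = fetch_floor_ms(15_000, round_trips=4)  # far-away US datacenter
edge = fetch_floor_ms(50, round_trips=4)        # edge node near the ISP
print(f"origin >= {origin:.0f} ms, edge >= {edge:.1f} ms")  # origin >= 600 ms, edge >= 2.0 ms
```

No amount of server tuning in the USA removes that per-round-trip cost; only answering from somewhere closer, as an edge stack does, shrinks it.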

onwards and upwards 🙂

