Archive for October, 2014

31 Oct 2014

WordPress VIP Development environment setup (Mac)

Oh boy, what a pain! Setting up Unix tools on a Mac is one of my least favourite things, because:

  1. Apple puts Unix stuff in very odd places, mucking up installers' assumptions.
  2. Apple messes with the whole root concept in sometimes counter-intuitive ways: sudo effectively means “local admin”, not really root, and lots of installers these days block sudo for safety reasons. That makes installing on a Mac tricky, because you NEED “elevated” permissions to do many things.
  3. Apple builds in key “optional extras” that have now become core to Unix-style installers: Python, Perl and Ruby. The Apple versions, even in up-to-date installs, are … pants (English slang meaning “bad”). This means lots of parallel installs of key scripting enablers (more on this later).

So here is what I had to do, more or less blow-by-blow.

The “official” WordPress instructions are here: http://vip.wordpress.com/documentation/quickstart/ (slightly ironic title 🙂)

So … I am behind a corporate proxy, and I am going to need Authoxy (or similar).

Download Authoxy from www.hrsoftworks.net/Products.php, then:

  1. Run the download.
  2. Double-click the Authoxy pkg (Control-click → Open → Open might be needed depending on your “unknown package” settings).
  3. You might need local admin rights to install this! Type in your username and password.
  4. Open “System Preferences”.
  5. Go to Network. Choose the network you use to connect to your corporate network and open it. Go to Proxies, select Automatic Proxy Configuration, copy the “URL” and back out to the top level.
  6. The Authoxy settings are in the bottom section of “System Preferences”; open them.
  7. Select “use automatic configuration (pac) file” and paste the “URL” in.
  8. Put in your network username and password.
  9. Change the Authoxy port to (say) 9000.
  10. Start Authoxy using the button at the top left.
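Before moving on, it is worth checking that Authoxy really is listening. Here is a minimal Python sketch (a hypothetical helper of my own, not part of Authoxy) that simply tries a TCP connection to the local port. It assumes Authoxy accepts connections on 127.0.0.1 at the port chosen in step 9.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on failure
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9000 is the Authoxy port picked in step 9 (an assumption if you chose another)
    print("Authoxy listening:", is_listening("127.0.0.1", 9000))
```

A False here usually means Authoxy has not been started (step 10) or is on a different port.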

Download VirtualBox and install it.

Download Vagrant and install it.

When I saw that Vagrant uses Ruby, my heart sank. Not because I hate Ruby, but because I always seem to have issues with it on Macs.

Your built-in Ruby may well be no good. If later steps fail with random build errors and dependency issues, try a newer Ruby.

Now crack open a terminal.

> ruby --version

Mine showed 2.0. As things turned out, I needed 2.1 at least. So I used the tasty ruby-install to give me a better copy and then changed my path to look there first.
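If you want a scripted version of that check, here is a small Python sketch that parses the output of ruby --version and compares it against the 2.1 minimum. The regular expression assumes the usual “ruby X.Y.Z…” format, and the helpers are hypothetical, not part of Vagrant or ruby-install.

```python
import re
import shutil
import subprocess

MINIMUM = (2, 1)  # the 2.1 minimum mentioned above


def parse_ruby_version(output):
    """Extract (major, minor, patch) from `ruby --version` output."""
    match = re.search(r"ruby (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognised version string: {output!r}")
    return tuple(int(part) for part in match.groups())


def ruby_is_new_enough(output, minimum=MINIMUM):
    # Compare only as many parts as the minimum gives, so any 2.1.x passes (2, 1)
    return parse_ruby_version(output)[: len(minimum)] >= minimum


if __name__ == "__main__":
    ruby = shutil.which("ruby")  # whichever ruby comes first on your PATH
    if ruby is None:
        print("no ruby on PATH")
    else:
        out = subprocess.run([ruby, "--version"], capture_output=True, text=True).stdout
        print(out.strip(), "->", "OK" if ruby_is_new_enough(out) else "too old")
```

Because it uses whatever ruby is first on your PATH, it also confirms that the path change to prefer the ruby-install copy took effect.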

> vagrant plugin install vagrant-proxyconf

If that works, you need to create a Vagrant proxy file. I used vi; the “i” and “[esc]:wq” lines below are vi keystrokes, not part of the file.

> vi ~/.vagrant.d/Vagrantfile

i

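Vagrant.configure("2") do |config|
  # Only applies if the vagrant-proxyconf plugin installed above is present
  if Vagrant.has_plugin?("vagrant-proxyconf")
    # 10.0.2.2 is the host Mac as seen from inside a VirtualBox NAT guest,
    # and 9000 is the port Authoxy listens on (step 9 above)
    config.proxy.http     = "http://10.0.2.2:9000/"
    config.proxy.https    = "http://10.0.2.2:9000/"
    # Subversion gets its own proxy setting from the plugin
    config.svn_proxy.http = "http://10.0.2.2:9000/"
  end
end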
[esc]:wq

> cd vip-quickstart

> vagrant up

Wait for it … pray … wait.

> vagrant provision

Wait, and maybe repeat the provision. If you get any network errors, your proxy might not be set up right.

The first time I ran provision I got a blank page with a 200 response on vip.local (where your dev site will “live”) and some odd network errors. A second go resolved that.
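Since repeating vagrant provision is what got me over the line, here is a hedged Python sketch of a retry wrapper. The attempt count and delay are arbitrary assumptions, and run_with_retries is a hypothetical helper, not a Vagrant feature.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay_s=10.0):
    """Run cmd until it exits with status 0.

    Returns the number of the attempt that succeeded, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)  # give a flaky proxy or network a moment
    return None
```

From inside vip-quickstart you would call run_with_retries(["vagrant", "provision"]). A None result means every attempt failed, which, as noted above, most likely points at the proxy setup.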

Victory! 🙂

13 Oct 2014

The tyranny of distance, or why webscale in Australia is tough

While trying to understand some site performance observations with a guru from WordPress VIP, Mr Barry Abrahamson, I was reminded just how odd routes from Australia to the USA can sometimes be.

In my case from a Telstra ISP 4G service in Sydney:

  • wp.com resolved to Brisbane, Australia, via the Sydney Kent St exchange about 2 km from my location in inner Sydney; an odd little route given EdgeCast has a POP in Sydney!
  • wordpress.com routed via Hong Kong to the Bay Area and the landings near LA
  • sat.wordpress.com routed to San Antonio, Texas, via Dallas peering, coming from Hong Kong → Taiwan and then jumping the Pacific; seemingly not the most direct route 🙂

Three distinct locations and routes for one web page.

I see these odd routes a lot, so perhaps a little look at how to trace and decode the output is in order.

The first thing you need is a machine that can run traceroute, ping and curl; I will use my trusty Mac Pro. Secondly, you need a connection that is not excessively firewalled: many corporate environments block these useful tools, and some ISPs and home firewalls might too. If you have a firewall at home you might need to allow ICMP.

OK, so first we ping each host:

ping -c3 -n i0.wp.com
PING cs82.wac.edgecastcdn.net (68.232.44.111): 56 data bytes
64 bytes from 68.232.44.111: icmp_seq=0 ttl=57 time=78.895 ms
64 bytes from 68.232.44.111: icmp_seq=1 ttl=57 time=76.929 ms
64 bytes from 68.232.44.111: icmp_seq=2 ttl=57 time=137.781 ms

ping -c3 -n wordpress.com
PING wordpress.com (192.0.78.17): 56 data bytes
64 bytes from 192.0.78.17: icmp_seq=0 ttl=49 time=613.950 ms
64 bytes from 192.0.78.17: icmp_seq=1 ttl=49 time=556.325 ms
64 bytes from 192.0.78.17: icmp_seq=2 ttl=49 time=762.889 ms

ping -c3 -n sat.wordpress.com
PING sat.wordpress.com (76.74.254.120): 56 data bytes
64 bytes from 76.74.254.120: icmp_seq=0 ttl=45 time=963.449 ms
64 bytes from 76.74.254.120: icmp_seq=1 ttl=45 time=572.970 ms
64 bytes from 76.74.254.120: icmp_seq=2 ttl=45 time=805.222 ms

OK, from this I can clearly see that i0.wp.com is somewhere fairly local, while wordpress.com is “over the pond”. It also looks like sat.wordpress.com might be farther away than wordpress.com; interesting. Running ping a couple more times (I usually do it at least 3 times) will help reveal any other anomalies.
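Eyeballing three lines is easy enough, but once you run ping several times it helps to summarise. This minimal Python sketch pulls the time= values out of macOS-style ping output (the regular expression is an assumption based on the output above) and reports min, mean and max. The sample is the wordpress.com run from above.

```python
import re
import statistics

# Matches the "time=613.950 ms" part of each reply line
RTT = re.compile(r"time=([\d.]+) ms")


def summarise_ping(output: str) -> dict:
    """Return reply count and min/mean/max round-trip times in ms."""
    times = [float(m.group(1)) for m in RTT.finditer(output)]
    if not times:
        raise ValueError("no replies found")
    return {
        "replies": len(times),
        "min_ms": min(times),
        "mean_ms": round(statistics.mean(times), 3),
        "max_ms": max(times),
    }


SAMPLE = """PING wordpress.com (192.0.78.17): 56 data bytes
64 bytes from 192.0.78.17: icmp_seq=0 ttl=49 time=613.950 ms
64 bytes from 192.0.78.17: icmp_seq=1 ttl=49 time=556.325 ms
64 bytes from 192.0.78.17: icmp_seq=2 ttl=49 time=762.889 ms"""

print(summarise_ping(SAMPLE))
```

The spread between min and max is as telling as the mean: a local host that occasionally spikes (like the 137 ms reply from i0.wp.com) looks very different from one that is consistently hundreds of milliseconds away.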

So now to the big guns, traceroute:

traceroute -I i0.wp.com
traceroute to cs82.wac.edgecastcdn.net (68.232.44.111), 64 hops max, 72 byte packets
1  172.20.10.1 (172.20.10.1)  28.801 ms  24.031 ms  1.678 ms
2  * * *
3  * * *
4  * * *
5  tengige0-5-0-15.chw-edge901.sydney.telstra.net (139.130.111.73)  170.490 ms  30.207 ms  28.705 ms
6  bundle-ether2.ken-edge901.sydney.telstra.net (203.50.11.102)  32.643 ms  29.309 ms  32.602 ms
7  ver1542775.lnk.telstra.net (139.130.197.82)  27.004 ms  24.701 ms  31.644 ms
8  68.232.44.111 (68.232.44.111)  28.148 ms  28.552 ms  30.501 ms

traceroute -I wordpress.com
traceroute: Warning: wordpress.com has multiple addresses; using 192.0.78.9
traceroute to wordpress.com (192.0.78.9), 64 hops max, 72 byte packets
1  172.20.10.1 (172.20.10.1)  1.911 ms  1.460 ms  1.406 ms
2  * * *
3  * * *
4  * * *
5  tengige0-5-0-15.chw-edge901.sydney.telstra.net (139.130.111.73)  1253.110 ms  111.398 ms  24.262 ms
6  bundle-ether13.chw-core10.sydney.telstra.net (203.50.11.98)  32.387 ms  29.947 ms  40.252 ms
7  bundle-ether17.oxf-gw2.sydney.telstra.net (203.50.13.70)  31.405 ms  33.614 ms  38.195 ms
8  bundle-ether1.sydo-core01.sydney.reach.com (203.50.13.38)  42.390 ms  36.117 ms  31.180 ms
9  i-0-2-0-5.sydo-core02.bi.telstraglobal.net (202.84.223.42)  30.383 ms  31.534 ms  32.370 ms
10  i-0-2-0-0.1wlt-core01.bx.telstraglobal.net (202.84.141.146)  202.327 ms  201.500 ms  523.056 ms
11  i-0-5-0-3.tlot02.bi.telstraglobal.net (202.84.253.46)  603.329 ms  346.085 ms  577.464 ms
12  xe-11-2-0.edge1.losangeles6.level3.net (4.68.62.9)  613.410 ms  614.335 ms  613.184 ms
13  ae-4-90.edge2.losangeles9.level3.net (4.69.144.207)  307.141 ms  922.118 ms  920.883 ms
14  peer-1-netw.edge2.losangeles9.level3.net (4.53.230.6)  921.686 ms  607.118 ms  466.507 ms
15  * * *
16  192.0.78.9 (192.0.78.9)  812.169 ms  613.593 ms  614.946 ms

traceroute -I sat.wordpress.com
traceroute: Warning: sat.wordpress.com has multiple addresses; using 76.74.254.123
traceroute to sat.wordpress.com (76.74.254.123), 64 hops max, 72 byte packets
1  172.20.10.1 (172.20.10.1)  24.285 ms  5.433 ms  5.907 ms
2  * * *
3  * * *
4  * * *
5  tengige0-5-0-15.chw-edge901.sydney.telstra.net (139.130.111.73)  169.809 ms  29.662 ms  29.179 ms
6  bundle-ether13.chw-core10.sydney.telstra.net (203.50.11.98)  30.460 ms  31.108 ms  29.384 ms
7  bundle-ether17.oxf-gw2.sydney.telstra.net (203.50.13.70)  32.356 ms  40.007 ms  40.267 ms
8  bundle-ether1.sydo-core01.sydney.reach.com (203.50.13.38)  36.161 ms  32.148 ms  41.099 ms
9  i-0-2-0-5.sydo-core02.bi.telstraglobal.net (202.84.223.42)  39.351 ms  39.832 ms  40.266 ms
10  i-0-4-0-5.1wlt-core01.bx.telstraglobal.net (202.84.140.102)  394.816 ms  258.567 ms  2206.390 ms
11  i-0-0-0-3.tlot02.bi.telstraglobal.net (202.84.251.189)  575.466 ms  614.253 ms  614.506 ms
12  gtt-peer.tlot02.pr.telstraglobal.net (134.159.61.98)  614.388 ms  615.384 ms  613.874 ms
13  xe-0-0-0.dal33.ip4.gtt.net (89.149.187.94)  306.713 ms  921.466 ms  642.280 ms
14  peer1-gw.ip4.gtt.net (77.67.71.30)  584.913 ms  700.331 ms  614.389 ms
15  * * *
16  * * *
17   (76.74.254.123)  718.998 ms  477.185 ms  442.695 ms

OK, so let's pick this apart:

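The first thing to know is that all network engineers love order, and the naming of router nodes is easy to decode once you know how. Each line in a traceroute is a router. traceroute sends probes (ICMP echo requests, with -I) whose TTL goes up by one each time, and each router where a probe's TTL runs out replies with an ICMP “time exceeded” message that in effect says “here I am”; the name shown is then looked up via reverse DNS. Some routers say “go away!” by not replying at all (* * *). The round-trip time of each probe is measured, and this helps judge the accumulated “cable distance”: you will note a general trend for the time to go up with each step.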
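To make that concrete, here is a minimal Python sketch that parses hop lines like the ones above into hop number, name, IP and times, treating * * * as a silent router. split_router_name is a heuristic I am assuming from the Telstra domestic and Level 3 names above (interface.node.city.operator); other carriers, telstraglobal.net included, use different schemes.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hop number, an optional "name (ip)" pair, then whatever is left (RTTs or "* * *")
HOP_LINE = re.compile(r"^\s*(\d+)\s+(?:(\S*)\s*\(([\d.]+)\))?(.*)$")
RTT = re.compile(r"([\d.]+) ms")


@dataclass
class Hop:
    number: int
    host: Optional[str]  # reverse-DNS name; None if no reply or no name
    ip: Optional[str]    # None when the router stayed silent (* * *)
    rtts_ms: List[float]


def parse_hop(line: str) -> Optional[Hop]:
    """Parse one hop line of macOS/BSD traceroute output."""
    match = HOP_LINE.match(line)
    if match is None:
        return None  # header or warning line, not a hop
    number, host, ip, rest = match.groups()
    return Hop(int(number), host or None, ip, [float(t) for t in RTT.findall(rest)])


def split_router_name(host: str) -> Optional[dict]:
    """Heuristic split of names shaped like interface.node.city.operator."""
    labels = host.split(".")
    if len(labels) < 5:
        return None  # too short for this scheme (e.g. a bare IP address)
    return {
        "interface": labels[0],
        "node": labels[1],
        "city": labels[2],
        "operator": ".".join(labels[3:]),
    }


hop = parse_hop("5  tengige0-5-0-15.chw-edge901.sydney.telstra.net (139.130.111.73)  170.490 ms  30.207 ms  28.705 ms")
print(hop)
print(split_router_name(hop.host))
```

Read that way, chw-edge901 in Sydney is an edge router, and the core and gateway hops that follow carry their role in the node label too.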

i0.wp.com:

  • Clearly the closest.
  • Gotta love Telstra: very good node naming! From Sydney we go to …
  • some boundary node at 139.130.197.82, then to 68.232.44.111.
  • A quick squiz in a tool like InfoSniper or MaxMind will let you infer that this seems to be in Brisbane and is EdgeCast. “No!” I hear you cry, “68.232.44.111 is in the USA!” Well, no. The address range is registered in the USA, but the ping surely tells us it is nearer. How can this be? Easy: EdgeCast uses Anycast with BGP, which in layman's terms means that its IP addresses are in many places at once. The Brisbane hop just before it is the giveaway: no route from Australia gets to the USA in one hop (more on this in a sec).
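The “ping surely tells us” argument can be put into numbers. Assuming light in optical fibre covers roughly 200,000 km per second (about two-thirds of its speed in a vacuum), a round-trip time gives a hard upper bound on how far away a host can be. A sketch:

```python
# Assumption: light in optical fibre travels at roughly 200,000 km/s,
# i.e. about two-thirds of its speed in a vacuum.
FIBRE_KM_PER_MS = 200.0


def max_one_way_km(rtt_ms):
    """Upper bound on the distance to a host, given a round-trip time.

    The signal has to get there and back, so only half the RTT is one way.
    Real cables are not straight and routers add delay, so the true
    distance is normally well under this bound.
    """
    return (rtt_ms / 2) * FIBRE_KM_PER_MS


print(max_one_way_km(28.148))  # the final EdgeCast hop for i0.wp.com
```

For that 28 ms hop the bound is about 2,800 km: plenty for Sydney to Brisbane, nowhere near the roughly 12,000 km to the US west coast.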

wordpress.com and sat.wordpress.com:

  • Clearly over the pond; both go to Hong Kong, seemingly without any other hops first (more on this later).
  • sat then goes to Texas via Taiwan, again with little in between (!)
  • The main domain routes to LA, straight from Hong Kong.

So some odd things are:

  1. Why Taiwan to Texas direct?
  2. How does one get from Australia to Hong Kong direct?

The answer to both these questions is the joy of optical networking, specifically ADM (Add-Drop Multiplexer). While there is no actual unbroken path between Australia and Hong Kong (Guam is in the middle) or Taiwan and Texas (the Pacific and several states of the Union are in the way), there are high-capacity “trails” (leased circuits): digitally fractionated parts of massive optical undersea and land-based trunks that effectively bridge the distance between those points at close to the speed of light.
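Going the other way, a back-of-envelope calculation gives the best case for an Australia to USA round trip. This sketch uses the haversine formula for great-circle distance; the city coordinates are approximate, and it assumes light in fibre travels at about 200,000 km/s over a perfectly straight cable.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBRE_KM_PER_S = 200_000.0  # assumption: roughly two-thirds of the speed of light


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Best-case round trip over a perfectly straight fibre."""
    return 2 * distance_km / FIBRE_KM_PER_S * 1000


# Approximate city-centre coordinates (illustrative)
SYDNEY = (-33.87, 151.21)
LOS_ANGELES = (34.05, -118.24)

d = great_circle_km(*SYDNEY, *LOS_ANGELES)
print(f"{d:.0f} km, best case {min_rtt_ms(d):.0f} ms round trip")
```

That works out to roughly 12,000 km and about 120 ms. The LA hops in the traces above sit at roughly 300 to 900 ms, so most of the delay is presumably down to something other than raw distance: the indirect path, the 4G link and queuing along the way.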

The internet is stranger and more beautiful than most appreciate 🙂

Onwards and upwards 🙂

08 Oct 2014

Here are my slides for WordCamp Sydney 2014

Here is my presentation from WordCamp Sydney on NewsCorp and our plans with WordPress.

WordCamp 2014 Sydney PDF (c) 2014 NewsCorp, please ask for permission before reuse.

08 Oct 2014

DNS fun and games: an Aussie site on wordpress.com VIP

The average wordpress.com user probably does not notice it, but there is a complex dance underneath every delivery of a page from wordpress.com. Basically, for a site x.wordpress.com it goes like this:

  1. The browser's DNS lookup resolves x.wordpress.com to lb.wordpress.com.
  2. lb.wordpress.com resolves to multiple IPs, and the browser's computer picks one (this is a slightly complex area: either the top one is chosen or some “closest” logic is used).
  3. An HTTP GET is done on the IP selected for lb.wordpress.com.
  4. The page that is returned is searched for links to images, CSS, JS etc., and those files are fetched. JS scripts are run, and they may also fetch files.
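You can watch steps 1 and 2 from your own machine. Python's socket.gethostbyname_ex returns the canonical name, any alias names and the list of IPv4 addresses, so if x.wordpress.com is a CNAME for lb.wordpress.com it should show up there. Taking the first address is a simplifying assumption for step 2; real clients may reorder the list.

```python
import socket


def resolve(host):
    """Return (canonical name, alias names, IPv4 addresses) for host.

    The system resolver follows CNAME records, so an alias chain such as
    x.wordpress.com -> lb.wordpress.com should appear in the first two fields.
    """
    canonical, aliases, addresses = socket.gethostbyname_ex(host)
    return canonical, aliases, addresses


def pick_address(addresses):
    """Simplest possible policy for step 2: take the first address."""
    if not addresses:
        raise ValueError("no addresses returned")
    return addresses[0]
```

On a connected machine, try resolve("x.wordpress.com") with your own site name; the addresses you get back will vary with your resolver and location.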

wordpress.com does not put all its files in a single place; a quick look at the network view in your browser's developer tools will reveal:

  • stuff coming right from wordpress.com
  • subdomains of wp.com like s1.wp.com selected seemingly at random as well as more specific ones like widgets.wp.com
  • requests to stats.wp.com and pixel.wp.com
  • requests to public-api.wordpress.com
  • your subdomain but on files.wordpress.com eg x.files.wordpress.com
  • subdomains of gravatar.com like 0.gravatar.com selected seemingly at random
  • calls to third party domains like gstatic.com

Some of this is cached close to you via a CDN (EdgeCast), and some “goes to origin” at one of the three datacenters wordpress.com uses. It is the latter stuff that concerns me as a webscale guy; more on this later.

So what are all these domains for?

  1. Your domain (via lb.wordpress.com) is where PHP “lives”, fronted by good old nginx and Batcache goodness.
  2. wp.com is where theme things like CSS and JS live, cached close to you by EdgeCast. As to why all the subdomains? Good question. There are two main reasons this is done: one is to share load across source servers (not needed in this case thanks to EdgeCast), the other is to “trick” browsers into opening more connections at once (some browsers have a rule to the effect of “only 4 connections per domain”).
  3. stats.wp.com and pixel.wp.com are where some internal reporting is driven from. A “pixel” in webscale speak is a tiny image (generally made invisible via CSS) on a page that tracks the progress of a page load: when the pixel is hit, the web server on the other end (nginx again in this case) logs the request along with all its parameters and cookies and registers a “hit”. The log is then mined using big data techniques or a real-time pipeline. The WordPress ones are tiny smiles, nice.
  4. public-api is interesting: REST services that are called continually from your pages via scripts to update the page on the fly without a reload. /notifications is the one I noted.
  5. files.wordpress.com is where your media assets like images live, care of nginx, and this includes cool toys like the ability to rescale images on request.
  6. Good old gravatar.com: a very cool service from Automattic that stores not just your profile picture but also a profile page, with a whole API that can give you structured data in many forms (JSON, PHP, QR code etc). Check me out.
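On the subdomain question in point 2, here is a hedged sketch of how sharding is commonly done. It is not WordPress.com's actual scheme (theirs looks random from the outside), and the s0/s1/s2 host list and asset paths are hypothetical. The idea is to hash the asset path so each file always lands on the same host, keeping caches warm, while spreading files over several hosts to get around the per-domain connection limit.

```python
import zlib

# Hypothetical shard hosts, named in the style of s1.wp.com
SHARDS = ["s0.wp.com", "s1.wp.com", "s2.wp.com"]


def shard_for(path, shards=SHARDS):
    """Pick a shard host for an asset path.

    CRC32 is stable across runs (unlike Python's built-in hash of str), so
    the same file always gets the same URL and browser/CDN caches keep hitting.
    """
    return shards[zlib.crc32(path.encode("utf-8")) % len(shards)]


def max_parallel_downloads(num_hosts, per_host_limit):
    """Connections a browser may open at once across all shard hosts."""
    return num_hosts * per_host_limit


for asset in ["/wp-content/themes/x/style.css", "/wp-includes/js/jquery.js"]:
    print(asset, "->", shard_for(asset))
print(max_parallel_downloads(len(SHARDS), 4), "downloads in flight with a 4-per-domain limit")
```

Picking a shard truly at random per page view would spread load just as well, but every new URL for the same file is a cache miss, which is why a stable hash is the usual choice.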

The goal of all this is to make your page load quickly. If you are an aspiring PHP hoster or webscale architect you could do worse than to look at what wordpress.com are up to, because they are an exemplar of “tier 1” internet design (the other candidates off the top of my head being Google, Facebook, eBay, PayPal and Amazon).

So why am I trying to improve on perfection? In a word, distance. My employer's customers in Australia are a long way from the core of the internet in the USA, and while wordpress.com do an awesome job of hiding this, it is not enough for the complex pages used by news sites. So I need to add another layer to those parts of wordpress.com that are not on EdgeCast. The full complexity can wait for a future post, but basically we route some non-EdgeCast traffic (particularly those PHP pages on our main domains) via a CDN called Akamai. Akamai is the unsung hero of the internet: something like half of all your internet traffic probably uses Akamai, particularly images, audio and video, and in countries like Australia it is critical. Akamai “stacks”, as the servers are called, live out on the edge of the network, right near where your ADSL or cable connection is aggregated for connection to the internet by your ISP. The stacks intercept the traffic they are told to manage and perform complex mapping and caching, even inserting parts of pages on the fly. This allows us to respond as if a highly customized page is locally hosted in Australia rather than on remote servers in Chicago, San Antonio, or Dallas.

Onwards and upwards 🙂

01 Oct 2014

After the Camp

WordCamp Sydney 2014 was lots of fun, still processing it all to be honest, more thoughts over the coming days.

My presentation on our upcoming use of WordPress VIP at NewsCorp seemed to go down well, I will link to it once it is on wordpress.tv (or wherever).

So this week I am wrestling with DNS and WordPress VIP. This is a bit tricky for us: traditionally our sites redirected from the bare domain to www, but WordPress VIP works the other way round.
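The difference between the two setups is just which host name is canonical. A minimal sketch of the redirect rule, with example.com standing in for a real site and canonical_redirect as a hypothetical helper:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, prefer_www):
    """Return the URL to redirect to, or None if url is already canonical.

    prefer_www=True:  http://example.com/x     -> http://www.example.com/x
    prefer_www=False: http://www.example.com/x -> http://example.com/x
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased; any user:password@ part is dropped
    has_www = host.startswith("www.")
    if has_www == prefer_www:
        return None  # already on the canonical host
    new_host = "www." + host if prefer_www else host[len("www."):]
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Our traditional setup versus the WordPress VIP way round
print(canonical_redirect("http://example.com/news", prefer_www=True))
print(canonical_redirect("http://www.example.com/news", prefer_www=False))
```

Path and query string are carried across unchanged, so deep links keep working whichever way round the redirect goes.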

Also, we use the advanced and rather splendid DNS from Akamai, whereas WordPress prefers to use their own. That would mean giving up Akamai features for any subdomains that contain non-WordPress services like our API and node.js, which would be negative for both security and scalability. We use subdomains in this way so that JavaScript's same-origin policy is simple to manage.

The “modern” solution of course is to use CORS or JSONP for our non-WordPress services so domains matter less, but each of these has some issues: CORS only works on recent browsers, and JSONP can cause some evil non-cacheability because the callback names in the requests are chosen by the client, not by the service provider.
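One common way around the JSONP caching problem is to cache the JSON payload under a key that ignores the callback parameter, then wrap it with whatever callback name each request asks for. Here is a sketch of that idea (not a description of how Akamai or our services actually do it); the api.example.com URLs and the helper names are hypothetical, and the callback name is checked against a conservative identifier pattern so the wrapper cannot be abused to inject script.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Conservative JavaScript identifier check: letters, digits, _, $ and dots
SAFE_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$.]*")


def cache_key(url, ignore=()):
    """Normalise a URL into a cache key, dropping any query params in ignore."""
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in ignore)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


def jsonp_response(cache, url, fetch_json):
    """Serve JSONP from a cache entry shared by every callback name."""
    key = cache_key(url, ignore={"callback"})
    if key not in cache:
        cache[key] = fetch_json(key)  # only goes to origin on a miss
    callback = dict(parse_qsl(urlsplit(url).query)).get("callback", "callback")
    if not SAFE_CALLBACK.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    return f"{callback}({cache[key]});"


urls = [
    "http://api.example.com/articles?id=42&callback=cb_1412150001",
    "http://api.example.com/articles?id=42&callback=cb_1412150002",
]
# Keyed on the full URL, every generated callback name is a separate cache entry
print(len({cache_key(u) for u in urls}), "entries with the callback in the key")
print(len({cache_key(u, ignore={"callback"}) for u in urls}), "entry without it")
```

Simply dropping the callback from the key without re-wrapping would be wrong, because each cached body would then carry somebody else's callback name.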

So in the end I think we just need to compromise and keep our current DNS provider.

Onwards and upwards 🙂