I was recently drawn into a discussion: should the online service or application name be included in the hostname or in the path of the site address? The actual discussion went a bit differently, but the basic idea was just that.
The customer has its own domain name (like majornetwork.net), which is also a well-known brand and the portal to their service offerings. They don’t want to register a new domain name for each new service; they just want to use the names of the new services as part of their existing online identity.
For the example’s sake, let’s call the new service “Runner”. (There is no reference to any past, existing or future service or product of any kind; this is just an example.) The first option mentioned is to include the service name in the hostname portion of the URL. For example, if they want to build the Runner service, the address might be “runner.majornetwork.net”. For the “Walker” service (again, just an example), the address would be “walker.majornetwork.net”. The customer would then create DNS entries for both new hostnames.
The second option is to use the original domain and just add the service name in the path of the resource identifier, like “majornetwork.net/runner” and “majornetwork.net/walker“.
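To make the structural difference concrete, here is a minimal Python sketch (using only the example names from above) showing where the service name ends up in each scheme:

```python
from urllib.parse import urlparse

# Hostname option: the service name is part of the DNS name (netloc).
host_style = urlparse("https://runner.majornetwork.net/")
print(host_style.netloc)  # runner.majornetwork.net
print(host_style.path)    # /

# Path option: the DNS name stays the same, the service name moves to the path.
path_style = urlparse("https://majornetwork.net/runner")
print(path_style.netloc)  # majornetwork.net
print(path_style.path)    # /runner
```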
“Why should I, a network specialist, care? The application people can decide that kind of thing, I’ll just build the networks and move the bits and bytes around.”
Well, not exactly. Follow along if you dare.
Nowadays, when you are planning a new web-based service, you quite often plan for some kind of cloud deployment. By cloud I mean an implementation where the service is not running in the customer’s own datacenter but on an IaaS platform somewhere outside the customer network. The reasons may be capacity, cost or flexibility, but either way the requirement is that it must be possible to run the service from basically anywhere in the world.
Now, let’s see with simplified examples what happens when the user first accesses the Runner service and then the Walker service, with both of our options (a small lookup sketch follows the two lists):
Hostname case:
- The user opens the browser and enters “runner.majornetwork.net” in the address bar.
- The browser uses the DNS resolver to get the IP address of runner.majornetwork.net.
- The browser receives the DNS response, like 1.2.3.4, and makes an HTTP/TCP connection to 1.2.3.4 and starts showing the content.
- Then, the user types “walker.majornetwork.net” in the address bar.
- The browser uses the DNS resolver to get the IP address of walker.majornetwork.net.
- The browser receives the DNS response, like 4.3.2.1, and makes an HTTP/TCP connection to 4.3.2.1, and starts showing the content.
Path case:
- The user opens the browser and enters “majornetwork.net/runner” in the address bar.
- The browser uses the DNS resolver to get the IP address of majornetwork.net.
- The browser receives the DNS response, like 5.6.7.8, and makes an HTTP/TCP connection to 5.6.7.8 and starts showing the content.
- Then, the user types “majornetwork.net/walker” in the address bar.
- The browser uses the DNS resolver to get the IP address of majornetwork.net (unless it already has it cached).
- The browser receives the DNS response, like 5.6.7.8, and makes an HTTP/TCP connection to 5.6.7.8 and starts showing the content.
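As a quick illustration of the lookups above, here is a minimal Python sketch; the names are the placeholders from this post and will not resolve in real DNS, so failed lookups are simply reported:

```python
import socket

# Example names from the text; they are placeholders and will not resolve
# in real DNS, so lookup failures are caught and reported.
names = [
    "runner.majornetwork.net",   # hostname case, service 1
    "walker.majornetwork.net",   # hostname case, service 2
    "majornetwork.net",          # path case, shared by both services
]

for name in names:
    try:
        # getaddrinfo performs the same resolver lookup the browser triggers
        addresses = sorted({info[4][0] for info in socket.getaddrinfo(name, 443)})
        print(f"{name} -> {', '.join(addresses)}")
    except socket.gaierror as err:
        print(f"{name} -> lookup failed ({err})")
```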
When we look at the hostname case, we immediately notice that the Runner and Walker services are running on different IP addresses (1.2.3.4 and 4.3.2.1). That is because the customer has implemented the Walker service in their own datacenter but is leveraging an IaaS platform for the Runner service. No problem there: DNS takes care of the actual IP-level location of the service, so the connections are made and the content is delivered from wherever the servers reside. The customer can freely decide where to implement the services, and the IP connections will always be established to the correct servers. (Also, if desired, the services can run on the same server. Then the DNS names will just resolve to the same IP address, and the HTTP servers can deal with the situation by examining the Host headers of the HTTP requests. Alternatively the server can have two IP addresses.)
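For the shared-server variant mentioned in the parentheses, a minimal sketch of Host-header dispatch could look like this (the content strings and port are made up for illustration, not a description of any real setup):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class ServiceDispatcher(BaseHTTPRequestHandler):
    def do_GET(self):
        # Both hostnames resolve to this one server; the Host header tells
        # us which service the browser actually asked for.
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        if host == "runner.majornetwork.net":
            body = b"Runner service content"
        elif host == "walker.majornetwork.net":
            body = b"Walker service content"
        else:
            self.send_error(404, "Unknown service")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), ServiceDispatcher).serve_forever()
```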
In the path case, however, the DNS name of the service is the same for both services: majornetwork.net. Thus the resulting IP addresses of the “majornetwork.net/runner” and “majornetwork.net/walker” requests are the same, and the HTTP requests will go to the same server. Basically, the customer needs to decide whether to run both of the services in their own datacenter, in the IaaS platform outside, or in both places at the same time (depending on whether they have multiple A/AAAA records for majornetwork.net), but the point is that the Runner and Walker services now go hand-in-hand: they must both be implemented on all the majornetwork.net servers.
Can I Have a Load Balancer?
Someone has heard about load balancers (either hardware or software implementations, it does not matter) that can make load-balancing decisions based on the path in the HTTP request. Yes, that is absolutely true; it can be characterized as a basic feature of HTTP load balancers. It means that with a load balancer you can direct the connections for “majornetwork.net/runner” and “majornetwork.net/walker” to different server pools in the background. Then you don’t have to run both of the applications on the same servers, because the load balancer directs only the requests for the correct service to each server. You can have as many servers as you want; the load balancer will take care of balancing the load (raise your hand now if you don’t know what I’m talking about).
In the hostname case the load balancer implementation is equally simple. You just point both of the hostnames “runner.majornetwork.net” and “walker.majornetwork.net” to the load balancer, apply some configuration, and the load balancer can again do its job properly with no difficulties, separating the two services onto different servers, this time using the destination IP address or the Host header of the HTTP request.
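Conceptually, the two routing decisions described above boil down to something like the following sketch; the pool names and member addresses are invented for illustration:

```python
# Conceptual sketch of the two routing decisions described above.
# The pool names and member addresses are made up for illustration.
SERVER_POOLS = {
    "runner-pool": ["10.0.1.10", "10.0.1.11"],
    "walker-pool": ["10.0.2.10", "10.0.2.11"],
}

def pick_pool_by_path(path: str) -> str:
    """Path case: one hostname, the request path selects the pool."""
    if path.startswith("/runner"):
        return "runner-pool"
    if path.startswith("/walker"):
        return "walker-pool"
    return "default-pool"

def pick_pool_by_host(host: str) -> str:
    """Hostname case: the Host header (or destination IP) selects the pool."""
    return {
        "runner.majornetwork.net": "runner-pool",
        "walker.majornetwork.net": "walker-pool",
    }.get(host.lower(), "default-pool")

print(pick_pool_by_path("/runner/start"))            # runner-pool
print(pick_pool_by_host("walker.majornetwork.net"))  # walker-pool
```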
An important thing to understand here is that while the actual servers (behind the load balancer) can be located anywhere on the Internet, the data flow will always go via the load balancers. So, the load balancers can be in the customer’s datacenter and the servers (or some of the servers) can be “in the cloud”, but the users’ requests and the servers’ responses will always go through the customer’s datacenter.
Now, when you compare the hostname and path cases, you realize that if you choose the hostname way of naming your service you can implement a separate load balancer in the IaaS platform for “runner.majornetwork.net” and another load balancer in the customer datacenter for “walker.majornetwork.net“, and both applications will run smoothly. You can add server capacity for both applications as you need. The content will always be served from the same datacenter where the request came in originally. No problem here.
If you choose the path way of naming the service, you may have a problem: the requests will always come to the datacenter where the load balancer hosting “majornetwork.net” is located, and all the data will flow through that datacenter. This is not a problem in every case, but imagine that the Runner application (“majornetwork.net/runner”) is hosted on the IaaS platform and is gaining popularity. All the requests will still come to the customer’s datacenter (because “majornetwork.net” points there), and the datacenter connection capacity may become a problem. That could lead to disaster, as users are not able to get the content properly. Adding more servers on the IaaS platform does not help, because the bottleneck is the connectivity of the customer datacenter, not the processing power of the IaaS platform. In addition to the capacity issues, latency will also be higher when the connections are being routed back and forth. That may affect the user experience even with small loads.
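A quick back-of-the-envelope calculation, with purely hypothetical numbers, shows why adding IaaS servers does not help here:

```python
# Purely hypothetical numbers, only to illustrate where the bottleneck sits.
concurrent_users = 20_000
per_user_mbps = 2            # average throughput per active user
uplink_capacity_gbps = 10    # customer datacenter Internet uplink

required_gbps = concurrent_users * per_user_mbps / 1000
print(f"Required: {required_gbps:.0f} Gbps, uplink: {uplink_capacity_gbps} Gbps")
# In the path case all of this traffic crosses the customer uplink,
# no matter how many servers are added on the IaaS side.
```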
Can I Balance the Load to Different Datacenters?
As a generic answer, yes. It is called global load balancing. In global load balancing the content is usually served to the user from the “nearest” available server. The concept of “nearest” is somewhat relative but in the most basic case the global load balancing decision is made based on the country the user is located in. That way the content can be served from the same country that the user is in, and all the users can still use the same server name to connect to the service.
Using global load balancing, the flow for an example connection to “majornetwork.net/runner” goes like this (a small decision sketch follows the list):
- The user opens the browser and enters “majornetwork.net/runner” in the address bar.
- The browser (or the operating system) sends the DNS query to the DNS resolver server about “majornetwork.net“.
- The DNS resolver server does not find the address in its cache, so it makes a query of its own for “majornetwork.net”.
- The global load balancer (GLB) will get the DNS query because it is configured as the authoritative DNS server for “majornetwork.net“.
- The GLB will use a geo-IP lookup to see in which country the querying DNS resolver is located.
- Based on the geo-IP lookup, the GLB decides the best available destination for the user’s connection and returns the corresponding DNS response to the DNS resolver.
- The DNS resolver receives the response and sends it to the browser.
- The browser receives the DNS response, like 2.3.4.5, and makes an HTTP/TCP connection to 2.3.4.5 and starts showing the content.
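Here is a rough sketch of the decision the GLB makes in the steps above; the country-to-datacenter mapping and the addresses are invented for illustration, and a real GLB would also factor in health checks and other metrics:

```python
# A sketch of the GLB decision step above. The country-to-datacenter
# mapping and the addresses are invented for illustration; a real GLB
# would use a geo-IP database and health checks.
DATACENTER_BY_COUNTRY = {
    "FI": "5.6.7.8",    # customer's own datacenter
    "US": "2.3.4.5",    # IaaS platform
}
DEFAULT_ANSWER = "5.6.7.8"

def glb_answer(resolver_country: str) -> str:
    """Return the A record the GLB hands back for majornetwork.net."""
    # The only inputs are the queried name and the resolver's location;
    # the GLB never sees the HTTP path (/runner vs /walker).
    return DATACENTER_BY_COUNTRY.get(resolver_country, DEFAULT_ANSWER)

print(glb_answer("US"))  # 2.3.4.5 -> user is served from the IaaS platform
print(glb_answer("FI"))  # 5.6.7.8 -> user is served from the customer DC
```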
I’m slipping out of the original scope of this post (I won’t go into the details of GLB variations here), but the important piece of information is that the GLB can decide the correct datacenter for the user connection based on the DNS name (hostname) that the user is connecting to. The GLB itself does not carry any of the user-server traffic load at all (unless it is also acting as a local load balancer); it just does the DNS magic.
Basically, in the GLB example case, if the user is near the customer’s datacenter, then the connection is made to the customer’s datacenter. If the IaaS platform is nearer to the user, then the connection is taken there and the customer’s datacenter connection is not needed for the user data at all.
However, what the GLB cannot do is read the paths of the HTTP requests. From the GLB’s point of view, “majornetwork.net/runner” and “majornetwork.net/walker” are exactly the same because the DNS name is the same. The GLB will never even see the paths, because it operates only on DNS queries, which contain just the hostname part of the URL. It never receives the HTTP request at any point.
Thus, if you try to use GLB to divert “majornetwork.net/runner” to the IaaS platform and “majornetwork.net/walker” to the customer’s own datacenter, it simply cannot be done. In this case you end up having to run both services at both sites.
In the hostname case (“runner.majornetwork.net” and “walker.majornetwork.net”) it is easy to use GLB to run the services in whichever datacenters you want, independently of each other.
Conclusions
Finally, let me repeat the original question: Should the online service or application name be included in the hostname or the path of the site address?
As in so many cases in IT, the answer is “it depends”.
If you want to run the services separately in totally different places (like different datacenters or public IaaS platforms) and you expect to run into connection capacity or latency issues, do not put your services under the same DNS name. Use different hostnames so that you can locate the services as you like and do global load balancing if needed.
Note that you can still use HTTP redirection techniques to make “majornetwork.net/runner” actually go to “runner.majornetwork.net” if you want. Just take care that your services and applications don’t rely on those redirections internally, so that suboptimal internal redirects don’t start consuming bandwidth or accumulating latency.
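As one possible way to do that redirection, here is a minimal sketch (hypothetical port, services as in the examples) that answers path-style requests with a 301 to the hostname-style addresses:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer majornetwork.net/runner with a permanent redirect to the
        # hostname-style address; same idea for /walker.
        if self.path.startswith("/runner"):
            target = "https://runner.majornetwork.net" + self.path[len("/runner"):]
        elif self.path.startswith("/walker"):
            target = "https://walker.majornetwork.net" + self.path[len("/walker"):]
        else:
            self.send_error(404)
            return
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```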