The Problem With Most Solutions
Most of the solutions I found suffered from one or more of the following problems:
They rely on window.location.hostname (or similar)
The vast majority of solutions out there work by parsing window.location.href. This would be fine if I were working out the domain of the page I’m currently on, but I want to work it out for a URL that I’m not actually visiting at the moment. It has to work for any URL stored in a variable in string format.
They don’t cater for URL parameters or hash URLs
Most solutions parse the URL based only on the forward slash. That works for the most common URLs, such as
which will return scratch99.com, but it won’t work with the following URLs:
which would return the full URL in both cases. I need it to strip out everything after the ? and # characters.
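Here is the forward-slash-only approach applied to two protocol-less strings. These strings are hypothetical stand-ins, not the original examples:

```javascript
// Naive approach: split on "/" only and take the first part.
// Both inputs are illustrative placeholders.
var naiveQuery = 'scratch99.com?p=123'.split('/')[0];
var naiveHash = 'scratch99.com#comments'.split('/')[0];

console.log(naiveQuery); // scratch99.com?p=123  (query string still attached)
console.log(naiveHash);  // scratch99.com#comments  (hash still attached)
```

Neither string contains a slash, so split returns a one-element array holding the whole input.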
They don’t strip out the http://
I’m processing the URL so I can store the domain and use it in a URL, in the following way:
I do not want the following:
so I need to strip out the http:// when parsing the URL.
How To Parse A URL And Get The Hostname
My solution, taking all of the problems above into account, is as follows:
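Based on the pieces described in the walkthrough below (two replace calls, a regex split, then taking the first element), the solution looks roughly like this. The example URL on the first line is an assumption:

```javascript
var url = 'http://scratch99.com/web-development/javascript/'; // assumed example URL
var domain = url.replace('http://','').replace('https://','').split(/[/?#]/)[0]; // strip protocol, split on / ? #, keep first part
console.log(domain); // scratch99.com
```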
In the example above, the domain variable will contain the value “scratch99.com”.
Let’s look at that second line of code a little more closely:
.replace('http://','') strips the http:// off the URL. I do this for the reasons explained above, but it also makes the rest of the code simpler. If we don’t do this, then depending on whether the http:// is present or not, the domain name may be either the 1st or 3rd element of the split result array. If we strip it, it will always be the first element.
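To see the 1st-versus-3rd-element difference, compare what the split returns with and without the protocol attached. This sketch uses an assumed example URL:

```javascript
var url = 'http://scratch99.com/about/'; // assumed example URL

// Protocol still attached -> ["http:", "", "scratch99.com", "about", ""]
// The hostname is the 3rd element (index 2).
var withProtocol = url.split(/[/?#]/);

// Protocol stripped -> ["scratch99.com", "about", ""]
// The hostname is the 1st element (index 0).
var withoutProtocol = url.replace('http://', '').split(/[/?#]/);

console.log(withProtocol[2], withoutProtocol[0]); // scratch99.com scratch99.com
```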
The catch is that this won’t cope with a mistyped URL such as http//domain.com/. I’ll take that as acceptable, as in this case the user is going to be me! However, if this were for end users, you’d probably want to check that they didn’t make such a mistake.
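For input from end users, one simple safeguard is to check that the string starts with a properly formed http:// or https:// before parsing it. This is a sketch, not part of the original code, and `hasValidProtocol` is a hypothetical name. The check also rejects URLs typed with no protocol at all, which the original code handles, so whether it fits depends on your input:

```javascript
// Hypothetical helper: true only if the string starts with "http://" or "https://".
// Case-sensitive, to match the case-sensitive replace() calls in the post.
function hasValidProtocol(url) {
  return /^https?:\/\//.test(url);
}

console.log(hasValidProtocol('http://domain.com/'));  // true
console.log(hasValidProtocol('https://domain.com/')); // true
console.log(hasValidProtocol('http//domain.com/'));   // false (missing colon)
```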
Thanks to Edward Caissie for pointing out in the comments that the original solution didn’t cater for https://. I’ve now added an extra replace to cater for this. I’m sure there’s a way to do it with a single replace using a regex, but I’ll leave it there for now.
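One regex can handle both protocols. The following sketch assumes the protocol only ever appears at the very start of the string. `^` anchors the match to the beginning, and `s?` makes the “s” optional:

```javascript
var url = 'https://scratch99.com/web-development/'; // assumed example URL

// A single replace removes a leading "http://" or "https://".
var domain = url.replace(/^https?:\/\//, '').split(/[/?#]/)[0];

console.log(domain); // scratch99.com
```

Because of the anchor, an “http://” appearing later in the string (inside a query parameter, say) is left untouched. The string form of replace() removes the first occurrence wherever it is.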
.split(/[/?#]/) splits the resulting string using a regular expression. The string is split into parts based not just on the forward slash, but also on the ? and # characters. The first item of the resulting array will be the hostname. This works, but I’m far from a regex master, so if anyone has a better way of doing this, let me know in the comments.
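`[/?#]` is a character class, so it matches any single `/`, `?` or `#`, and each of those characters acts as a separator. Here is a sketch with hypothetical inputs that already have the protocol stripped:

```javascript
// Hypothetical inputs with a path, a query string and a hash.
var parts1 = 'scratch99.com/page/?p=123'.split(/[/?#]/); // -> ["scratch99.com", "page", "", "p=123"]
var parts2 = 'scratch99.com?p=123'.split(/[/?#]/);       // -> ["scratch99.com", "p=123"]
var parts3 = 'scratch99.com#comments'.split(/[/?#]/);    // -> ["scratch99.com", "comments"]

// In every case the hostname is the first element.
console.log(parts1[0], parts2[0], parts3[0]); // scratch99.com scratch99.com scratch99.com
```

The empty string in the first result comes from the `/` and `?` sitting next to each other. It doesn’t matter here because only the first element is used.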
[0] gets the first element of the array returned by the split method. For improved readability, you could move this to a separate line of code, e.g.:
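Split over two lines, it might look like the sketch below. The example URL is an assumption. `sourceString` is the name used in the www variant later in the post, and despite the name it holds the array returned by split:

```javascript
var url = 'http://scratch99.com/web-development/javascript/'; // assumed example URL

// Strip the protocol and split; sourceString holds the resulting array.
var sourceString = url.replace('http://','').replace('https://','').split(/[/?#]/);

// Take the hostname from the first element on its own line.
var domain = sourceString[0];

console.log(domain); // scratch99.com
```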
Hostname Vs Domain Name
Strictly speaking, the code above returns the hostname (including any subdomain), rather than the domain. That suits my needs. If you just want to strip the “www.”, then you can use the following code in place of line 2 above (note the extra replace):
var sourceString = url.replace('http://','').replace('https://','').replace('www.','').split(/[/?#]/);
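`.replace('www.','')` removes the first “www.” anywhere in the string, not only at the start of the hostname. A hypothetical host like mywww.example.com would come out as myexample.com. The sketch below strips “www.” only when it is the first thing left after the protocol is removed. `stripWww` is a hypothetical name:

```javascript
// Hypothetical helper: the post's replace chain, but "www." is only removed
// when it is at the very start of what remains after stripping the protocol.
function stripWww(url) {
  return url.replace('http://', '').replace('https://', '')
            .replace(/^www\./, '')
            .split(/[/?#]/)[0];
}

console.log(stripWww('http://www.scratch99.com/about/')); // scratch99.com
console.log(stripWww('http://mywww.example.com/'));       // mywww.example.com
```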
If you need to remove any subdomain, not just www, then you’ll have to do some extra parsing of the result.
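One naive form of that extra parsing keeps only the last two dot-separated labels. This is a sketch with a major assumption: it breaks for multi-part suffixes such as .co.uk or .com.au, where getting it right needs a list of public suffixes (such as the Public Suffix List). It also mangles bare IP addresses. `lastTwoLabels` is a hypothetical name:

```javascript
// Hypothetical helper: keep only the last two dot-separated labels of a hostname.
// Naive: "news.bbc.co.uk" would become "co.uk", which is wrong.
function lastTwoLabels(hostname) {
  var labels = hostname.split('.');
  return labels.slice(-2).join('.');
}

console.log(lastTwoLabels('blog.scratch99.com')); // scratch99.com
console.log(lastTwoLabels('scratch99.com'));      // scratch99.com
```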
The following examples will all result in scratch99.com being returned:
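You can check cases like these (a plain path, a query string, a hash, https, no protocol) by looping over a few sample URLs. The URLs below are illustrative stand-ins, and `getHostname` is a hypothetical wrapper around the post’s one-liner:

```javascript
// Hypothetical wrapper around the one-liner from the post.
function getHostname(url) {
  return url.replace('http://', '').replace('https://', '').split(/[/?#]/)[0];
}

// Illustrative inputs: path, query string, hash, https, and no protocol.
var samples = [
  'http://scratch99.com/web-development/javascript/',
  'http://scratch99.com?p=123',
  'http://scratch99.com#comments',
  'https://scratch99.com/',
  'scratch99.com/about/'
];

samples.forEach(function (url) {
  console.log(url + ' -> ' + getHostname(url));
});
```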
If you can break it, let me know!
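A few inputs do trip it up:

- a port number stays attached (scratch99.com:8080)
- an upper-case scheme like HTTP:// isn’t stripped
- login details leave a user@ prefix

In current browsers and Node.js, the built-in WHATWG `URL` class parses these cases correctly. However, it throws a TypeError for strings with no scheme, such as scratch99.com/about/, which the one-liner does handle. The sketch below compares the two approaches. The sample URLs are illustrative, and `splitHostname` is a hypothetical name:

```javascript
// The post's one-liner, wrapped in a function (name is hypothetical).
function splitHostname(url) {
  return url.replace('http://', '').replace('https://', '').split(/[/?#]/)[0];
}

// Illustrative inputs that the string-splitting approach gets wrong.
var tricky = [
  'http://scratch99.com:8080/page', // port stays attached
  'HTTP://scratch99.com/',          // upper-case scheme isn't stripped
  'http://user@scratch99.com/'      // user info stays attached
];

tricky.forEach(function (url) {
  // new URL() needs a full URL with a scheme; all inputs here have one.
  console.log(splitHostname(url) + ' vs ' + new URL(url).hostname);
});
```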
Last updated on August 30th, 2011