I’m sure this is a common problem: getting the domain or host name from a URL using JavaScript. There are certainly many solutions to this problem out there. However, the solutions I found weren’t robust enough for my needs, so I ended up writing my own.
The Problem With Most Solutions
Most of the solutions I found suffered from one or more of the following problems:
They rely on window.location.hostname (or similar)
The vast majority of solutions out there work by parsing window.location.hostname or window.location.href. This would be fine if I were working out the domain of the page I’m currently on, but I want to work it out for a URL that I’m not actually visiting at the moment. It has to work for any URL stored in a variable as a string.
They don’t cater for URL parameters or hash URLs
Most solutions parse the URL based only on the forward slash. That works for the most common URLs, such as
- http://scratch99.com/web-development/javascript/
which will return scratch99.com, but it won’t work with the following URLs:
- http://scratch99.com#footer
- scratch99.com?s=web+development
which would return the full URL in both cases. I need it to strip out everything after the ? and # characters.
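To make the problem concrete, here is a rough sketch of a slash-only approach. It is my own reconstruction for illustration, not any particular published solution, and naiveHost is a hypothetical name. It strips http:// and then splits on / alone, so the ? and # parts stay attached to the host:

```javascript
// Hypothetical slash-only parser, shown only to illustrate the problem.
function naiveHost(url) {
  // Strip "http://" (a string pattern only replaces the first occurrence).
  const withoutProtocol = url.replace('http://', '');
  // Split on "/" only, so anything after "?" or "#" stays with the host.
  return withoutProtocol.split('/')[0];
}

console.log(naiveHost('http://scratch99.com/web-development/javascript/')); // "scratch99.com"
console.log(naiveHost('http://scratch99.com#footer'));                      // "scratch99.com#footer"
console.log(naiveHost('scratch99.com?s=web+development'));                  // "scratch99.com?s=web+development"
```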
They don’t strip out the http://
I’m processing the URL so I can store the domain and use it in a URL, in the following way:
http://domain.com/sources/theparseddomain.com/
I do not want the following:
http://domain.com/sources/http://theparseddomain.com/
so I need to strip out the http:// when parsing the URL.
How To Parse A URL And Get The Hostname
My solution, taking all of the problems above into account, is as follows:
```javascript
var url = "http://scratch99.com/web-development/javascript/";
var domain = url.replace('http://','').replace('https://','').split(/[/?#]/)[0];
```
In the example above, the domain variable will contain the value “scratch99.com”.
It is the second line in the code above that is important. The url variable can be changed to the URL that you need to work with, whether you just change the line of code, or set the variable using a form, or even loop through an array of URLs.
Let’s look at that second line of code a little more closely:
First, the .replace('http://','') strips the http:// off the URL. I do this for the reasons explained above, but it also makes the rest of the code simpler. If we don’t do this, then depending on whether the http:// is present or not, the domain name may be either the 1st or 3rd element of the split result array. If we strip it, it will always be the first element.
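Here is what the split produces with and without the protocol, using the URL from the first example (the variable names are just for this sketch):

```javascript
const withProtocol = 'http://scratch99.com/web-development/javascript/';
const withoutProtocol = 'scratch99.com/web-development/javascript/';

// With the protocol, "http:" and an empty string (between the two slashes)
// come first, so the host lands at index 2, i.e. the 3rd element:
// ['http:', '', 'scratch99.com', 'web-development', 'javascript', '']
const parts = withProtocol.split(/[/?#]/);
console.log(parts[2]); // "scratch99.com"

// Without the protocol, the host is the 1st element.
console.log(withoutProtocol.split(/[/?#]/)[0]); // "scratch99.com"
```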
This doesn’t cater for user data entry problems, such as the user leaving the : out of a URL, e.g. http//domain.com/. I’ll take that as acceptable, as in this case the user is going to be me! However, if this were for end users, you’d probably want to check that they didn’t make such a mistake.
Thanks to Edward Caissie for pointing out in the comments that the original solution didn’t cater for https:// URLs. I’ve now added an extra replace to cater for this. I’m sure there’s a way to do it with a single replace using regex, but I’ll leave it there for now.
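For what it’s worth, here is one way to do it with a single replace: a regex anchored to the start of the string, where s? makes the s optional. This is a sketch, and stripProtocol is a name I’ve made up for it:

```javascript
// One replace instead of two: ^ anchors the match to the start of the string,
// s? makes the "s" optional and the i flag also accepts "HTTP://".
function stripProtocol(url) {
  return url.replace(/^https?:\/\//i, '');
}

const domain = stripProtocol('https://scratch99.com/web-development/javascript/').split(/[/?#]/)[0];
console.log(domain); // "scratch99.com"
```

A side effect of the ^ anchor is that an http:// appearing later in the string, for example inside a query parameter, is left alone, whereas the plain string replace removes the first occurrence wherever it is.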
Next, the .split(/[/?#]/) splits the resulting string using a regular expression. The character class [/?#] matches any one of the forward slash, ? and # characters, so the string is split on all three, not just on the forward slash. The first item of the resulting array will be the hostname. This works, but I’m far from a regex master, so if anyone has a better way of doing this, let me know in the comments.
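One alternative to split(...)[0], offered as a sketch rather than a definitive answer, is to match everything from the start of the string up to the first /, ? or #. It assumes the protocol has already been stripped; hostPart is a hypothetical name:

```javascript
// Take the leading run of characters that are not "/", "?" or "#".
// Because [^/?#]* may match zero characters, match() always succeeds here
// and never returns null.
function hostPart(strippedUrl) {
  return strippedUrl.match(/^[^/?#]*/)[0];
}

console.log(hostPart('scratch99.com/web-development/javascript/')); // "scratch99.com"
console.log(hostPart('scratch99.com#footer'));                      // "scratch99.com"
console.log(hostPart('scratch99.com?s=web+development'));           // "scratch99.com"
```

It gives the same result as the split, without building an array of all the parts.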
Finally, the [0] gets the first element of the array resulting from the split method. For improved readability, you could move this to a separate line of code, e.g.:
```javascript
var url = "http://scratch99.com/web-development/javascript/";
var urlParts = url.replace('http://','').replace('https://','').split(/[/?#]/);
var domain = urlParts[0];
```
Hostname Vs Domain Name
Strictly speaking, the code above returns the host name (including any subdomain and, if present, the port number, e.g. domain.com:8080), rather than the domain. That suits my needs. If you just want to strip the “www.”, then you can use the following code in place of line 2 above (note the extra replace):
```javascript
var domain = url.replace('http://','').replace('https://','').replace('www.','').split(/[/?#]/)[0];
```
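One caveat with the string version: .replace('www.','') removes the first “www.” anywhere in the string, so a host such as awww.example.com would become aexample.com. Here is a sketch that only strips www. at the start of the host. The anchored regexes are my suggestion rather than the original code, and getHostWithoutWww is a hypothetical name:

```javascript
// Anchored regexes only touch the start of the string.
function getHostWithoutWww(url) {
  return url
    .replace(/^https?:\/\//i, '') // strip http:// or https://
    .replace(/^www\./i, '')       // strip "www." only if the host starts with it
    .split(/[/?#]/)[0];           // keep everything before the first /, ? or #
}

console.log(getHostWithoutWww('http://www.scratch99.com/web-development/')); // "scratch99.com"
console.log(getHostWithoutWww('http://awww.example.com/'));                  // "awww.example.com"
// Compare: 'awww.example.com'.replace('www.', '') gives "aexample.com"
```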
If you need to remove any subdomain, not just www, then you’ll have to do some extra parsing of the result.
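As a rough starting point for that extra parsing, here is a naive sketch that keeps only the last two labels of the host. It assumes a simple two-part domain such as google.com. For multi-part suffixes like co.uk it returns the wrong answer; handling those properly needs a list of public suffixes, such as the Public Suffix List. lastTwoLabels is a hypothetical name:

```javascript
// Naive: keep the last two dot-separated labels of a host name.
// Assumes a two-part domain; it is wrong for suffixes like "co.uk".
function lastTwoLabels(host) {
  return host.split('.').slice(-2).join('.');
}

console.log(lastTwoLabels('mail.google.com'));       // "google.com"
console.log(lastTwoLabels('scratch99.com'));         // "scratch99.com"
console.log(lastTwoLabels('anything.domain.co.uk')); // "co.uk" (wrong!)
```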
Examples
The following examples will all result in scratch99.com being returned:
- http://scratch99.com/web-development/javascript/
- http://scratch99.com#footer
- scratch99.com?s=web+development
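If you want to check these examples yourself, here is the technique from this post wrapped in a small function (getHostname is just a name for this sketch) and run over the three URLs above:

```javascript
// The post's technique wrapped in a function.
function getHostname(url) {
  return url
    .replace('http://', '')  // strip the protocol (first occurrence only)
    .replace('https://', '')
    .split(/[/?#]/)[0];      // keep everything before the first /, ? or #
}

const examples = [
  'http://scratch99.com/web-development/javascript/',
  'http://scratch99.com#footer',
  'scratch99.com?s=web+development',
];

examples.forEach(function (url) {
  console.log(url + ' -> ' + getHostname(url)); // each line ends in "-> scratch99.com"
});
```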
If you can break it, let me know!
Comments
Just a quick note, you might want to take into consideration ‘https’, for example, https://twitter.com returns https when you use the “Extract” above.
Otherwise, I like it.
It breaks if using a protocol-relative URI: //ajax.googleapis.com
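With //ajax.googleapis.com, neither replace matches, so the split yields an empty first element and the code returns an empty string. One possible fix, sketched here as a suggestion rather than a tested solution, is an anchored regex in which the scheme itself is optional (getHost is a hypothetical name):

```javascript
// The (?:https?:)? group is optional, so "http://", "https://" and a bare
// "//" are all stripped, but only at the very start of the string.
function getHost(url) {
  return url
    .replace(/^(?:https?:)?\/\//i, '')
    .split(/[/?#]/)[0];
}

console.log(getHost('//ajax.googleapis.com'));           // "ajax.googleapis.com"
console.log(getHost('https://twitter.com'));             // "twitter.com"
console.log(getHost('scratch99.com?s=web+development')); // "scratch99.com"
```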
Thanks for sharing!
Nice script, but what should I do if I want only the base domain name? For example, I want only google.com from http://mail.google.com.
Or just this:
var hn = window.location.hostname.split('.').reverse();
var host = hn[1] + "." + hn[0];
console.log("Host:", host);
This won’t work for http://domain.com:8080/dsadsa/dsadsa
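With the code in the post, that URL gives domain.com:8080, because the port stays attached to the host. Where the standard URL API is available (modern browsers and Node.js), its hostname property separates the two. The following is a sketch: prepending http:// when no scheme is present is my own assumption so that inputs like scratch99.com?s=web+development still parse, and hostnameOf is a hypothetical name:

```javascript
// Uses the WHATWG URL API (a global in modern browsers and in Node.js 10+).
function hostnameOf(input) {
  let withScheme = input;
  if (input.startsWith('//')) {
    // Protocol-relative URL: borrow "http:" so the URL constructor accepts it.
    withScheme = 'http:' + input;
  } else if (!/^[a-z][a-z0-9+.-]*:\/\//i.test(input)) {
    // No scheme at all (assumed to be http); new URL() would throw otherwise.
    withScheme = 'http://' + input;
  }
  // .hostname excludes the port; .host would give "domain.com:8080".
  return new URL(withScheme).hostname;
}

console.log(hostnameOf('http://domain.com:8080/dsadsa/dsadsa')); // "domain.com"
console.log(hostnameOf('scratch99.com?s=web+development'));      // "scratch99.com"
console.log(hostnameOf('//ajax.googleapis.com'));                // "ajax.googleapis.com"
```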
Or you could avoid the replace() with:
domain = "http://scratch99.com/web-development/javascript/".split(/\/\/|\//)[1]
…it works on "//ajax.googleapis.com" as well and won’t care what the protocol was.
Oops, make that:
"http://scratch99.com/web-development/javascript/".split(/\/\/|[?#\/]/)[1]
…it must contain a protocol or start with //, though, which is fine in my case.
As pointed out, this will break on domains like “anything.domain.co.uk” and a number of others. In my situation I needed the domain name from the current URL, so I implemented a cookie-setting technique to get the root domain. You can get the code at http://rossscrivener.co.uk/blog/javascript-get-domain-exclude-subdomain
You should strip all protocols, from https:// through ftp:// to wss://.
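A sketch of that idea: strip any scheme followed by ://, where, following RFC 3986, a scheme is a letter followed by letters, digits, +, - or . characters (stripScheme is a hypothetical name):

```javascript
// Strip any leading "scheme://" (http, https, ftp, ws, wss, ...).
// Scheme syntax per RFC 3986: a letter, then letters, digits, "+", "-" or ".".
function stripScheme(url) {
  return url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, '');
}

const urls = ['https://scratch99.com/', 'ftp://files.example.com/pub', 'wss://socket.example.com/chat'];
urls.forEach(function (url) {
  console.log(stripScheme(url).split(/[/?#]/)[0]);
});
// Logs scratch99.com, files.example.com and socket.example.com
```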
I want only scratch99 from http://scratch99.com/web-development/javascript/. I am using CodeIgniter.
url.replace(/https?:\/\/([^\/]*)\/.*/,'$1').split(".")[0];
You can also use this: ^(?:https?\:\/\/)?(?:www\.)?([^\.]+)(?:\.).*$ (the first group, $1, will contain it).
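For anyone wanting to try that regex, here is a sketch of how to use it with String.prototype.match (siteName is a hypothetical name). Unlike the replace-based version above it, it doesn’t need a slash after the host: that version returns http://scratch99 for http://scratch99.com because its regex never matches. Note that this one returns the first label, so for http://mail.google.com it gives mail rather than google:

```javascript
// Regex from the comment above: optional http(s)://, optional www.,
// then capture everything before the first "." as group 1.
const firstLabel = /^(?:https?\:\/\/)?(?:www\.)?([^\.]+)(?:\.).*$/;

function siteName(url) {
  const match = url.match(firstLabel);
  return match ? match[1] : null; // null if there is no "." at all
}

console.log(siteName('http://scratch99.com/web-development/javascript/')); // "scratch99"
console.log(siteName('http://scratch99.com'));                              // "scratch99"
console.log(siteName('http://mail.google.com'));                            // "mail"
```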