[PHP] TLD List, Including Levels 2 & 3

Whilst working on a new open-source project I needed to be able to detect and remove the TLDs from hostnames. This is no easy task algorithmically, as example.co.uk (level 2) is just as valid as example.com (level 1), which is just as valid as example.bob.shiga.jp (level 3). I came across a project from Mozilla that lists all TLDs, but its raw format isn’t the easiest to work with each time you want to test hostnames. So I’ve built an auto-updater that groups the entries by their top-level label into an array with a format like:

[uk] =>
    *.uk
    !parliament.uk
    !nhs.uk

The updater then serializes the array for easy access via PHP at a later date. You may be wondering what the * and ! mean. Put simply, the * is a wildcard: whatever label comes directly before .uk counts as part of the TLD, as in .co.uk or .gov.uk. A ! marks an exception to the * (wildcard) rule, so in the case of parliament.uk the parliament part should not be counted as part of the TLD. A full explanation of this can be found at http://publicsuffix.org/list/
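To make the wildcard and exception rules a little more concrete, here is a rough sketch of how they could be applied to a hostname. The function name publicSuffix() and the $ukRules array are my own illustrative assumptions, not part of the updater or of any library, and the rules are just the three from the example above. It follows the matching order described on publicsuffix.org as I understand it: an exception rule wins outright, otherwise the longest matching rule is used.

```php
<?php
// Hypothetical helper (an assumption for illustration only): returns the
// part of $host that counts as the TLD, given the rules for its last label.
function publicSuffix($host, array $rules)
{
    $labels = explode(".", strtolower($host));
    $count  = count($labels);
    $best   = 1; // with no matching rule, only the last label is the TLD

    foreach ($rules as $rule) {
        $isException = ($rule[0] === "!");
        $ruleLabels  = explode(".", ltrim($rule, "!"));
        $n           = count($ruleLabels);
        if ($n > $count) {
            continue; // rule has more labels than the hostname
        }

        // Compare labels right to left; "*" matches any single label.
        $matches = true;
        for ($i = 1; $i <= $n; $i++) {
            $r = $ruleLabels[$n - $i];
            if ($r !== "*" && $r !== $labels[$count - $i]) {
                $matches = false;
                break;
            }
        }
        if (!$matches) {
            continue;
        }

        if ($isException) {
            // An exception wins outright: the TLD is the rule minus its
            // leftmost label (parliament.uk gives uk).
            return implode(".", array_slice($ruleLabels, 1));
        }
        $best = max($best, $n);
    }

    return implode(".", array_slice($labels, -$best));
}

$ukRules = array("*.uk", "!parliament.uk", "!nhs.uk");
echo publicSuffix("www.example.co.uk", $ukRules), "\n"; // co.uk
echo publicSuffix("www.parliament.uk", $ukRules), "\n"; // uk
```

Removing the returned suffix from the end of the hostname leaves the registrable name plus any subdomains.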

Please find below the actual code for the updater. I decided against just posting the serialized list, as the list will change over time whereas this post may not.

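<?php
//TLD List Updater (Formats to Serialized PHP)
//Created by Beaver6813 (Beaver6813.com). Big thanks to the Mozilla community!
//If this URL is unavailable, the same list is published at
//https://publicsuffix.org/list/public_suffix_list.dat
$tldlist = file("http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1");
if($tldlist === false)
	die("Could not download the TLD list\n");
$sublist = array();
foreach($tldlist as $value)
	{
	$value = trim($value);
	//Skip blank lines and // comment lines
	if($value === "" || strpos($value,"//") === 0)
		continue;
	if(strpos($value,".") === false)
		{
		//Bare TLD such as "com": make sure it has an entry without wiping existing rules
		if(!isset($sublist[$value]))
			$sublist[$value] = array();
		}
	else
		{
		//Multi-label rule: file it under its last label (the TLD)
		$dotxplode = explode(".",$value);
		$sublist[end($dotxplode)][] = $value;
		}
	}
file_put_contents("effective_tld_sublist.dat",serialize($sublist));
?>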
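
For completeness, reading the list back at a later date might look something like the sketch below. The function name loadTldRules() is a hypothetical helper of my own, and it assumes the updater above has already written effective_tld_sublist.dat into the current directory. If the file is missing or the TLD is unknown, it simply returns an empty array.

```php
<?php
// Hypothetical helper (an assumption for illustration only): loads the
// rules stored for one TLD from the file written by the updater.
function loadTldRules($file, $tld)
{
    if (!is_readable($file)) {
        return array(); // updater has not been run yet
    }
    $sublist = unserialize(file_get_contents($file));
    if (!is_array($sublist) || !isset($sublist[$tld])) {
        return array(); // corrupt file or unknown TLD
    }
    return $sublist[$tld];
}

// Prints the rules filed under "uk", or an empty array if there are none.
print_r(loadTldRules("effective_tld_sublist.dat", "uk"));
```

Only unserialize a file you generated yourself, since unserialize() is not safe on untrusted input.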