B.T.’s inability to fix email. Poor but unsurprising.

So, first thing, a pet hate. I’ve lost count of the times I’ve been signing up for something and the email address validation won’t accept my perfectly valid address because of its .house top-level domain. And we’re not talking small fry here. Everything from my bank (now fixed due to my persistence!) to B.T. Yes, B-friggin’-T, the largest telecoms provider in the U.K.

Almost makes me want to never sign up with a proper email address and use 10 Minute Email forevermore.

It’s not like B.T. doesn’t have enough money to pay for decent developers to work on their shitty billing website. But no. Even after ringing them and complaining, the response I got was worse than their developers not being able to code a validation regex correctly.

“It’s your problem, change your email address.”

Helpful lady on the B.T. Broadband billing help line.

You’re kidding me, right? I pay for my personalized family domain name that I’ve had for nearly 8 years, and you want me to change it so I can use your website? There’s something wrong with this customer service picture.

The .house top-level domain has been around for ~9 years now.

Domains that don’t end in .COM, .ORG, .NET or .CO.UK have been around a long, long time. In fact, there are over 1,500 top-level domains, so that’s probably 1,496 TLDs your shitty validation won’t cope with.

And yet so many sites and mobile applications don’t accept perfectly valid domain names as part of the email address.

So let’s fix that. Here’s a simple bit of C# to download the list of TLD names from IANA – the Internet Assigned Numbers Authority – and then use a fairly well-tested regex to validate both the email address and the top-level domain. It’s literally 20 minutes of effort.

Before you go getting your knickers in a twist, the regex doesn’t cover 100% of all cases, but the ones it doesn’t cover really are super edge cases: odd character combinations that you probably can’t type on a real keyboard anyway.

I did come across an article on Phil Haack’s blog, Haacked, that gives a solution, but whilst it copes with most of what’s in the RFC defining how an email address should be formatted, it still doesn’t take into account valid TLDs. Even the mighty Haack can’t 100% fix this, so I feel pretty good with my attempt!

So, yes, your email may be formatted correctly, but if the TLD isn’t valid, it’s goin’ nowhere!

I did also find a JavaScript package called MailCheck. It does a load of clever sub-domain/TLD checking and some regex validation, but you guessed it, IT STILL DOESN’T VALIDATE THE TLD PROPERLY.

They still have a static list of ‘valid’ TLDs. Yes, they’ve provided the methods to ‘customise’ the list, but since you can’t make this up as you go along, why not build in getting the list from the horse’s mouth? IANA regularly publishes this text list of all the TLDs!
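For what it’s worth, here’s a rough sketch of what “getting it from the horse’s mouth” could look like if you don’t want to hit IANA on every sign-up. TldList, Parse and the maxAge parameter are names I’ve made up for illustration; none of this is MailCheck’s API. The download is passed in as a delegate so the class can be tried without a network connection, and it assumes the file format IANA uses today: one TLD per line, with comment lines starting with #. It isn’t thread-safe, so treat it as a starting point only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps a copy of the IANA TLD list and re-downloads it when stale.
public sealed class TldList
{
    private readonly Func<string> _download;  // returns the raw text of the TLD file
    private readonly TimeSpan _maxAge;        // how long a downloaded copy is trusted
    private HashSet<string> _tlds = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private DateTime _lastFetchUtc = DateTime.MinValue;

    public TldList(Func<string> download, TimeSpan maxAge)
    {
        _download = download;
        _maxAge = maxAge;
    }

    // One TLD per line; blank lines and '#' comment lines are ignored.
    public static HashSet<string> Parse(string text)
    {
        var names = text.Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0 && line[0] != '#');
        return new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);
    }

    public bool Contains(string tld)
    {
        RefreshIfStale();
        return _tlds.Contains(tld);
    }

    private void RefreshIfStale()
    {
        if (DateTime.UtcNow - _lastFetchUtc <= _maxAge)
            return;
        try
        {
            var fresh = Parse(_download());
            if (fresh.Count == 0)
                return;               // an empty list looks wrong: keep the old copy
            _tlds = fresh;
            _lastFetchUtc = DateTime.UtcNow;
        }
        catch (Exception)
        {
            // Download failed: keep whatever we had and try again on the next call.
        }
    }
}
```

In real use the delegate would be something along the lines of () => new WebClient().DownloadString("https://data.iana.org/TLD/tlds-alpha-by-domain.txt"), with a maxAge of a day or so.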

Matching the TLD against a regex that says it has to be alphanumeric might make it syntactically valid, but it’s still wrong, because, well, D-N-S.
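To make that concrete, here’s a tiny sketch comparing the two approaches. TldChecks and its two methods are made-up names, and the three-entry list is sample data standing in for the real IANA file. The alphanumeric rule manages to be wrong in both directions: it happily accepts a typo like “hosue”, and it rejects a real internationalised TLD such as XN--P1AI because of the hyphens.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical comparison of a syntax-only TLD check and a list lookup.
public static class TldChecks
{
    // The purely syntactic rule: any run of letters and digits passes.
    public static bool LooksLikeTld(string tld) =>
        Regex.IsMatch(tld, @"^[a-z0-9]+\z", RegexOptions.IgnoreCase);

    // Membership in a set of known TLDs (use a case-insensitive set).
    public static bool IsListedTld(string tld, ISet<string> knownTlds) =>
        knownTlds.Contains(tld);

    // Sample data only; the real list comes from IANA.
    public static ISet<string> SampleList() =>
        new HashSet<string>(new[] { "COM", "HOUSE", "XN--P1AI" },
                            StringComparer.OrdinalIgnoreCase);
}
```

Calling TldChecks.LooksLikeTld("hosue") passes while TldChecks.IsListedTld("hosue", TldChecks.SampleList()) does not, and the reverse is true for "xn--p1ai".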

To be fair, B.T. probably has decent developers but for whatever reason, this bit of crap code slipped through their water-tight pull request process.

So here’s a little snippet I knocked up to see just how difficult proper TLD validation would be. Hint. Not very.

string[] tldNames;

void Main()
{
	tldNames = FetchIANARootDb();

	ValidateEmail("[email protected]").Dump("[email protected]");
	ValidateEmail("[email protected]").Dump("[email protected]");
	ValidateEmail("[email protected]").Dump("[email protected]");
	ValidateEmail("[email protected]").Dump("[email protected]");
	ValidateEmail("[email protected]").Dump("[email protected]");
	ValidateEmail("[email protected]").Dump("[email protected]");
	ValidateEmail("invalid@[email protected]").Dump("invalid@[email protected]");
	ValidateEmail(@"Fred\ [email protected]").Dump(@"Fred\ [email protected]");
	ValidateEmail(@"""Fred\ Bloggs""@example.com").Dump(@"""Fred\ Bloggs""@example.com");
	ValidateEmail(@"rob@hÔtels.com").Dump(@"rob@hÔtels.com");
	ValidateEmail(@"rob@hÔtels.cÔm").Dump(@"rob@hÔtels.cÔm");
	ValidateEmail(@"[email protected]").Dump(@"[email protected]");
	ValidateEmail(@"rob@uk").Dump(@"rob@uk");

	// Phil Haack's email tests.

	"Haacked email test".Dump();
	ValidateEmail(@"Abc\@[email protected]").Dump(@"Abc\@[email protected]");
	ValidateEmail(@"Fred\ [email protected]").Dump(@"Fred\ [email protected]");
	ValidateEmail(@"Joe.\\[email protected]").Dump(@"Joe.\\[email protected]");
	ValidateEmail("\"Abc@def\"@example.com").Dump("\"Abc@def\"@example.com");
	ValidateEmail("\"Fred Bloggs\"@example.com").Dump("\"Fred Bloggs\"@example.com");
	ValidateEmail(@"customer/[email protected]").Dump(@"customer/[email protected]");
	ValidateEmail(@"[email protected]").Dump(@"[email protected]");
	ValidateEmail(@"!def!xyz%[email protected]").Dump(@"!def!xyz%[email protected]");
	ValidateEmail(@"[email protected]").Dump(@"[email protected]");


}


bool ValidateEmail(string email)
{
	string domainName = string.Empty;

	if (string.IsNullOrWhiteSpace(email))
		return false;
	try
	{
		// Normalize the domain
		email = Regex.Replace(email, @"(@)(.+)$", DomainMapper,
							  RegexOptions.None, TimeSpan.FromMilliseconds(200));

		// Examines the domain part of the email and normalizes it.
		string DomainMapper(Match match)
		{
			// IdnMapping class converts Unicode domain names see https://tools.ietf.org/html/rfc3492
			var idn = new IdnMapping();

			// Pull out and process domain name (throws ArgumentException on invalid)
			domainName = idn.GetAscii(match.Groups[2].Value);

			return match.Groups[1].Value + domainName;
		}
	}
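	catch (RegexMatchTimeoutException)
	{
		// The normalization regex took too long: treat the address as invalid.
		return false;
	}
	catch (ArgumentException)
	{
		// IdnMapping.GetAscii rejected the domain name.
		return false;
	}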

	try
	{
		// valid email format regex 
		string stackOFPattern = @"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&\\'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
			@"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-0-9a-z]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$";
			
		bool validFormat = Regex.IsMatch(email,
			stackOFPattern,
			RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));

//		string haakedPattern = @"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|"
//			+ @"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)"
//			+ @"@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$";
//
//		validFormat |= Regex.IsMatch(email,
//			haakedPattern,
//			RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));

		if (!validFormat)
			return false;
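		// Validate the top-level domain against the IANA list (upper-case in the file).
		// If the list could not be downloaded, fall back to the format check alone
		// rather than crashing or rejecting every address.
		if (tldNames == null || tldNames.Length == 0)
			return true;
		var tld = domainName.Split('.').Last();
		return tldNames.Contains(tld.ToUpperInvariant());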
	}
	catch (RegexMatchTimeoutException)
	{
		return false;
	}

}

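string[] FetchIANARootDb()
{
	try
	{
		using (WebClient client = new WebClient()) // WebClient implements IDisposable
		{
			// To keep a local copy instead:
			//client.DownloadFile("https://data.iana.org/TLD/tlds-alpha-by-domain.txt", @"C:\temp\localfile.html");
			// Or you can get the file content without saving it
			string text = client.DownloadString("https://data.iana.org/TLD/tlds-alpha-by-domain.txt");

			// The file starts with a "# Version ..." comment and ends with a newline,
			// so drop comment and blank lines instead of just skipping the first one.
			var lines = text.Split('\n')
				.Select(line => line.Trim())
				.Where(line => line.Length > 0 && line[0] != '#')
				.ToArray();
			lines.Length.Dump("Domain name count");
			return lines;
		}
	}
	catch
	{
		// Download failed: return an empty list rather than null so callers don't crash.
		return Array.Empty<string>();
	}
}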

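OK, crappy formatting aside (I may get round to prettying that up), cut-n-paste this into LINQPad as a C# Program and run it. If it complains about missing namespaces, add System.Net and System.Globalization under Query Properties (F4).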
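One detail in there that’s easy to miss: the TLD lookup is done on the ASCII (Punycode) form of the domain that IdnMapping produces, not on what the user typed. That matters because the IANA file lists internationalised TLDs in their XN-- form. Here’s a stripped-down sketch of just that step; IdnDemo and AsciiTld are names invented for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: returns the TLD of a domain in the form used by the IANA list.
public static class IdnDemo
{
    public static string AsciiTld(string domain)
    {
        // Convert any Unicode labels to their ASCII "xn--" equivalents.
        // Throws ArgumentException if the domain name is not valid.
        string ascii = new IdnMapping().GetAscii(domain);

        // Take everything after the last dot and match the list's upper-case style.
        return ascii.Substring(ascii.LastIndexOf('.') + 1).ToUpperInvariant();
    }
}
```

So a Cyrillic domain ending in .рф is looked up as XN--P1AI, while a made-up TLD like cÔm turns into an xn-- label that isn’t on the list and gets rejected.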

In the end, it’s doubtful that it’s worth validating at all. The true validation is whether or not all of the mail servers between you and the service handle the mail correctly and whether or not you receive a response for signing up. Perhaps a test email button is a better (and more reliable) option! I wonder if B.T.’s developers can manage that.
