Until now, the preferred way to selectively allow only certain HTML tags like <b> and <i> was to regex the input to ensure it contained only valid Unicode letter and number characters and those specified tags, something like this:
if (!Regex.IsMatch(input, @"^([\p{L}\p{N}'\s]|<b>|</b>|<i>|</i>){1,40}$")) throw new Exception();
This approach will prevent all unwanted tags, but it will also prevent all attributes on the allowed tags. Sometimes this is good – attackers can add malicious script to onmouseover attributes of <b> and <i> tags – but again, sometimes this is overkill and blocks the use of benign attributes like lang or title. It would be theoretically possible to extend the regular expression to allow these attributes, as well as other safe HTML tags and their attributes, but realistically that would be an incredibly difficult regex both to develop and maintain.
AntiXss 3.1 takes care of all of this logic for you, using the same whitelist approach: it filters the input using a list of known good tags and attributes and strips out all other text. Simply pass the untrusted input through the AntiXss.GetSafeHtml or GetSafeHtmlFragment method to sanitize it:
string output = AntiXss.GetSafeHtml(input);
I strongly encourage everyone to download the new AntiXss 3.1 and incorporate it into your applications starting today. It’s a very effective defense, especially when used in conjunction with the output encoding functionality that’s been a part of AntiXss from the beginning.
Read the Full Article Here.
Download AntiXss 3.1 from Microsoft.