Wednesday, June 13, 2007


How to remove accents from strings in .NET

A friend asked how to remove accents from strings in .NET 2.0. I found the code below on Michael Kaplan's blog.


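The code calls String.Normalize() with NormalizationForm.FormD to get a decomposed Unicode representation of the string, in which each base character and its accents are stored as separate characters. It then loops over the characters and skips the accent marks (Unicode category NonSpacingMark), so "àáåæéèøÜü" becomes "aaaæeeøUu". Note that æ and ø come through unchanged: they are distinct letters with no canonical decomposition, not base letters carrying an accent.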


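using System.Globalization;   // CharUnicodeInfo, UnicodeCategory
using System.Text;            // StringBuilder, NormalizationForm

public static String RemoveDiacritics(String s)
{
    // Decompose each accented character into base character + combining marks.
    String normalizedString = s.Normalize(NormalizationForm.FormD);
    StringBuilder stringBuilder = new StringBuilder();

    for (int i = 0; i < normalizedString.Length; i++)
    {
        Char c = normalizedString[i];

        // Keep everything except the combining accent marks.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            stringBuilder.Append(c);
    }

    return stringBuilder.ToString();
}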
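
To see what the normalization step actually does, here is a small console sketch. The class name DiacriticsDemo and the Main method are my own additions for illustration, not part of Michael Kaplan's code, and the method is repeated only so the sample compiles on its own. The test strings use \u escapes so the result does not depend on how the source file is encoded.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical wrapper class, added only so the sample compiles on its own.
public static class DiacriticsDemo
{
    // Same logic as the method above.
    public static string RemoveDiacritics(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "é" as a single precomposed character (U+00E9).
        string decomposed = "\u00E9".Normalize(NormalizationForm.FormD);

        // FormD splits it into 'e' (U+0065) plus a combining acute accent (U+0301).
        foreach (char c in decomposed)
            Console.WriteLine("U+{0:X4} {1}", (int)c, CharUnicodeInfo.GetUnicodeCategory(c));

        // The sample string from the post: "àáåæéèøÜü".
        string input = "\u00E0\u00E1\u00E5\u00E6\u00E9\u00E8\u00F8\u00DC\u00FC";
        // æ (U+00E6) and ø (U+00F8) have no decomposition, so they are kept.
        Console.WriteLine(RemoveDiacritics(input) == "aaa\u00E6ee\u00F8Uu");
    }
}
```

The first loop should list a LowercaseLetter followed by a NonSpacingMark, which is exactly the category the method filters out.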
