A common annoyance when dealing with user-supplied content is the way MS Word uses some characters that are non-standard, at least in terms of the web. Among others, these include the directional (a.k.a. "smart") quotes, the en and em dashes, and the ellipsis. The problem occurs when you output text that contains those characters as a result of a user copying and pasting text from a Word document. If their bytes do not match the character encoding your page declares, the browser cannot interpret them, resulting in the dreaded placeholder characters (question marks, boxes, etc.).
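To make the failure concrete, here is a minimal sketch. It assumes the pasted text reaches PHP as Windows-1252 bytes, which is only one of the ways this can happen; the variable names are made up for illustration. It shows that the single-byte curly quotes are not valid UTF-8 on their own, which is why a page served as UTF-8 shows placeholders for them.

```php
<?php
// Assumption: the pasted text arrived as Windows-1252 bytes.
// 0x93 and 0x94 are the Windows-1252 codes for the curly double quotes.
$pasted = "\x93Hello\x94";

// The same text encoded as UTF-8: each curly quote takes three bytes.
$asUtf8 = "\xE2\x80\x9CHello\xE2\x80\x9D";

// preg_match() with the /u modifier returns false when the subject
// is not valid UTF-8, so it doubles as a cheap validity check.
$pastedIsValidUtf8 = (preg_match('//u', $pasted) === 1);
$utf8IsValidUtf8   = (preg_match('//u', $asUtf8) === 1);

var_dump($pastedIsValidUtf8); // bool(false)
var_dump($utf8IsValidUtf8);   // bool(true)
```

A check like this can also tell you whether a string needs any clean-up at all before you output it.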
If you are outputting your PHP pages in the UTF-8 character encoding, the following PHP function, which I came up with, can help deal with this. It is inspired by this comment on Chris Shiflett's blog. It simply uses the str_replace() function to replace some known problem characters with character entities that should work better when outputting UTF-8 content.
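<?php
function filterText($text)
{
    // Characters and raw bytes to look for. '&' must stay first so the
    // entities inserted by later replacements are not escaped again.
    $search = array(
        '&',
        '<',
        '>',
        '"',
        chr(212), // Mac Roman: left single quote
        chr(213), // Mac Roman: right single quote
        chr(210), // Mac Roman: left double quote
        chr(211), // Mac Roman: right double quote
        chr(208), // Mac Roman: en dash
        chr(209), // Mac Roman: em dash
        chr(201), // Mac Roman: ellipsis
        chr(145), // Windows-1252: left single quote
        chr(146), // Windows-1252: right single quote
        chr(147), // Windows-1252: left double quote
        chr(148), // Windows-1252: right double quote
        chr(150), // Windows-1252: en dash
        chr(151), // Windows-1252: em dash
        chr(133)  // Windows-1252: ellipsis
    );
    // Entities in the same order as $search.
    $replace = array(
        '&amp;',
        '&lt;',
        '&gt;',
        '&quot;',
        '&lsquo;',
        '&rsquo;',
        '&ldquo;',
        '&rdquo;',
        '&ndash;',
        '&mdash;',
        '&hellip;',
        '&lsquo;',
        '&rsquo;',
        '&ldquo;',
        '&rdquo;',
        '&ndash;',
        '&mdash;',
        '&hellip;'
    );
    return str_replace($search, $replace, $text);
}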
// USAGE:
// $test holds the user-supplied text; header() must run before any output.
header('Content-Type: text/html; charset=UTF-8');
echo filterText($test);
?>
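One detail of the function above that is easy to miss: when str_replace() is given arrays, it applies the pairs one after another, each working on the result of the previous one. That is why '&' sits at the top of the search list. The sketch below uses a cut-down, hypothetical pair of arrays (not the full lists) to show what happens when the ampersand is handled first versus last.

```php
<?php
// Sample input: Windows-1252 curly double quotes around a phrase with an ampersand.
$input = chr(147) . 'Fish & chips' . chr(148);

// Ampersand first: literal ampersands are escaped before any entity is inserted.
$good = str_replace(
    array('&', chr(147), chr(148)),
    array('&amp;', '&ldquo;', '&rdquo;'),
    $input
);

// Ampersand last: the '&' of each freshly inserted entity gets escaped too.
$bad = str_replace(
    array(chr(147), chr(148), '&'),
    array('&ldquo;', '&rdquo;', '&amp;'),
    $input
);

echo $good, "\n"; // &ldquo;Fish &amp; chips&rdquo;
echo $bad, "\n";  // &amp;ldquo;Fish &amp; chips&amp;rdquo;
```

If you add more characters to the lists, append them after the four HTML special characters so this ordering is preserved.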