Filtering MS Word Text

HTML, PHP No Comments

A common annoyance when dealing with user-supplied content is the way MS Word uses some non-standard character encodings (at least non-standard in terms of the web). Among others, these include the directional (a.k.a. "smart") quotes. The problem occurs when you output text that contains those characters as a result of a user copying and pasting text from a Word document. Typically the browser and the font being used do not interpret them correctly, resulting in the dreaded placeholder characters (question marks, boxes, etc.).

I came up with the following PHP function to help deal with this when your PHP pages are output with the UTF-8 character encoding. It is inspired by this comment on Chris Shiflett's blog. It simply uses the str_replace() function to replace some known problem characters with character entities that should work better when outputting UTF-8 content.

<?php
function filterText($text)
{
   $search = array (
      '&',
      '<',
      '>',
      '"',
      chr(212),
      chr(213),
      chr(210),
      chr(211),
      chr(209),
      chr(208),
      chr(201),
      chr(145),
      chr(146),
      chr(147),
      chr(148),
      chr(151),
      chr(150),
      chr(133)
   );
   $replace = array (
      '&amp;',
      '&lt;',
      '&gt;',
      '&quot;',
      '&#8216;',
      '&#8217;',
      '&#8220;',
      '&#8221;',
      '&#8212;', // chr(209): em dash in Mac Roman
      '&#8211;', // chr(208): en dash in Mac Roman
      '&#8230;',
      '&#8216;',
      '&#8217;',
      '&#8220;',
      '&#8221;',
      '&#8212;', // chr(151): em dash in Windows-1252
      '&#8211;', // chr(150): en dash in Windows-1252
      '&#8230;'
   );
   return str_replace($search, $replace, $text);
}

// USAGE:
header('Content-Type: text/html; charset="UTF-8"');
echo filterText($test);
?>
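
To see the technique on a concrete string, here is a minimal sketch using a reduced map. The function name `filterTextDemo` and the sample text are made up for illustration. It assumes the input bytes are Windows-1252, as pasted Word text often is.

```php
<?php
// Reduced map for illustration only: Windows-1252 bytes => HTML entities.
// '&' comes first because str_replace() works through the pairs in order,
// and the entities added by later pairs contain '&' themselves.
function filterTextDemo($text)
{
    $map = array(
        '&'      => '&amp;',
        chr(145) => '&#8216;', // left single quote
        chr(146) => '&#8217;', // right single quote
        chr(147) => '&#8220;', // left double quote
        chr(148) => '&#8221;', // right double quote
        chr(133) => '&#8230;'  // ellipsis
    );
    return str_replace(array_keys($map), array_values($map), $text);
}

$sample = chr(147) . 'Fish & chips' . chr(148) . chr(133);
echo filterTextDemo($sample), "\n";
// prints: &#8220;Fish &amp; chips&#8221;&#8230;
```

One caveat: bytes 128 to 191 also occur inside multi-byte UTF-8 sequences, so a byte-level replacement like this is best applied only to text you know is not already UTF-8.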

UTF8 in PHP and MySQL

PHP 3 Comments

The intent of this article is to tie together some things I've learned to do in order to get my web apps to "play nicely" with the UTF8 character set. Before we go any further, let me state that I do not claim to be an expert on this; the following is simply a collection of things I've discovered here and there on the web, and which together seem to help smooth out most of the bumps in the road of using UTF8.

So let's start with the database itself. To get your varchar and text fields talking UTF8, you should assign both the character set and a corresponding collation. (See the MySQL manual section on character sets and collations to see the differences between the various collation types.) You can assign this stuff at the field level should you desire, but generally I just assign it at the table level:
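
The excerpt stops before its example, so what follows is only a rough sketch of what a table-level assignment can look like. The table name `articles` and its columns are hypothetical, and `utf8_general_ci` is just one of the available collations.

```php
<?php
// Hypothetical table; only the last line of the statement matters here.
// On newer MySQL versions, utf8mb4 is the character set that covers all
// of Unicode; plain utf8 is limited to three bytes per character.
$sql = <<<SQL
CREATE TABLE articles (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
   title VARCHAR(255) NOT NULL,
   body TEXT NOT NULL
) DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci
SQL;

// In a real script this string would be passed to your database
// connection, for example with mysqli::query().
echo $sql, "\n";
```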

Read the rest...

Calculate Age: One-Liner Fun

PHP 1 Comment

Here's a little one-liner I thought up today for this PHPBuilder forum thread. Its purpose is to calculate someone's age when you know the year, month, and day of their birth (integer values). In this snippet it is assumed that $year, $month, and $day hold the integer values for the birthday of interest.

$today = time();
for($yr = $year, $age = -1; mktime(0,0,0,$month,$day,$yr) < $today; $yr++, $age++);
echo $age;

This utilizes the often ignored fact that the first and third expressions of a for loop can each be a comma-separated list of expressions.
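
For reuse, the same loop can be wrapped in a function. This is a sketch rather than part of the original snippet: the name `calculateAge` and the optional `$now` parameter are additions, the latter so the result can be checked against a fixed date instead of time().

```php
<?php
function calculateAge($year, $month, $day, $now = null)
{
    $today = ($now === null) ? time() : $now;
    // The first expression initialises two variables and the third one
    // increments both; the loop body itself is empty.
    for ($yr = $year, $age = -1; mktime(0, 0, 0, $month, $day, $yr) < $today; $yr++, $age++);
    return $age;
}

// Born 1 January 2000, evaluated at noon on 15 June 2020.
echo calculateAge(2000, 1, 1, mktime(12, 0, 0, 6, 15, 2020)), "\n"; // prints 20
```

Note that a birth date in the future never enters the loop, so the function returns -1 in that case.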

Using Akismet to Detect Spam Email

PHP 1 Comment

After seeing the effectiveness of the Akismet WordPress plug-in at filtering out spam comments here, I decided to see if I could use it in conjunction with an email contact form. I thought it might be interesting to some of my readers (there are at least a couple) to keep a sort of journal here of what I do to accomplish that, plus it might help encourage me to finish it.

So the first thing I've done is to create an Akismet class that can take the pertinent data, contact the Akismet server via cURL and send it that data, and then return the response as a boolean (true == spam, false == not spam).
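
The class itself is behind the link, so the following is only a rough sketch of the shape such a class might take, not the one described above. It assumes Akismet's "comment-check" REST call, which as far as I know is reached at a host name containing your API key and answers with the literal text "true" or "false"; check the Akismet API documentation before relying on that. The class name `AkismetSketch`, its method names, and the API key and site URL values are all placeholders.

```php
<?php
class AkismetSketch
{
    private $apiKey;
    private $siteUrl;

    public function __construct($apiKey, $siteUrl)
    {
        $this->apiKey  = $apiKey;
        $this->siteUrl = $siteUrl;
    }

    // Build the URL-encoded POST body; 'blog' is the site the form is on.
    public function buildBody(array $message)
    {
        return http_build_query(array_merge(array('blog' => $this->siteUrl), $message));
    }

    // Interpret the response body: true == spam, false == not spam.
    public function parseResponse($body)
    {
        return trim($body) === 'true';
    }

    // Send the data via cURL and return the verdict as a boolean.
    public function isSpam(array $message)
    {
        // Assumed endpoint format; verify against the Akismet docs.
        $ch = curl_init('https://' . $this->apiKey . '.rest.akismet.com/1.1/comment-check');
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $this->buildBody($message));
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $response = curl_exec($ch);
        if ($response === false) {
            throw new RuntimeException('Akismet request failed');
        }
        return $this->parseResponse($response);
    }
}

// Usage (placeholder values; calling isSpam() makes a network request):
// $akismet = new AkismetSketch('your-api-key', 'http://www.example.com/');
// $spam = $akismet->isSpam(array(
//    'user_ip'         => $_SERVER['REMOTE_ADDR'],
//    'user_agent'      => $_SERVER['HTTP_USER_AGENT'],
//    'comment_content' => $_POST['message']
// ));
```

Keeping the body building and response parsing in their own methods means those parts can be checked without contacting the server.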

Read the rest...

Tabbed Output with Tidy

HTML, PHP 3 Comments

In response to this thread at WebDeveloper.com, I came up with the idea of using PHP's Tidy functions to format the HTML output from a script. The basic idea was to capture all the output by using ob_start() to buffer the output and then ob_get_clean() to save it to a variable. Then just run it through the tidy_repair_string() function with a couple configuration settings to indent it.


<?php
ob_start();
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>test</title>
</head>
<body>
<h1>Test</h1>
<ul>
<li>This is a test.</li>
<li>It is only a test.</li>
</ul>
</body>
</html>
<?php
$text = ob_get_clean();
$config = array(
   'indent' => true,
   'indent-spaces' => 4
);
$text = tidy_repair_string($text, $config);
echo $text;

Read the rest...

Securing Uploaded Image Files

PHP 4 Comments

I just saw this post by "jazz_snob" at PHPBuilder.com, suggesting a means to secure untrusted image files. The basic idea is to use PHP's GD image functions to create a copy of the file. As doing so would decompose the specified file into GD's native bitmap format, and then recompose it into the desired image file type, any embedded "nastiness" within the original file ought to be left behind. It could be implemented as a function something like:

<?php
/**
 * Copy an image to help ensure it is not "infected"
 * @author Charles Reace (www.charles-reace.com)
 * @param  string    path to image file to be copied
 * @return resource  GD image resource, boolean false if error
 */
function secureImage($filePath)
{
   $sizeData = getimagesize($filePath);
   if($sizeData === false)
   {
      user_error(__FUNCTION__ . "(): Unable to get image data");
      return false;
   }
   list($unused, $type) = explode('/', $sizeData['mime']);
   switch($type)
   {
      case 'gif':
         $fh = imagecreatefromgif($filePath);
         break;
      case 'png':
         $fh = imagecreatefrompng($filePath);
         break;
      case 'jpeg':
         $fh = imagecreatefromjpeg($filePath);
         break;
      default:
         user_error(__FUNCTION__ . "(): Unsupported image type '$type'");
         return false;
   }
   return $fh;
}

// Sample usage:
$fh = secureImage('bg.gif');
if(!$fh)
{
   header('HTTP/1.0 404 Not Found');
   exit;
}
header('Content-Type: image/gif');
imagegif($fh);

If any of you readers happens to have access to an "infected" image file and a safe sandbox where you could test the above, I'd be very interested to know if it does, in fact, filter out the non-image virus or whatever is embedded, or at the very least reject it with an error.

Application Constants in Interfaces

PHP No Comments

Here's a little trick I discovered the other day for passing application settings around in an object-oriented implementation. You can create an interface that defines any number of class constants, then any class you define that needs those constants needs only to implement that interface. For example:

<?php
/**
 * Define constants for use in other classes
 */
interface Constants
{
   const DB_HOST = 'localhost';
   const DB_USER = 'username';
   const DB_PASS = 'abc123xyzr';
   const DB_NAME = 'test';
}

/**
 * Database class based on MySQLi class
 */
class DB extends mysqli implements Constants
{   
   public function __construct()
   {
      parent::__construct(
         self::DB_HOST,
         self::DB_USER,
         self::DB_PASS,
         self::DB_NAME
      );
   }
}

The advantage of this over running some configuration script that sets the constants is that by implementing an interface, it becomes immediately visible that the class requires that interface. If on the other hand you depend on independently setting constants in an include file or such, then if you try to reuse a class that uses those constants, it will not be immediately obvious that they are needed until you start testing it in the new implementation.

The main limitation is that interfaces cannot have class variables, only constants. If you need application-wide variables for your object-oriented application, you'll either need to instantiate a class that has those variables and pass it to each object that needs it, use a singleton pattern class (which can have the same disadvantage of globals in that the fact that it is required can be hidden inside a class), or look into something like using a registry pattern class.
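
As a rough illustration of the registry option, here is a minimal sketch. The class names `Registry` and `Mailer` and the key `mail.from` are hypothetical. The registry is passed in through the constructor, so the dependency stays visible in the class signature.

```php
<?php
/**
 * Minimal registry: a plain object holding named values.
 */
class Registry
{
    private $values = array();

    public function set($key, $value)
    {
        $this->values[$key] = $value;
    }

    public function has($key)
    {
        return array_key_exists($key, $this->values);
    }

    public function get($key)
    {
        if (!$this->has($key)) {
            throw new OutOfBoundsException("No registry entry named '$key'");
        }
        return $this->values[$key];
    }
}

/**
 * Example consumer: the type hint shows at a glance that a Registry is needed.
 */
class Mailer
{
    private $registry;

    public function __construct(Registry $registry)
    {
        $this->registry = $registry;
    }

    public function sender()
    {
        return $this->registry->get('mail.from');
    }
}

$registry = new Registry();
$registry->set('mail.from', 'webmaster@example.com');
$mailer = new Mailer($registry);
echo $mailer->sender(), "\n"; // prints webmaster@example.com
```

Unlike interface constants, the values here can change at run time, which is the point of using a registry for application-wide variables.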
