Content is the text, pictures and other items placed in the body tag of a web page's HTML document that give site visitors readable information relevant to their needs. Content usually needs to be modified from time to time, and how often depends on the type and level of activity in the site owner's organization. Making these changes is known as content management; it can affect part or the whole of a web page through the removal, addition or editing of content. Because continuously updating a web site is good practice and largely unavoidable, a site needs a structure for managing its content. This structure is known as a Content Management System (CMS). A Content Management System can take two forms: a structure that uses HTML templates and a structure that uses a database-driven site model.
A Content Management System that relies on HTML templates is structured so that changes to a page are made offline and the page is then uploaded to the appropriate folder of the site. The content and the HTML code are interspersed. In a Content Management System that uses a database-driven site model, the content is separated from the HTML file. When a site is database-driven, the content displayed in the browser comes from the database and not from static HTML code put in the page before uploading. Thus, when dealing with new content in a database-driven site model, two issues need to be considered: content submission and content printing.
1. Content submission – You submit new content via forms. Because a database-driven site separates the HTML document from the content, you may have to format the content before submission, especially if it is a long text. Some Content Management Systems use forms with ready-made tools for formatting the text directly, such as boldface, underline and italics. If you are using an ordinary form that lacks formatting facilities, you can create custom tags and put them in the appropriate places in your text. Such tags enable you to create paragraphs, boldface, underline and so on. This is where regular expressions come into play: they interpret those custom tags so that the text is formatted in the browser.
Regular expressions (Regexes) are essentially a declarative language for (string) pattern matching. Using regular expressions, we can:
- See if a string matches a specified pattern as a whole.
- Search within a string for a substring matching a specified pattern.
- Extract substrings matching a specified pattern from a string.
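The three uses listed above can be sketched in a few lines. This is only an illustration written in Python rather than PHP; the sample sentence and the patterns are made up for the example.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Our passion for news drives the newsroom."

# 1. Does the string match the pattern as a whole?
whole = re.fullmatch(r"[A-Za-z .]+", text) is not None

# 2. Is there a substring matching the pattern anywhere in the string?
found = re.search(r"pas+ion", text)

# 3. Extract every substring that matches the pattern.
words_with_news = re.findall(r"\bnews\w*", text)

print(whole)            # True
print(found.group())    # passion
print(words_with_news)  # ['news', 'newsroom']
```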
An SQL SELECT statement looks for rows where the pattern in its WHERE clause matches the string in the specified table column(s), e.g. SELECT * FROM news WHERE content LIKE '%passion%'. LIKE only tests whether the entire column value fits a simple wildcard pattern; it reports that a row matches but not where the match occurs or what text matched. Regular expressions go further by locating, extracting and replacing the matching substrings within a string, which is why they are such a powerful tool for searching long text.
There are various types of regular expressions, each with its own set of functions. PHP, for example, has supported POSIX regular expressions and Perl Compatible Regular Expressions (PCRE). POSIX is the acronym for Portable Operating System Interface. Examples of POSIX regular expression functions are ereg(), eregi(), ereg_replace() and eregi_replace(); note that these were deprecated in PHP 5.3 and removed in PHP 7.0, so current PHP code must use the PCRE functions instead. Some PCRE functions are preg_match(), preg_match_all() and preg_replace().
2. Content printing – Content submitted to a database table will contain custom tags such as [PA] for paragraph and [UND] for underline, positioned in the appropriate places in the text. It is best to choose tag names that you will easily understand. The script that retrieves the content and prints it formatted in the browser will contain the appropriate regular expressions. They find every occurrence of the custom tags in the text and convert them to the corresponding HTML tags before printing, for example [PA] to the HTML paragraph tag. Regular expression functions can detect such custom tags in the retrieved text and replace each match with the appropriate HTML tag, e.g. eregi_replace('\[PA\]', '<p>', $content) and eregi_replace('\[/PA\]', '</p>', $content), where $content holds the retrieved text. The square brackets must be escaped with backslashes, because an unescaped [PA] is a bracket expression that matches a single P or A.
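The tag conversion described above might look like the following minimal sketch. It is written in Python rather than PHP, and the tag map, the extra [B] tag and the function name render_content are assumptions made for the illustration, not part of any particular CMS.

```python
import html
import re

# Hypothetical tag map: custom tag name -> HTML element name.
# [PA] and [UND] come from the text; [B] is an assumed extra tag.
TAG_MAP = {"PA": "p", "UND": "u", "B": "strong"}

# The square brackets are escaped so they match literally. Unescaped,
# "[PA]" would be a character class matching a single "P" or "A".
TAG_PATTERN = re.compile(r"\[(/?)(PA|UND|B)\]", re.IGNORECASE)


def render_content(stored_text: str) -> str:
    """Convert custom tags in stored text to HTML tags."""
    # Escape raw HTML typed into the form so only our own tags become markup.
    escaped = html.escape(stored_text)

    def to_html(match: re.Match) -> str:
        slash = match.group(1)          # "/" for a closing tag, "" otherwise
        name = match.group(2).upper()   # normalise case before the lookup
        return f"<{slash}{TAG_MAP[name]}>"

    return TAG_PATTERN.sub(to_html, escaped)


print(render_content("[PA]Read the [UND]terms[/UND] & apply.[/PA]"))
# <p>Read the <u>terms</u> &amp; apply.</p>
```

Using one pattern with a lookup table, instead of one replace call per tag, keeps the list of supported tags in a single place.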
Content that is submitted frequently, such as news and job vacancies, can be managed using a database-driven site model. When you have multiple records of submitted news or job vacancies, you can control which of them are printed in the browser. You simply submit a value of zero for a status field in the database table for every item of content that you post. The WHERE clause of the SQL SELECT statement in the content-retrieving script tests the status field for a value of 1, e.g. SELECT * FROM news WHERE status = 1. As a result, only those records whose status has been changed from zero to 1 by the authorized person are displayed. If a vacancy has expired, its status is changed from 1 back to zero and it disappears, leaving the others on the screen. In this way, the content of a web page is changed dynamically.
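The status-field approach can be tried out with a small in-memory database. The sketch below uses Python's sqlite3 module; the table layout, the sample titles and the helper names set_status and visible_titles are assumptions chosen for the illustration.

```python
import sqlite3

# In-memory database with a hypothetical news table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news ("
    "id INTEGER PRIMARY KEY, title TEXT, status INTEGER NOT NULL DEFAULT 0)"
)

# Every newly posted item starts with status 0 (hidden).
conn.executemany(
    "INSERT INTO news (title) VALUES (?)",
    [("Office move",), ("Vacancy: editor",), ("Annual report",)],
)


def set_status(news_id: int, status: int) -> None:
    """Publish (1) or hide (0) a record; meant for the authorized person."""
    conn.execute("UPDATE news SET status = ? WHERE id = ?", (status, news_id))


def visible_titles() -> list:
    """Return only the titles whose status is 1."""
    rows = conn.execute("SELECT title FROM news WHERE status = 1 ORDER BY id")
    return [title for (title,) in rows]


set_status(1, 1)
set_status(2, 1)
published = visible_titles()     # first two items are now shown

set_status(2, 0)                 # the vacancy has expired
after_expiry = visible_titles()  # only the first item remains

print(published)
print(after_expiry)
```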
The power of regular expressions in matching patterns against substrings of a string is exploited in applications other than formatting web page content with HTML tags dynamically. They are also applied in areas such as validating email addresses and searching for all the occurrences of a word or phrase in a very long text such as a web-based Bible or dictionary. For instance, a replace function such as eregi_replace() can find and mark up all the occurrences of Jesus or Jesus Christ in the Bible. With a properly developed script and a well-designed database, the search result can show the text of the verses containing the matched words or phrases and highlight them. The script can further create a hyperlink containing the URL of the page with the full text of the verses. A Google search results page is a good example of this type of presentation.
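A search-and-highlight step of this kind could be sketched as follows. The example is in Python rather than PHP; the three sample records, the use of the HTML mark element for highlighting and the function name search_and_highlight are assumptions for the illustration, and the stored text is assumed to be HTML-safe already.

```python
import re

# Hypothetical records as they might come from a database: (reference, text).
VERSES = [
    ("John 11:35", "Jesus wept."),
    ("Genesis 1:1", "In the beginning God created the heaven and the earth."),
    ("Matthew 1:1", "The book of the generation of Jesus Christ, "
                    "the son of David, the son of Abraham."),
]


def search_and_highlight(pattern_text: str, records):
    """Return (reference, highlighted text) for every record that matches."""
    pattern = re.compile(pattern_text, re.IGNORECASE)
    results = []
    for reference, text in records:
        if pattern.search(text):
            # Wrap each matched substring in <mark> so the browser highlights it.
            highlighted = pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", text)
            results.append((reference, highlighted))
    return results


# The optional group matches "Jesus Christ" where present, otherwise "Jesus".
hits = search_and_highlight(r"Jesus(?: Christ)?", VERSES)
for reference, highlighted in hits:
    print(reference, "->", highlighted)
```

Each reference returned here is what a fuller script would turn into a hyperlink to the page holding the complete text.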
A Content Management System (CMS) should be developed in such a way that its managers are not required to know HTML. The system should be well secured, especially the interfaces used to edit and remove content. Access should be granted only to authorized persons who have logged in.