Regular expressions and PostgreSQL

Tuesday 30th June 2009

There are two areas of programming I have become lapse with over recent years: one is SQL and the other Regular Expressions. At University I tailored my degree towards database management taking both database and data mining modules in my final year. However I admit I have become lazy recently and have resorted to using sub-queries more and more when I should be using joins. This isn’t a problem most of the time but they are incredibly slow and when working on large PostgreSQL databases, like I do at work, the difference in speed becomes very noticeable! I am now putting a stop to this and getting to grips with (left outer) joins again.

As well as addressing the sub-query issue to speed up queries I have been looking into regular expressions in PostgreSQL. RegEx can be extremely powerful when used correctly. A request I receive frequently is to pull all customers out of a database based on their postcode. I started out doing:
SELECT email FROM contact WHERE postcode IN ('L1', 'L2', 'L3');
This works fine if the number of postcodes is small but what happens if you need everyone in the ranges L1-L99 and M1-M60 and CH1-CH56. As this query structure no longer becomes a viable solution I use RegEx. To retrieve all people in these postcode ranges you could do something like:
SELECT email FROM contact WHERE postcode ~* '^(L|M|CH)\\d{1,2}'
A much smaller query. Notice the use of ‘\\d’ – if you only use ‘\d’ rather than matching against a digit it will match against the letter d.

Or how about returning all customers that have a french email address:
SELECT email FROM contact WHERE email ~* '.*\.fr$'

Note: this doesn’t check if the email address is valid – only if it ends in .fr. Also you need to use the tilde operator (~) rather than an equals sign. The asterisk (*) is used when you want a case-insensitive search.

So to put both a left outer join the regular expression to the test:
SELECT contact.email FROM contact LEFT OUTER JOIN suppression
ON contact.id = suppression.contact_id
WHERE contact.postcode ~* '^(L|M|CH)\\d{1,2}'
AND suppression.contact_id IS NULL

This will return email addresses of customers whose postcodes are in the desired range and are not suppressed.

Tags: , , ,

Contact

Twitter Linked In Facebook Flickr last.fm Gmail