PHP regular expressions for newbies
Regular expressions are a powerful and flexible way to describe string patterns. A string matches a pattern when it has a certain structure. For example, if we want all the strings that begin with two capital letters and end with six digits, we can write the pattern as the regex below:
^[A-Z]{2}[0-9]{6}$
This is all we need to know. It looks difficult at first, but as soon as you understand how it works it becomes quite easy. The “^” character marks the beginning of the string and the “$” symbol marks the end of the string. Between these two characters we write our expression.
The “[]” brackets hold a set of allowed characters, and the set matches exactly one of them. If we want only digits we can put all the digits between the brackets, or simply write [0-9]. For lowercase letters we write [a-z] and for capital letters [A-Z]. We can combine lowercase and capital letters, digits, “_” and “ ” (space) this way: [a-zA-Z0-9_ ].
The “{}” brackets give the number of times the previous element is repeated. [A-Z]{2} means a sequence of exactly 2 capital letters. We can also set a range of lengths this way:
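As a quick check, the pattern can be tried out with preg_match(). This is only a minimal sketch: the helper name isValidCode() and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: true when $s is two capital letters followed by six digits.
// Note: "$" also matches just before a single trailing newline.
function isValidCode(string $s): bool
{
    // The preg_* functions need delimiters around the pattern; "/" is used here.
    return preg_match('/^[A-Z]{2}[0-9]{6}$/', $s) === 1;
}

var_dump(isValidCode('AB123456'));   // bool(true)
var_dump(isValidCode('ab123456'));   // bool(false): lowercase letters
var_dump(isValidCode('AB1234567'));  // bool(false): seven digits
```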
- {2,5} - the sequence has at least 2 and at most 5 characters
- {2,} - the sequence has at least 2 characters
- {0,2} - the sequence has at most 2 characters (the short form {,2} is accepted only by recent PCRE versions; older ones treat it as literal text)
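The length limits can be compared side by side with a small script. The labels, the digit-only patterns and the helper matchingLabels() are assumptions chosen for this sketch.

```php
<?php
// Sample patterns: each one limits how many digits the whole string may contain.
$patterns = [
    'exactly 2' => '/^[0-9]{2}$/',
    '2 to 5'    => '/^[0-9]{2,5}$/',
    '2 or more' => '/^[0-9]{2,}$/',
    'up to 2'   => '/^[0-9]{0,2}$/',
];

// Hypothetical helper: lists the labels of the patterns that $s matches.
function matchingLabels(array $patterns, string $s): array
{
    $labels = [];
    foreach ($patterns as $label => $pattern) {
        if (preg_match($pattern, $s) === 1) {
            $labels[] = $label;
        }
    }
    return $labels;
}

print_r(matchingLabels($patterns, '12'));      // all four labels
print_r(matchingLabels($patterns, '123456'));  // only "2 or more"
```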
Let's look at another expression: ^([a-zA-Z]+)[/]*$
Here we have new operators: “+”, which is equivalent to {1,}, and “*”, which is equivalent to {0,}. Instead of “*” we can use “?”, which is equivalent to {0,1} and makes the previous element optional. The slash inside “[]” is matched literally; the escape character in a regex is the backslash “\”.
We can easily find words of the form f[digit or x] using f[0-9x] (the same as f[0-9x]{1})
If we want strings of the form f[two digits][any character] we can use the regex f[0-9]{2}. where the final “.” is part of the pattern. The “.” character means any character except a line break.
The “()” parentheses group several elements into one unit and capture the text that the group matched.
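Here is a short sketch of how these operators behave in PHP. The sample strings are invented, and “#” is used as the pattern delimiter (an assumption of this example) so that the slash inside the pattern does not have to be escaped.

```php
<?php
// "#" is the delimiter here, so the "/" inside the pattern needs no escaping.
$pattern = '#^([a-zA-Z]+)[/]*$#';

var_dump(preg_match($pattern, 'blog'));     // int(1): letters, zero slashes
var_dump(preg_match($pattern, 'blog///'));  // int(1): letters, three slashes
var_dump(preg_match($pattern, '/'));        // int(0): "+" needs at least one letter

// With "?" instead of "*", at most one trailing slash is allowed.
$optional = '#^([a-zA-Z]+)[/]?$#';
var_dump(preg_match($optional, 'blog/'));   // int(1)
var_dump(preg_match($optional, 'blog//'));  // int(0)
```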
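Both patterns can be tried on a list of words with preg_grep(). The word list is made up, and the “^” and “$” anchors are added in this sketch so that the whole word has to match.

```php
<?php
$words = ['f1', 'fx', 'fa', 'f42!', 'f4'];

// "f" followed by one digit or an "x".
$short = array_values(preg_grep('/^f[0-9x]$/', $words));
// "f", two digits, then any single character except a line break.
$long = array_values(preg_grep('/^f[0-9]{2}.$/', $words));

print_r($short);  // f1, fx, f4
print_r($long);   // f42!
```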
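In PHP the text matched by each group can be read from the third argument of preg_match(). A small sketch, reusing the two-letters-six-digits pattern with groups added for illustration:

```php
<?php
// Two groups: the letters and the digits of a code such as "AB123456".
$pattern = '/^([A-Z]{2})([0-9]{6})$/';

if (preg_match($pattern, 'AB123456', $m) === 1) {
    echo $m[0], "\n";  // whole match: AB123456
    echo $m[1], "\n";  // first group: AB
    echo $m[2], "\n";  // second group: 123456
}
```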
Below are special characters:
- * zero or more repetitions of the previous element
- + one or more repetitions of the previous element
- \d one digit
- \w one alphanumeric character or underscore
- \s white space (including line breaks and tabs)
- \t tab
- \n new line (\r\n for Windows)
- . any character except a line break
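A few of these special characters in action. The sample line is invented for this sketch.

```php
<?php
// Double quotes so that PHP turns \t into a real tab character.
$line = "id_7\tJohn  Smith";

// \d+ : one or more digits
preg_match('/\d+/', $line, $digits);
// \w+ : one or more letters, digits or underscores
preg_match('/\w+/', $line, $word);
// \s+ : split on any run of white space (spaces, tabs, line breaks)
$parts = preg_split('/\s+/', $line);

echo $digits[0], "\n";  // 7
echo $word[0], "\n";    // id_7
print_r($parts);        // id_7, John, Smith
```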
As an exercise, let's find all the email addresses in a text. Here is the regex for an email address: [a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,4}
The first [] is for the part before the “@” character. I used “+” because this part must have at least one character (letter, digit, -, . or _). Next comes the “@” character and the same set for the domain name. Then we have a “.”, escaped with a backslash because it is outside a “[]” set and would otherwise mean “any character”. Last is the domain type, which here may have only 2 to 4 letters.
Let's see how to do this in PHP.
$string = 'simone@d-d.com';
$emails = array();
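We have the string in which we search and the array where the found addresses are stored. Now we call the function:
preg_match_all('/[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,4}/', $string, $emails, PREG_SET_ORDER);
The most commonly used flags of this function are:
- PREG_SET_ORDER groups the results by match: $emails[0] holds the first match together with its “()” groups, $emails[1] the second match, and so on
- PREG_OFFSET_CAPTURE additionally returns, for every match, the offset in the string where it begins
- PREG_PATTERN_ORDER (the default) groups the results by pattern part: $emails[0] holds all the full matches, $emails[1] everything matched by the first “()” group, and so on
preg_match_all() returns the number of full matches found (possibly 0), or FALSE in case of error. preg_match() stops after the first match, so it returns 1, 0 or FALSE.
Note: for plain substring searches, strpos() and strstr() are faster than preg_match().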
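The difference between the flags is easiest to see on a pattern that contains groups. The text and the simplified two-group pattern below are assumptions made for this sketch, not the email regex from above.

```php
<?php
// Sample text and a simplified pattern with two groups (user and domain).
$text = 'Write to ann@example.com or bob@example.org';
$pattern = '/([a-z]+)@([a-z.]+)/';

// PREG_PATTERN_ORDER (default): results grouped by pattern part.
preg_match_all($pattern, $text, $byPattern, PREG_PATTERN_ORDER);
echo $byPattern[0][1], "\n";  // second full match: bob@example.org
echo $byPattern[1][1], "\n";  // group 1 of the second match: bob

// PREG_SET_ORDER: results grouped by match.
preg_match_all($pattern, $text, $bySet, PREG_SET_ORDER);
echo $bySet[1][0], "\n";      // second full match: bob@example.org
echo $bySet[1][2], "\n";      // group 2 of the second match: example.org

// PREG_OFFSET_CAPTURE: every entry becomes [text, byte offset].
preg_match_all($pattern, $text, $withOffsets, PREG_OFFSET_CAPTURE);
echo $withOffsets[0][0][1], "\n";  // offset of the first match: 9
```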
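A sketch of how the return value might be checked, using the sample address and an email pattern like the one above. The messages are made up, and FALSE is compared with === so that it is not confused with a result of 0 matches.

```php
<?php
$string = 'simone@d-d.com';
$emails = array();

$count = preg_match_all('/[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,4}/', $string, $emails);

if ($count === false) {
    // preg_last_error() returns the error code of the last PCRE call.
    echo "The regex failed, error code: ", preg_last_error(), "\n";
} elseif ($count === 0) {
    echo "No address found\n";
} else {
    // With the default flag, $emails[0] lists all the full matches.
    echo "Found $count address(es): ", $emails[0][0], "\n";
}
```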
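When the search text is fixed and no pattern is needed, the string functions do the job. A minimal sketch on the same sample address:

```php
<?php
$string = 'simone@d-d.com';

// Plain substring test. Compare with !== false, because a match
// at position 0 would otherwise be treated as "not found".
$hasAt = strpos($string, '@') !== false;

// strstr() returns the rest of the string starting at the first match.
$domainPart = strstr($string, '@');

var_dump($hasAt);        // bool(true)
echo $domainPart, "\n";  // @d-d.com
```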
Success!
