Parse HTML with PHP: a preg_match_all tutorial
For most web developers who use preg_match or preg_replace frequently, the function preg_match_all is only a small step further, but for everyone else it may be hard to understand. The biggest difference between preg_match_all and the regular preg_match is that all matched values are stored in a multi-dimensional array, which can hold an unlimited number of matches. This preg_match_all tutorial shows how to “collect” the image source values from a web page:
$data = file_get_contents("http://www.finalwebsites.com");
$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";
preg_match_all($pattern, $data, $images);
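To make the difference between the two functions concrete, here is a minimal sketch that runs the same pattern through preg_match and preg_match_all. The $html string is hypothetical sample markup (one tag per line) that stands in for the downloaded page, so the example runs without a network connection.

```php
<?php
// Hypothetical sample markup, one tag per line, standing in for a downloaded page.
$html = "<img src=\"/images/english.gif\">\n<img src='/images/dutch.png'>\n";
$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";

// preg_match stops after the first match; $first is a flat array of groups.
$foundOne = preg_match($pattern, $html, $first);

// preg_match_all keeps searching; $all holds one sub-array per capture group.
$foundAll = preg_match_all($pattern, $html, $all);

echo $foundOne, "\n";   // number of matches reported by preg_match (0 or 1)
echo $first[1], "\n";   // path captured from the first match only
echo $foundAll, "\n";   // number of matches reported by preg_match_all
print_r($all[1]);       // every captured path
```

Both functions return the number of matches they found, so the return value of preg_match_all is a quick way to check whether the page contained any images at all.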
Let’s take a closer look at the regular expression pattern:
The first part (src=["']?) and the last part (["']?) match everything that starts with src= and ends with an optional single or double quote. This can be a long string because the outer rule is very broad. Next comes the rule that starts inside the first bracket, ([^"']?.* :
Inside the long string matched by the outer rule, [^"']? matches one optional character that is not a single or double quote, and .* matches any characters that follow on the same line. The last part inside the inner brackets, (png|jpg|gif), is the magic:
It requires the string to end with one of the listed file extensions. If there is a match, preg_match_all retrieves all the matching paths from the HTML file.
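Because .* is greedy, the pattern can swallow more than one tag when several images share a line. The sketch below shows that effect on a hypothetical one-line sample and compares it with a tighter variant. The $strict pattern is my own assumption for illustration, not part of the original example: it only allows characters that cannot end an attribute value and requires a literal dot before the extension.

```php
<?php
// Pattern from the article, unchanged.
$greedy = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";

// Hypothetical tighter variant (an assumption, not from the article):
// no quotes, whitespace or ">" inside the path, and a literal dot before the extension.
$strict = '/src=["\']?([^"\'\s>]+\.(png|jpg|gif))["\']?/i';

// Hypothetical sample: two images on a single line.
$line = '<img src="/a.gif"><img src="/b.png">';

preg_match_all($greedy, $line, $greedyHits);
preg_match_all($strict, $line, $strictHits);

print_r($greedyHits[1]); // a single entry that runs across both tags
print_r($strictHits[1]); // one entry per image
```

On pages where every image tag sits on its own line the two patterns behave the same, which is why the simpler one is often good enough for a quick script.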
I need all the rules to isolate the string parts (image paths) from the rest of the HTML. For a single matched image the result looks like this, where n is the position of the match (access the array $images with these indexes, or just use print_r($images)):
$images[0][n] -> src="/images/english.gif"
$images[1][n] -> /images/english.gif
$images[2][n] -> gif
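The layout of the result array can be explored with a short loop. This is a minimal sketch on hypothetical sample markup: the first call uses the default ordering, where the first index is the capture group and the second is the match, and the second call passes the PREG_SET_ORDER flag, which groups the values per match instead.

```php
<?php
// Hypothetical sample markup, one tag per line.
$html = "<img src=\"/images/english.gif\">\n<img src=\"/images/dutch.gif\">\n";
$pattern = "/src=[\"']?([^\"']?.*(png|jpg|gif))[\"']?/i";

// Default order (PREG_PATTERN_ORDER): $images[group][match].
preg_match_all($pattern, $html, $images);
foreach ($images[1] as $i => $path) {
    // Print each path together with the extension captured for the same match.
    echo $path, ' (', $images[2][$i], ")\n";
}

// PREG_SET_ORDER flips the layout: $sets[match][group].
preg_match_all($pattern, $html, $sets, PREG_SET_ORDER);
foreach ($sets as $set) {
    // $set[0] is the full match, $set[1] the path.
    echo $set[0], ' -> ', $set[1], "\n";
}
```

The default ordering is handy when only one group is needed, as here; PREG_SET_ORDER tends to read better when every match is processed as a unit.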
Index 1 holds the information I need: the image paths. Try this preg_match_all example with other pieces of HTML code and experiment to get a better understanding. Check my website finalwebsites.com for more PHP scripts and code examples.