24 thoughts on “Parse html with PHP, a preg_match_all example”

  1. I modified it to pull the image path out of “content” brought in from a db. It works fine if only one image is in the “content”, but when I tried with 2 images the pattern grabs everything from src in the first image up to the second image’s closing ‘"’. I have tried a lot of different variations, but nothing pulls out just the image paths. Do you have any ideas?
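
The behaviour described here is typical of a greedy `.+` or `.*` in the pattern. As a minimal sketch (the `$content` string and the pattern below are illustrative assumptions, not the code from the original post), a negated character class stops at the first closing quote, so each path is captured on its own:

```php
<?php
// Hypothetical content with two images, standing in for the text loaded from the db.
$content = '<p><img src="/img/one.jpg" alt="One"> text <img src="/img/two.png" alt="Two"></p>';

// [^>]+ cannot cross the end of a tag, and [^"]+ stops at the first closing
// double quote, so every src value is captured separately.
preg_match_all('/<img[^>]+src="([^"]+)"/i', $content, $matches);

// $matches[1] holds the contents of the first capture group for every match.
$paths = $matches[1];
print_r($paths);
```

This sketch only handles double-quoted attributes; a lazy quantifier such as `(.*?)` would be another way to get the same effect.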

  2. @geredfds,

    this is about 3 short rows of code…?

    Please replace the single and double quotes using your keyboard (if needed).

  3. Hi, can you post a plaintext version of this code? The formatting of your quotes seems to be stopping it working?

  4. Hi,

    note that WP will replace some characters, so check that the single and double quotes are the right ones.

  5. When I enter this, the PHP script


    prints the following error:

    Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING

    What am I writing wrong?

  6. :)

    I see the problem, I unescaped the double quotes too but WP has stripped them off!

    Thanks Iiro for pointing that out to me!

  7. @stelabouras:

    add backslashes before the double quotes:
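
The snippet that belonged to this comment did not survive, so the following is only a sketch of the idea, with an assumed pattern: inside a double-quoted PHP string every literal double quote needs a backslash, otherwise PHP reports a parse error such as `unexpected T_CONSTANT_ENCAPSED_STRING`.

```php
<?php
// Inside a double-quoted PHP string, each literal double quote is escaped.
$pattern = "/<img[^>]+src=\"([^\"]+)\"/i";

// The same pattern in a single-quoted string needs no escaping for double quotes.
$same = '/<img[^>]+src="([^"]+)"/i';

// Both variables contain exactly the same regular expression.
var_dump($pattern === $same);
```

Writing patterns in single-quoted strings is often the easier choice, since only `'` and `\` need special care there.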


  8. The code doesn’t work for single quotes (the closing quote doesn’t work properly).


  9. Hi Max,

    just did a (second) test, but it’s working fine, do you have an example page (URL) where it’s not working?


  10. I put the code in as is and I get this error:

    Parse error: syntax error, unexpected T_CONSTANT_ENCAPSED_STRING in /home/hardwork/public_html/imageparse.php on line 3

    any help please?

  11. Hello George,

    the problem is the way this WordPress blog processes PHP code :(
    I removed the double backslashes and the error is gone, but if you copy and paste the code above it’s possible that you have to replace the single/double quotes.

  12. Hi,
    your question is about single quotes used inside the image tag?

    the above pattern matches:

    I hope that helps

  13. Does this code work for single quotes??

    I have tried to use this code, it does not work for me for single ones.

    Please help me with this one, it’s really urgent. I can’t find a way to allow both double quotes and single quotes.
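
One possible way to accept both quote styles, sketched under the assumption that the attribute value never contains the quote character that opened it (the `$html` string and the pattern are hypothetical, not taken from the post): capture the opening quote and require the same character to close the value with a backreference.

```php
<?php
// Hypothetical markup mixing both quote styles.
$html = "<img src='/a.gif'> <img class=\"x\" src=\"/b.jpg\">";

// (["\']) captures the opening quote; \1 requires the same character to close
// the value. (.*?) is lazy, so it stops at the first matching closing quote.
preg_match_all('/<img[^>]+src=(["\'])(.*?)\1/i', $html, $matches);

// Group 1 is the quote character, group 2 is the image path.
$paths = $matches[2];
print_r($paths);
```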

  14. Olaf,

    I’ve been browsing through your site, and thanks again for the tutorials. I didn’t know web development had existed this long. I have just bookmarked your site and hope you’ll help, because I’m on my way to creating a better WP theme for my site.

    • You’re welcome Charleston, sure, web development has existed as long as the web 😉

  15. I need help extracting the part of the URL that does not contain the primary domain.

    • Hi ssultan,

      which part of the URL do you need? Please provide some examples (or use the function parse_url())

  16. Hi ssultan,
    try this:

    $regexp = '#href=["\']http://([^/]+)([^"\']+)#i';
    $url = 'a hyperlink';

    if (preg_match($regexp, $url, $matches)) {
        echo 'Domain: ' . $matches[1];
        echo 'Rest: ' . $matches[2];
    }

    • Hi,

      your example doesn’t work since your example URL is not valid 😉

      @ssultan, if you need to extract the domain or hostname from some URL, try parse_url()
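
As a short illustration of the `parse_url()` suggestion (the URL below is a made-up example), the function splits a URL into its components without any regular expression:

```php
<?php
// Hypothetical URL used only for illustration.
$url = 'http://www.example.com/path/page.html?id=5';

// parse_url() returns an associative array of the components it finds.
$parts = parse_url($url);

echo $parts['host'], "\n";  // the hostname
echo $parts['path'], "\n";  // everything after the hostname, before the query
echo $parts['query'], "\n"; // the query string without the leading "?"
```

Components that are missing from the URL are simply absent from the array, so real code should check with `isset()` before reading a key.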

  17. Hello, thanks for sharing.

    I want to extract part of one of my HTML pages. My code looks like this:
    $url = "index.html";
    $content = file_get_contents($url);
    $pattern = '/(.*)/';
    preg_match_all($pattern, $content, $data);

    The result comes back empty. Please help me. Thanks a lot.

    • Hello Lan,
      the pattern is wrong, it’s not specific (you accept the whole string inside the $content var)

  18. Hello, very good job here!

    But, not working for me.
    I have this directory (http://familiagrissi.com/images/arvore/) and want to select only the name of the pictures with the .jpg extension, but without the extension in an array.

    the result should be something like:
    $image[0] = Adelia Grissi Borgo;
    $image[1] = Andrea BergaminiReis;

    and so on…

    thanks! 😀

    • Hi,
      It’s possible that you need to change the REGEX pattern a little. This tutorial is just an example of how to use the function preg_match_all().
      Compare your HTML with the pattern you’re using at the moment, I’m sure you will see the difference 😉
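
A rough sketch of how such an adjusted pattern might look, assuming the directory listing consists of plain `<a href="...">` links (the `$html` string below is invented, and the real listing's markup may differ): match only hrefs ending in `.jpg`, then strip the extension with `pathinfo()`.

```php
<?php
// Hypothetical directory-listing markup; the real listing's HTML may differ.
$html = '<a href="Adelia%20Grissi%20Borgo.jpg">..</a> '
      . '<a href="notes.txt">..</a> '
      . '<a href="Andrea%20BergaminiReis.JPG">..</a>';

// Capture only href values that end in .jpg (case-insensitive).
preg_match_all('/href="([^"]+\.jpg)"/i', $html, $matches);

$image = [];
foreach ($matches[1] as $href) {
    // Decode %20 and similar sequences, then keep the file name without extension.
    $image[] = pathinfo(rawurldecode($href), PATHINFO_FILENAME);
}

print_r($image);
```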

  19. Thanks, fellow, you saved my life.
    I edited your code to fit my needs; here it is:

    $pattern = preg_match_all('!imgurl=.+\.(?:jpe?g|png|gif)!Ui', $data, $images);
