Extract HTML Using PHP Simple HTML DOM

PHP Simple HTML DOM is an HTML DOM parser that lets you manipulate HTML easily and tolerates invalid HTML.
It can also find tags on an HTML page with selectors, just like jQuery.

How To Extract HTML Using PHP Simple HTML DOM

Download The Code

The code is free and open source under the MIT license, so you can use it freely in your personal and commercial projects.

Download PHP Simple HTML DOM

Code Usage

Let’s take this website as an example (http://www.pluscss.com).
First, we include the library.

include('simple_html_dom.php');

Then we load and parse the page we want to extract HTML from.

$html = file_get_html('http://www.pluscss.com');

Next, let’s find all the images on the home page of http://www.pluscss.com:

foreach($html->find('img') as $img) {
        echo $img->src . '<br>';
}
/* This prints something like:
common/images/logo.png
upload/slider/1389582835.jpg
upload/slider/1384787160.jpg
imageprocessor.php?src=./upload/products/70061389577558.jpg&w=290&h=160&q=80&mode=stretch
imageprocessor.php?src=./upload/products/75721381286020.jpg&w=290&h=160&q=80&mode=stretch
imageprocessor.php?src=./upload/products/79131381417480.jpg&w=290&h=160&q=80&mode=stretch
imageprocessor.php?src=./upload/posts/67661388955737.jpg&w=100&h=65&q=100&mode=stretch
imageprocessor.php?src=./upload/posts/25331388744591.jpg&w=100&h=65&q=100&mode=stretch
imageprocessor.php?src=./upload/posts/80581388644743.jpg&w=100&h=65&q=100&mode=stretch
imageprocessor.php?src=./upload/posts/35141388387304.jpg&w=100&h=65&q=100&mode=stretch
imageprocessor.php?src=./upload/posts/80441387953862.jpg&w=100&h=65&q=100&mode=stretch
imageprocessor.php?src=./upload/posts/10611387409811.jpg&w=100&h=65&q=100&mode=stretch
*/
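
If you want to try the same idea without downloading anything, here is a minimal sketch that uses PHP's bundled DOM extension (DOMDocument) instead of Simple HTML DOM. The function name extract_image_sources and the sample markup are made up for illustration; they are not part of either library. It collects the src of every img tag, like the loop above, and skips images that have no src attribute.

```php
<?php
// Sketch with PHP's built-in DOM extension, not Simple HTML DOM.
// extract_image_sources() is a hypothetical helper name.
function extract_image_sources(string $markup): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally so imperfect HTML does not print notices.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Only keep images that actually carry a src attribute.
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Made-up markup standing in for a downloaded page.
$sample = '<div><img src="images/logo.png"><img alt="no source"><img src="images/banner.jpg"></div>';

foreach (extract_image_sources($sample) as $src) {
    echo $src . "\n";
}
```

Working on an inline string like this is handy for testing a selector before pointing the script at a live site.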

As another example, let’s find the ul list with the class main-menu and print its items:

foreach($html->find('ul.main-menu') as $ul) {
    foreach($ul->find('li') as $li) {
        echo $li->innertext . '<br>';
    }
}
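
The nested lookup above (a ul with a given class, then its li items) can also be sketched with PHP's built-in DOMDocument and DOMXPath, which is a different tool from Simple HTML DOM. The helper name extract_menu_items and the sample markup are assumptions for illustration. Note one difference: innertext in the loop above returns the inner HTML of each item, while this sketch returns only the text via textContent.

```php
<?php
// Sketch with PHP's built-in DOM extension, not Simple HTML DOM.
// extract_menu_items() is a hypothetical helper; $class is assumed to be
// a plain class name without quotes or spaces.
function extract_menu_items(string $markup, string $class): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the space-separated class attribute,
    // then take every li underneath the matching ul.
    $query = sprintf(
        '//ul[contains(concat(" ", normalize-space(@class), " "), " %s ")]//li',
        $class
    );

    $items = [];
    foreach ($xpath->query($query) as $li) {
        $items[] = trim($li->textContent);
    }
    return $items;
}

// Made-up markup: the second list has a similar class name and should be ignored.
$sample = '<ul class="main-menu top"><li><a href="/">Home</a></li><li>About</li></ul>'
        . '<ul class="main-menu-footer"><li>Skip</li></ul>';

foreach (extract_menu_items($sample, 'main-menu') as $item) {
    echo $item . "\n";
}
```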

Another example: find all meta tags that have a name attribute (meta tags without a name attribute are excluded):

foreach($html->find('meta[name]') as $meta) {
    echo $meta->name.' : '.$meta->content . '<br>';
}
/*
the result will be : 
description : pluscss.com provides scripts for php sql css jquery ajax web techniques along with articles and tutorials
keywords : scripts,php,sql,css,jquery,ajax,web techniques,articles,tutorials
author : PlusCSS
generator : PlusCSS
*/
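
The attribute selector meta[name] has a close XPath equivalent, //meta[@name]. Here is a minimal sketch with PHP's built-in DOMDocument and DOMXPath (again, not Simple HTML DOM) that returns the name and content pairs as an array instead of printing them. The helper name extract_named_meta and the sample markup are made up for illustration.

```php
<?php
// Sketch with PHP's built-in DOM extension, not Simple HTML DOM.
// extract_named_meta() is a hypothetical helper name.
function extract_named_meta(string $markup): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $meta = [];
    // [@name] keeps only meta elements that have a name attribute.
    foreach ($xpath->query('//meta[@name]') as $node) {
        $meta[$node->getAttribute('name')] = $node->getAttribute('content');
    }
    return $meta;
}

// Made-up markup: the og:title tag has no name attribute and is skipped.
$sample = '<html><head>'
        . '<meta name="author" content="Example Author">'
        . '<meta property="og:title" content="Ignored">'
        . '<meta name="keywords" content="php,html">'
        . '</head><body></body></html>';

foreach (extract_named_meta($sample) as $name => $content) {
    echo $name . ' : ' . $content . "\n";
}
```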

You can also read the full documentation at http://simplehtmldom.sourceforge.net/manual.htm
