The Simplest Way to Scrape Data with PHP
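Without further ado, here is the code. It scrapes search results from Fliggy (飞猪). The pagenum query parameter selects the page, so paging and a total page count can be built on top of it. To capture more fields, define your own expressions: anything that appears in the HTML the server returns can be extracted. The whole approach is simply to fetch the HTML and pick the final data out of it with regular expressions: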
public function xs()
{
    // Fetch page 1 of the search results (the keyword is URL-encoded 云南).
    $content = file_get_contents('https://travelsearch.fliggy.com/index.htm?searchType=product&keyword=%E4%BA%91%E5%8D%97&pagenum=1');
    if ($content === false) {
        return; // request failed, nothing to parse
    }

    // Keep only the markup between the product list and the pager.
    // If either marker is missing, fall back to parsing the whole page.
    $pos1 = strpos($content, '<div class="page-products-block-left clear-fix">');
    $pos2 = strpos($content, '<span class="page-total">到第<input type="text" class="page-skip">页<span type="button" class="confirm-btn" data-spm-click="gostr=/tbtrip;locaid=dredirect">确定</span></span>');
    if ($pos1 !== false && $pos2 !== false && $pos2 > $pos1) {
        $content = substr($content, $pos1, $pos2 - $pos1);
    }

    // href: despite the key name, this is the lazy-loaded image URL (data-src).
    // Target markup: <img alt="" class="lazy-image" data-src="" data-lazyid="44" style="transition: opacity 100ms ease 0s; opacity: 1;" src="">
    preg_match_all('/<img alt="" class="lazy-image".*? data-src="(.*?)"/si', $content, $matches);
    // Note: array_unique() can shift indexes relative to $title if two products share an image.
    $href = array_values(array_unique($matches[1]));

    // src
    // preg_match_all('/_src="(.*?)"/i', $content, $matches);

    // title
    preg_match_all('/<h3 class="main-title">(.*?)<\/h3>/i', $content, $matches);
    $title = $matches[1];
    // Alternative: read the title attribute instead.
    // preg_match_all('/title="(.*?)"/i', $content, $matches);
    // $title = $matches[1];

    // price: the whole price box is captured; a narrower alternative is kept below.
    // preg_match_all('/<span class="price".*?><em>¥<\/em>(.*?)<\/span>/i', $content, $matches);
    preg_match_all('/<div class="price-box".*?>(.*?)<\/div>/i', $content, $matches);
    $price = $matches[1];

    // tag
    preg_match_all('/<span class="tag-value".*?>(.*?)<\/span>/i', $content, $matches);
    $tag = $matches[1];
    // print_r($tag);
    // return 1;

    // Combine the parallel arrays into one record per product.
    $data = array();
    for ($i = 0, $len = count($href); $i < $len; $i++) {
        $data[] = array(
            'href'  => $href[$i],
            // 'src' => $src[$i],
            'title' => $title[$i] ?? '',
            // The price box still contains markup, so escape it for display.
            'price' => htmlentities($price[$i] ?? '', ENT_QUOTES, "UTF-8"),
            'tag'   => $tag[$i] ?? '',
        );
    }
    print_r($data);
}
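The function above only requests pagenum=1. As a rough sketch of how paging might be layered on top of it, the snippet below walks several pages and collects the titles from each. The helper names buildSearchUrl, extractTitles and crawlPages are hypothetical and not part of the original code. The query parameters are copied from the URL in xs(), and the title pattern is the same one used there. The total page count is not parsed here because the relevant markup is not shown in this post, so the caller supplies an upper bound, and an empty page is assumed to mean the results have run out.

```php
<?php
// Hypothetical helper: build the search URL for a keyword and page number.
// Parameter names are taken from the URL used in xs().
function buildSearchUrl(string $keyword, int $page): string
{
    return 'https://travelsearch.fliggy.com/index.htm?' . http_build_query([
        'searchType' => 'product',
        'keyword'    => $keyword,
        'pagenum'    => $page,
    ], '', '&');
}

// Hypothetical helper: extract product titles from one page of HTML,
// using the same <h3 class="main-title"> pattern as xs().
function extractTitles(string $html): array
{
    preg_match_all('/<h3 class="main-title">(.*?)<\/h3>/si', $html, $matches);
    return array_map('trim', $matches[1]);
}

// Walk pages 1..$maxPages. $fetch is any callable that takes a URL and
// returns the HTML as a string, or false on failure.
function crawlPages(string $keyword, int $maxPages, callable $fetch): array
{
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch(buildSearchUrl($keyword, $page));
        if ($html === false) {
            break; // stop at the first failed request
        }
        $titles = extractTitles($html);
        if ($titles === []) {
            break; // assumption: an empty page means no more results
        }
        $all[$page] = $titles;
    }
    return $all;
}
```

A call such as crawlPages('云南', 3, 'file_get_contents') would request the first three pages. Passing the fetcher in as a callable also makes it easy to swap in a cURL wrapper or a stub when trying the parsing offline.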
[Screenshot: the data printed by xs()]
Original article; when reposting, please credit: The Simplest Way to Scrape Data with PHP | 知识改变命运