Web application question

Marek

Guest
Suppose there is an HTML page at http://www.thatSite.com/x.html that is generated daily by a web stats program.

You administer a totally different site, but you have permission to dynamically extract a few small portions of the content from the x.html page to include on a page at your site. Is this possible using ColdFusion or PHP? And what is the name for this method of "extracting data from another web page"?

Edit: The stat program does not generate XML (but maybe there's a way to convert it into XML that I don't know about)
 
So you're going to download the page and rip the content from it? That's just parsing. You'll have to write the parser yourself in either language.
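To make "parsing" concrete: once you have the page's HTML in a string, the simplest approach is to pull out the piece between two landmark strings you know surround it. A minimal sketch (the sample HTML and marker strings here are made up for illustration, not from the real x.html):

```php
<?php
// Extract the substring between two landmark strings, or null if
// either landmark is missing from the HTML.
function extractBetween($html, $start, $end)
{
    $s = strpos($html, $start);
    if ($s === false)
        return null;
    $s += strlen($start);
    $e = strpos($html, $end, $s);
    if ($e === false)
        return null;
    return substr($html, $s, $e - $s);
}

// Example: grab a hit count out of a stats-style snippet.
$page = '<html><body><b>Hits today:</b> <span id="hits">1234</span></body></html>';
echo extractBetween($page, '<span id="hits">', '</span>');   // prints 1234
?>
```

For a daily-regenerated stats page this tends to be fragile: if the markup around your target changes, the landmarks stop matching, so pick the most stable markers you can.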
 
Bah...that's what I tried searching for with 10000000000 results. I'll just keep refining the search, I guess.
 
" 'cold fusion' parse external web page " etc...

And what I'm referring to isn't a one-time thing. I mean I want to code my page to rip specific content from the x.html file every time someone loads it. I found the tag I need to do it, though. ColdFusion's CFHTTP tag can be set up to download external content and pass the response body into a variable as text.
 
What are you searching for? There's nothing that's just going to parse THAT specific page for you.
 
Yeah, I know I've gotta do the coding / parsing. I was just trying to find a method of making my code "get ahold" of the external data so I could code the parsing routine. I found what I needed, tho :D
 
http://us2.php.net/function.fsockopen

Code:
Here's a quick function to establish a connection to a web server that will time out if the connection is lost after a user-definable amount of time or if the server can't be reached.

Also supports Basic authentication if a username/password is specified. Any improvements or criticisms, please email me! :-)

Returns either a resource ID, an error code, or 0 if the server can't be reached at all. Returns -1 in the event that something really weird happens, like a non-standard HTTP response. Hope it helps someone.

Cheers,

Ben Blazely

function connectToURL($addr, $port, $path, $user = "", $pass = "", $timeout = "30")
{
    $urlHandle = fsockopen($addr, $port, $errno, $errstr, $timeout);

    if ($urlHandle)
    {
        socket_set_timeout($urlHandle, $timeout);

        if ($path)
        {
            $urlString = "GET $path HTTP/1.0\r\nHost: $addr\r\nConnection: Keep-Alive\r\nUser-Agent: MyURLGrabber\r\n";
            if ($user)
                $urlString .= "Authorization: Basic ".base64_encode("$user:$pass")."\r\n";
            $urlString .= "\r\n";

            fputs($urlHandle, $urlString);

            $response = fgets($urlHandle);

            if (substr_count($response, "200 OK") > 0)      // Check the status of the link
            {
                $endHeader = false;                         // Strip initial header information
                while (!$endHeader)
                {
                    if (fgets($urlHandle) == "\r\n")
                        $endHeader = true;
                }

                return $urlHandle;                          // All OK, return the file handle
            }
            else if (strlen($response) < 15)                // Cope with weird non-standard responses
            {
                fclose($urlHandle);
                return -1;
            }
            else                                            // Cope with a standard error response
            {
                fclose($urlHandle);
                return substr($response, 9, 3);
            }
        }

        return $urlHandle;
    }
    else
        return 0;
}
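Once that function hands back a handle, the headers have already been consumed, so you just read lines until EOF and glue them together to get the page body. Here's a self-contained sketch of that read loop; the fake in-memory "response" stands in for a real socket so the example runs without a network connection:

```php
<?php
// A canned HTTP response standing in for the socket connectToURL() returns.
$fakeResponse = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
              . "<html><body>Hits today: 1234</body></html>";

$handle = fopen('php://memory', 'r+');
fwrite($handle, $fakeResponse);
rewind($handle);

// Same header-skipping loop as in connectToURL(): discard lines
// until the blank line that ends the headers.
while (($line = fgets($handle)) !== false)
{
    if ($line == "\r\n")
        break;
}

// Read the body exactly as you would from the returned socket handle.
$body = '';
while (!feof($handle))
    $body .= fgets($handle);
fclose($handle);

echo $body;   // prints the HTML body
?>
```

With the whole body in `$body`, you can then run whatever parsing routine you've written against it.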
 
Alternatively you can use:
$url = "http://yourdomain.com/file.html";
$contents = file_get_contents($url);
// $contents will hold the contents in a string.
$tagstoleaveout = '';
$minpage = strip_tags($contents, $tagstoleaveout);
// strip_tags to get rid of tags (the second argument lists the tags to keep)...


Then you could use some sort of regular expression, or just a plain string search, to find the part of the page you want to rip out.
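A sketch of the regular-expression approach on a stats-style snippet. The sample HTML and the pattern are invented for illustration; you'd adjust both to match the real markup in x.html:

```php
<?php
// Pretend this came back from file_get_contents() on the stats page.
$contents = '<tr><td>Unique visitors</td><td class="num">5678</td></tr>';

// Capture the digits in the cell that follows the "Unique visitors" label.
if (preg_match('#<td>Unique visitors</td><td[^>]*>(\d+)</td>#', $contents, $m))
    echo $m[1];   // prints 5678
?>
```

The capture group `$m[1]` is what you'd drop into your own page; if `preg_match` returns 0, the page layout has changed and your pattern needs updating.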
 