Converting Dynamic Sites to Static Pages

Mon 11 Jan 2016
Static Pages
event1.html

Event: PHP Social Brunch
Date: Sat 02 Jan 2016
Venue: The Coffee Shot
event2.html

Event: Singapore PHP User Group Meetup (Jan 2016)
Date: Mon 11 Jan 2016
Venue: PayPal Singapore
Dynamic Page
event.php?id=1

Event: PHP Social Brunch
Date: Sat 02 Jan 2016
Venue: The Coffee Shot
event.php?id=2

Event: Singapore PHP User Group Meetup (Jan 2016)
Date: Mon 11 Jan 2016
Venue: PayPal Singapore
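<?php /* event.php */
// Database connection; 'database' is a placeholder schema name, like the other credentials
$conn = new mysqli('localhost', 'username', 'password', 'database');
if ($conn->connect_error) { die('Connection failed: ' . $conn->connect_error); }

// Cast to int so that only numeric IDs reach the query
$id = isset($_GET['id']) ? (int) $_GET['id'] : null;
$result = $id ? $conn->query("SELECT * FROM events WHERE id = {$id}") : false;

if ($result !== false && $result->num_rows > 0) {
    $event = $result->fetch_object();
    // Escape database values before writing them into HTML
    printf(
        'Event: %s<br>Date: %s<br>Venue: %s',
        htmlspecialchars($event->name),
        htmlspecialchars($event->date),
        htmlspecialchars($event->venue)
    );
}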
Back at my previous company...

  • Introduced Zend Framework for web development
  • About 27 sites from 1997 to 2011 were coded in plain HTML/PHP
  • Sysadmin wanted to delete past years' databases and convert pre-2012 sites to static pages
  • Tried out some web crawler tools, e.g. wget and HTTrack, but none were suitable
Why Reinvent the Wheel?

  • Downward traversal only - crawl each of the following starting points without re-crawling the entire domain every time
    • http://example.com/event1997
    • http://example.com/event1998
    • ...
  • Save only pages, not assets
  • Unusual links to follow, e.g. AJAX, onclick="location.href=/", meta refresh
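
Two of the requirements above, downward traversal and following unusual links, can be sketched roughly as follows. This is a minimal illustration, not the CrawlSite code: isBelowStart() and extractLinks() are hypothetical names, the start URL is assumed to be a directory such as http://example.com/event1997/, and the regular expressions only cover quoted href values and a meta refresh tag where http-equiv comes before content. AJAX-generated links and unquoted handlers are not handled.

```php
<?php
// Hypothetical helpers for illustration; not taken from the CrawlSite script.

/**
 * Downward traversal check: true only if $url is on the same host as
 * $startUrl and its path is at or below the start path.
 * Assumes $startUrl points at a directory, e.g. http://example.com/event1997/
 */
function isBelowStart(string $url, string $startUrl): bool
{
    $u = parse_url($url);
    $s = parse_url($startUrl);
    if (!is_array($u) || !is_array($s) || ($u['host'] ?? '') !== ($s['host'] ?? '')) {
        return false;
    }

    // The trailing slash stops /event19970/ from matching /event1997/
    $base = rtrim($s['path'] ?? '', '/') . '/';
    $path = rtrim($u['path'] ?? '', '/') . '/';

    return strpos($path, $base) === 0;
}

/**
 * Collects link targets from quoted href values (plain attributes as well as
 * location.href='...' assignments in onclick handlers) and from meta refresh.
 */
function extractLinks(string $html): array
{
    $links = [];

    // href="..." or href='...', including location.href='...' inside onclick
    if (preg_match_all('/\bhref\s*=\s*(["\'])(.*?)\1/i', $html, $m)) {
        $links = array_merge($links, $m[2]);
    }

    // <meta http-equiv="refresh" content="0; url=...">
    $meta = '/<meta\b[^>]*http-equiv\s*=\s*["\']?refresh["\']?[^>]*\bcontent\s*=\s*["\'][^"\'>]*?url\s*=\s*([^"\'>\s]+)/i';
    if (preg_match_all($meta, $html, $m)) {
        $links = array_merge($links, $m[1]);
    }

    // Decode entities such as &amp;, drop empty targets, remove duplicates
    $links = array_map('html_entity_decode', $links);
    $links = array_filter($links, fn ($link) => $link !== '');

    return array_values(array_unique($links));
}
```

A crawler loop would resolve each extracted link against the current page URL and queue it only when isBelowStart() returns true, so that a crawl started at /event1997/ never wanders into /event1998/.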

Code review time!

https://github.com/zionsg/standalone-php-scripts/tree/master/CrawlSite
Converted Pages
event_id-1.php

Event: PHP Social Brunch
Date: Sat 02 Jan 2016
Venue: The Coffee Shot
event_id-2.php

Event: Singapore PHP User Group Meetup (Jan 2016)
Date: Mon 11 Jan 2016
Venue: PayPal Singapore
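
The naming pattern in the converted pages above (event.php?id=1 becoming event_id-1.php) could be produced by a helper along these lines. This is a minimal sketch inferred from the two examples only: staticFilename() is a hypothetical name, not taken from the CrawlSite script, and the "index" and ".html" fallbacks for paths without a file name or extension are assumptions.

```php
<?php
// Hypothetical helper for illustration; not taken from the CrawlSite script.

/**
 * Maps a dynamic URL to a flat file name by folding the query string into
 * the name, e.g. event.php?id=1 -> event_id-1.php
 */
function staticFilename(string $url): string
{
    $path  = (string) parse_url($url, PHP_URL_PATH);
    $query = (string) parse_url($url, PHP_URL_QUERY);

    $info = pathinfo($path);
    $name = ($info['filename'] ?? '') !== '' ? $info['filename'] : 'index'; // assumed fallback
    $ext  = $info['extension'] ?? 'html';                                   // assumed fallback

    if ($query !== '') {
        // id=1&page=2 becomes id-1_page-2
        $name .= '_' . strtr($query, ['=' => '-', '&' => '_']);
    }

    // Replace anything that is unsafe in a file name
    return preg_replace('/[^A-Za-z0-9._-]/', '-', $name) . '.' . $ext;
}

echo staticFilename('event.php?id=1'), "\n";
echo staticFilename('event.php?id=2'), "\n";
```

Links inside the saved pages would need the same mapping applied, so that a link to event.php?id=2 points at event_id-2.php after conversion.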