POI-OSM Automation

Tue, 18/12/2012 - 16:19 -- Remiguel

I have finally finished automating the data updates for my POI-OSM page. It took longer than expected. On the POI-OSM page you can easily download points of interest from OSM. This PHP script updates all the data files simply and quickly, which lets me keep the page up to date. Thanks to tsuji from http://www.developpez.net for his support and for his solution for querying nodes by criteria on their children.

 

What does my PHP script do?

  1. Download an .osm.bz2 file from Geofabrik to my server.
  2. Read the compressed bz2 file line by line.
  3. Save the decompressed data, discarding nodes without children.
  4. Process the osm file to keep only nodes that have children and belong to the needed categories.
  5. Check whether the new file is larger than the previous one before saving it.

 

1. Download the file.

// download file
$country = "switzerland";
$file = $country.".osm.bz2";

if (file_exists("filetmp/".$file)) { // erase the file if it exists
    unlink("filetmp/".$file);
}

if ($country == "germany") {
    $url = 'ftp5.gwdg.de/pub/misc/openstreetmap/download.geofabrik.de/'.$file;
} else {
    $url = 'http://download.geofabrik.de/openstreetmap/europe/'.$file;
}
$path = 'filetmp/'.$file;
$fdow = fopen($path, 'w');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fdow); // write the response body directly to the file
$data = curl_exec($ch);
curl_close($ch);
fclose($fdow);
unset($fdow);
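
The download above does not check whether the transfer succeeded. Here is a minimal sketch of the same step with basic error handling. The helper name download_to_file is hypothetical and not part of my script; the cURL options are standard ones.

```php
<?php
// Hypothetical helper (not in the original script): download $url into $path
// and report success, so a failed transfer is not mistaken for a valid file.
function download_to_file(string $url, string $path): bool
{
    $out = fopen($path, 'w');
    if ($out === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $out);            // stream the body straight to disk
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_FAILONERROR, true);     // treat HTTP status >= 400 as a failure
    $ok = curl_exec($ch);                            // true on success, false on failure
    curl_close($ch);
    fclose($out);
    if ($ok === false) {
        unlink($path);                               // do not leave a partial file behind
        return false;
    }
    return true;
}
```

It could be called as download_to_file($url, 'filetmp/'.$file) or die("Download failed"), so the later steps never run on an incomplete archive.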

2. Read the bz2 file line by line. 

$countr_osm = "filetmp/".$country.".osm";

if (file_exists($countr_osm)) { // erase the file if it exists
    unlink($countr_osm);
}
$countr_2xml = $country."2.xml";
if (file_exists($countr_2xml)) { // erase the file if it exists
    unlink($countr_2xml);
}
$kount = 0;
$tempfile = "tmp_file.txt";

if (file_exists($countr_osm)) {
    echo "OSM file exists. No need to decompress the bz2 \n";
} else {
    $fp = fopen($countr_osm, "w"); // open a file to save the extracted data

    $bz = bzopen("filetmp/$file", "r") or die("Couldn't open $file"); // open the compressed file

3. While the bz2 file is read and the osm file is written, the code does a first pass of data cleaning, line by line, removing every line that contains a single <node>, <nd> or <member> element without children. This shrinks the uncompressed file considerably: without this step the file for France is larger than 60 GB; with it, the file is smaller than 10 GB. Because bzread and fwrite work on small chunks, memory usage stays very low.

    $decompressed_file = '';
    while ($decompressed_file = bzread($bz, 4096)) { // read the compressed file in chunks
        if (bzerrno($bz) < 0) die('Compression Problem'); // negative codes are bzip2 errors

        $ftmp = fopen($tempfile, "w+");
        fwrite($ftmp, $decompressed_file, strlen($decompressed_file)); // save the chunk in a tmp file
        fseek($ftmp, 0);

        while (!feof($ftmp)) {
            $line = fgets($ftmp, 1024);
            if ($line === false) break; // nothing left to read in this chunk
            if ((strpos($line, "<nd ") !== false && strpos($line, "/>") !== false)
                || (strpos($line, "<member ") !== false && strpos($line, "/>") !== false)
                || (strpos($line, "<node ") !== false && strpos($line, "/>") !== false))
            {
                // leave this line out of the output file
            } else {
                fwrite($fp, $line);
                $kount++;
                echo $kount."\n";
            } // close if
        } // close while
        fclose($ftmp);
        unlink($tempfile);
    } // close while
    if ($decompressed_file === false) die('Read problem'); // bzread returns false on error

    bzclose($bz);
    fclose($fp);
    // Delete the objects to free memory
    unset($bz);
    unset($fp);
    unset($ftmp);
    $kount = 0;

    echo "OSM file ready \n";
} // close if else
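
One caveat of reading fixed-size chunks: a 4096-byte chunk can end in the middle of a line, so an element split across two chunks escapes the filter. Both halves are still written, so the output stays valid, but it is slightly larger than necessary. Here is a sketch of one way to handle this, by carrying the incomplete tail of each chunk over to the next one. The function names is_childless_element and filter_chunks are hypothetical, and the chunks are passed as a plain array so that the example runs without a bz2 file.

```php
<?php
// Hypothetical helper: true for self-closing <node>, <nd> and <member> lines,
// i.e. the elements without children that the script discards.
function is_childless_element(string $line): bool
{
    if (strpos($line, '/>') === false) {
        return false;
    }
    foreach (array('<node ', '<nd ', '<member ') as $open) {
        if (strpos($line, $open) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: filter a sequence of chunks line by line, keeping the
// unfinished last line of each chunk in $carry until the next chunk arrives.
function filter_chunks(iterable $chunks): string
{
    $out = '';
    $carry = '';
    foreach ($chunks as $chunk) {
        $lines = explode("\n", $carry . $chunk);
        $carry = array_pop($lines);   // text after the last newline; may be empty
        foreach ($lines as $line) {
            if (!is_childless_element($line)) {
                $out .= $line . "\n";
            }
        }
    }
    if ($carry !== '' && !is_childless_element($carry)) {
        $out .= $carry;               // final line without a trailing newline
    }
    return $out;
}

// A <nd .../> element split across two chunks is still recognised and dropped.
$chunks = array("<way id=\"1\">\n<nd re", "f=\"5\"/>\n</way>\n");
echo filter_chunks($chunks);
```

In the real script the kept lines would go to fwrite($fp, ...) instead of being collected in a string, so memory usage would stay as low as before.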

4. Again, to avoid using a large amount of memory, the script reads the file node by node with XMLReader. Only nodes with a desired category are saved. The output buffer is flushed to disk about every 2000 matches. I use a DOM object to hold each entire node in memory, with its attributes and children, so that it can be written to an xml file with XMLWriter once the check has passed. An osm file is an xml file.

// extract only the data we need (defined in the array $block) and save them in a new file
// the file is read node by node

$block = array('amenity', 'craft', 'emergency', 'historic', 'leisure', 'man_made', 'natural', 'office', 'shop', 'sport', 'tourism', 'aeroway', 'railway');

$xmlWriter = new XMLWriter();
$xmlWriter->openMemory();
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0','UTF-8');
$xmlWriter->startElement('osm');
$xmlWriter->writeAttribute('version', '0.1');
$xmlWriter->writeAttribute('generator', 'Remiguel');

if (file_exists($countr_osm)) {
    $xml = new XMLReader();
    $xml->open($countr_osm); // input file as source
    $node = null;
    $kflag = false;

    while ($xml->read()) {
        if ($xml->nodeType == XMLReader::ELEMENT && $xml->name == 'node') {
            $node = $xml->expand(); // copies the entire node into a DOM object
            $kflag = false; // reset
        }

        if ($kflag
            && $xml->nodeType == XMLReader::END_ELEMENT
            && $xml->name == 'node'
        ) {
            // read and write node attributes
            $xmlWriter->startElement('node');
            foreach ($node->attributes as $key => $value) {
                $xmlWriter->writeAttribute($key, $value->value);
            }

            // read and write tag attributes
            $tag = $node->getElementsByTagName("tag");
            foreach ($tag as $d) {
                $xmlWriter->startElement('tag');
                $xmlWriter->writeAttribute('k', $d->getAttribute('k'));
                $xmlWriter->writeAttribute('v', $d->getAttribute('v'));
                $xmlWriter->endElement(); // close tag
            }
            $xmlWriter->endElement(); // close node

            // echo_memory_usage();
            echo $kount."\n";

            if ($kount > 2000) {
                // append the buffered output to the file and empty the buffer
                file_put_contents($countr_2xml, $xmlWriter->flush(true), FILE_APPEND);
                $kount = 0;
            }
            $node = null;
            $kflag = false;
        }

        foreach ($block as $cat) {
            if (!$kflag
                && $xml->nodeType == XMLReader::ELEMENT
                && $xml->name == 'tag'
                && $xml->getAttribute('k') == $cat
                && $xml->getAttribute('v') // maybe this line can be removed
            ) {
                $kflag = true;
                $kount++;
            }
        }
    } // close while
} // close if

$xmlWriter->endElement();
$xmlWriter->endDocument();
file_put_contents($countr_2xml, $xmlWriter->flush(true), FILE_APPEND);

unlink($countr_osm);
unlink("filetmp/".$file);

// Delete the objects to free memory
unset($xmlWriter);
unset($xml);
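
To try the node filter without a large osm file, here is a compact variant that works on a string. It is only a sketch: the function name poi_node_ids and the sample data are made up for illustration, and unlike the script above it inspects the expanded DOM copy directly instead of setting a flag while streaming.

```php
<?php
// Hypothetical helper: return the ids of <node> elements that carry at least
// one <tag> whose key is in $categories.
function poi_node_ids(string $osmXml, array $categories): array
{
    $reader = new XMLReader();
    $reader->XML($osmXml);            // parse from a string instead of a file
    $ids = array();
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'node') {
            continue;
        }
        $node = $reader->expand();    // DOM copy of the node and its children
        foreach ($node->childNodes as $child) {
            if ($child instanceof DOMElement
                && $child->nodeName === 'tag'
                && in_array($child->getAttribute('k'), $categories, true)
            ) {
                $ids[] = $node->getAttribute('id');
                break;                // one matching tag is enough
            }
        }
    }
    $reader->close();
    return $ids;
}

// Made-up sample data: nodes 1 and 3 carry a wanted category, node 2 does not.
$sample = '<osm version="0.6">'
    . '<node id="1" lat="46.9" lon="7.4"><tag k="amenity" v="cafe"/></node>'
    . '<node id="2" lat="46.8" lon="7.5"><tag k="source" v="survey"/></node>'
    . '<node id="3" lat="46.7" lon="7.6"><tag k="name" v="X"/><tag k="shop" v="bakery"/></node>'
    . '</osm>';
print_r(poi_node_ids($sample, array('amenity', 'shop')));
```

Only one node is expanded at a time, so memory usage still depends on the size of a single node rather than on the size of the whole file.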

5. We compare the new xml file against the old one. If the new one is larger, we assume the process ran to completion, and the new file replaces the old one:

if (file_exists($country.".xml")) {
    if (filesize($countr_2xml) > filesize($country.".xml")) {
        unlink($country.".xml");
        rename($countr_2xml, $country.".xml");
        echo 'Update of '.$country.' done';
    } else {
        unlink($countr_2xml);
        echo 'The file '.$country.' is already up to date';
    }
} else {
    rename($countr_2xml, $country.".xml");
    echo 'New file created: '.$country;
}
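
The same check can be wrapped in a function that returns a status, which is easier to reuse once several countries are processed. This is a sketch only; the name replace_if_larger is hypothetical. Keep in mind that the size comparison is a heuristic: a complete extract that happens to be smaller than the previous one would be rejected.

```php
<?php
// Hypothetical helper: replace $current with $candidate only when $candidate
// is larger (or $current does not exist yet); otherwise discard $candidate.
// Returns 'created', 'updated' or 'kept'.
function replace_if_larger(string $candidate, string $current): string
{
    clearstatcache();   // file_exists() and filesize() results are cached per path
    if (!file_exists($current)) {
        rename($candidate, $current);
        return 'created';
    }
    if (filesize($candidate) > filesize($current)) {
        rename($candidate, $current);   // rename() overwrites the existing file
        return 'updated';
    }
    unlink($candidate);
    return 'kept';
}
```

In the script it would be called as replace_if_larger($countr_2xml, $country.".xml").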
 
This code will be completed with a foreach loop that covers all the countries we need, and it will be executed monthly by cron.
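
A possible shape for that loop, as a sketch only: update_all is a hypothetical name, and $update stands for the whole pipeline above wrapped in a function that takes a country name, which does not exist yet in the code shown here.

```php
<?php
// Hypothetical driver: run $update for every country and collect the results.
function update_all(array $countries, callable $update): array
{
    $results = array();
    foreach ($countries as $country) {
        try {
            $results[$country] = $update($country);
        } catch (Exception $e) {
            // one failing country should not stop the remaining ones
            $results[$country] = 'failed: ' . $e->getMessage();
        }
    }
    return $results;
}

// Example with a stand-in callback instead of the real pipeline.
$report = update_all(array('switzerland', 'france'), function ($country) {
    return "updated $country";
});
print_r($report);
```

A crontab entry such as 0 3 1 * * php /path/to/update.php (the path is a placeholder) would then run the script at 03:00 on the first day of every month.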
 