archive:snipsnapconvert [Georg Martius]

Here I describe how I converted my old website (www.flexman.homeip.net), which used [[http://www.snipsnap.org|SnipSnap]], into an offline copy ([[http://georg.hronopik.de/snipsnap|see here]]) and into DokuWiki (this page).
===== Making an offline copy of SnipSnap =====
The tool of choice is, of course, ''wget''. It is not entirely trivial, though, because SnipSnap uses the same name for a page and for the folder that holds its subpages. The hierarchical structure is similar to DokuWiki's, and access is easy: the URL contains just the name of the page.
If you run wget right away, it fails because it cannot create a folder and a file with the same name. Fortunately, wget has the right option for this, ''--html-extension'', which appends an ''.html'' extension to every page. Here is the complete call:
  wget --mirror --convert-links --backup-converted \
       --html-extension www.flexman.homeip.net
This gives you a complete copy of the pages, including their raw content and the diffs.
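To see why the option matters, here is a small demonstration of the clash in a temporary directory. The page ''start'' and its subpage ''start/sub'' are made-up names, and ''demo_clash'' is a hypothetical helper: without the extension wget would need both a file and a folder called ''start''; with ''.html'' appended the two no longer collide. (In wget 1.12 and later the option is called ''--adjust-extension'', short ''-E''; the old name is still accepted.)
<code bash>
#!/bin/bash
# Hypothetical demo of the name clash: "start" is a page, "start/sub" its subpage.
demo_clash() {
  local dir
  dir=$(mktemp -d) || return 1
  (
    cd "$dir" || exit 1
    # without --html-extension the page is stored as a plain file "start" ...
    printf 'page\n' > start
    # ... so the folder for its subpages cannot be created
    if mkdir start 2>/dev/null; then echo "no clash"; else echo "clash"; fi
    rm start
    # with --html-extension the page becomes "start.html" and the folder is free
    printf 'page\n' > start.html
    mkdir start && printf 'subpage\n' > start/sub.html && echo "ok"
  )
  rm -rf "$dir"
}

demo_clash   # prints "clash", then "ok"
</code>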
The raw content is the content of your snips as you typed them, i.e. in wiki syntax. It is available through the ''view'' button. For some reason I can't find it on my mirrored pages now. Anyway, if it does not work for you, you can still access the raw content at ''/raw/snipname''.
So there is no need to fiddle with cookies to get the logged-in pages. By the way, I tried, but did not manage to fool SnipSnap with a stored cookie from my logged-in session; probably it also checks the browser ID or something similar.
After that, I only had to change a few things, such as the logo.
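If the ''view'' button is missing, the raw sources can also be fetched one by one. This is a minimal sketch, assuming that every snip is served at ''<site>/raw/<snipname>'' as described above and that spaces are the only characters in snip names that need percent-encoding. The helpers ''raw_url'' and ''fetch_raw'' and the snip names in the example are hypothetical. Since a page and its subpages would clash again on disk, a ''/'' in a snip name is replaced by ''_'' in the local file name.
<code bash>
#!/bin/bash
# Base URL of the old SnipSnap site
BASE="http://www.flexman.homeip.net"

# raw_url BASE SNIPNAME: build the raw URL of a snip (hypothetical helper);
# only spaces are encoded (as %20), everything else is passed through unchanged
raw_url() {
  printf '%s/raw/%s\n' "$1" "${2// /%20}"
}

# fetch_raw NAME...: download each snip into the folder raw/ (needs network access)
fetch_raw() {
  local name
  mkdir -p raw
  for name in "$@"; do
    # "/" of subpages becomes "_" so that a page and its subpages do not clash
    wget -q -O "raw/${name//\//_}" "$(raw_url "$BASE" "$name")"
  done
}

# Example (hypothetical snip names): fetch_raw "start" "start/about me"
</code>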
===== Convert SnipSnap to DokuWiki =====
Since we now have all pages in raw format in one folder (which should be called ''raw''), we can process them with a small script that converts them to DokuWiki syntax.
Before we start, we have to convert all files to Unix file format (LF line endings instead of DOS-style CR LF):
<code bash>
mkdir raw_unix;
cd raw;
# split the output of find at newlines only; this avoids trouble with spaces
# in file names (IFS stays set for the next block as well)
export IFS='
';
for F in `find . -type f`; do dos2unix < "$F" > "../raw_unix/$F"; done
cd ..
</code>
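If ''dos2unix'' is not installed, ''tr'' can do the same job. In the sketch below, the function name ''to_unix'' is a hypothetical choice of mine. It deletes every carriage return, which is fine for text files, and it uses ''find -print0'' together with bash's ''read -d'' instead of changing ''IFS'', so any character in a file name is safe; ''mkdir -p'' recreates subfolders in the target.
<code bash>
#!/bin/bash
# to_unix SRC DST: copy every file below SRC to DST, deleting all carriage
# returns (CR LF -> LF); subfolders are recreated in DST
to_unix() {
  local src=$1 dst=$2 f
  mkdir -p "$dst"
  # NUL-separated file names are safe against spaces and even newlines
  ( cd "$src" && find . -type f -print0 ) |
  while IFS= read -r -d '' f; do
    mkdir -p "$dst/$(dirname "$f")"
    tr -d '\r' < "$src/$f" > "$dst/$f"
  done
}

# Usage: to_unix raw raw_unix
</code>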
Now you can use the following Perl script (download {{snipsnap2dokuwiki.pl}}) to convert SnipSnap's Radeox wiki syntax to DokuWiki syntax. The script does not cover everything and also has some bugs, but it works for the majority of the content.
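These lines apply the script to all files in ''raw_unix'' and rename them, because DokuWiki does not allow special characters in page names (except ''-'' and ''_'', AFAIK) and requires lower case. DokuWiki also stores every page as a ''.txt'' file and uses ''_'' in place of spaces, so the loop takes care of that too:
<code bash>
mkdir raw_doku;
cd raw_unix;
for F in `find . -type f`; do
  # lower case, spaces to underscores, without ( ) +, plus the .txt extension
  K=`echo "$F" | tr -d "()+" | tr ' ' '_' | perl -e "print lc <>;"`.txt;
  # the converter is assumed to lie in the parent folder, so find does not pick it up
  perl ../snipsnap2dokuwiki.pl < "$F" > "../raw_doku/$K";
done
cd ..
</code>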
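The renaming can also be wrapped in a small function. ''doku_name'' is a hypothetical helper, not part of DokuWiki, and it only approximates DokuWiki's own cleaning of page names: it lower-cases the name, turns spaces into ''_'', drops every character other than ''a-z'', ''0-9'', ''_'' and ''-'' (so it assumes plain ASCII names) and appends ''.txt''. Pass it the bare file name, not the ''./'' path printed by find.
<code bash>
#!/bin/bash
# doku_name NAME: print a DokuWiki-style page file name for a snip (approximation)
doku_name() {
  local n
  n=$(printf '%s' "$1" |
      tr '[:upper:]' '[:lower:]' |   # lower case
      tr ' ' '_' |                   # spaces become underscores
      tr -cd 'a-z0-9_-')             # drop all other characters
  printf '%s.txt\n' "$n"
}

doku_name "My Page (old)+"   # prints my_page_old.txt
</code>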
Unfortunately, all files end up in one directory, so the hierarchical structure is lost. I was restructuring the site anyway, so I sorted the pages by hand.
Now you can copy the files to your DokuWiki webspace under ''dokuwiki/data/pages'', into the appropriate subfolder. Make sure their permissions are the same as those of the other files there, so that the pages can be edited. That's it! You don't have to register the pages anywhere; DokuWiki is really great in this respect!
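Copying and fixing the permissions can be scripted as well. This is a minimal sketch, assuming GNU coreutils (for ''chmod --reference'') and an existing page whose permissions serve as the model; ''install_pages'', the namespace and all paths are hypothetical. Ownership (e.g. by the web server user) could be copied the same way with ''chown --reference'', but that usually requires root, so it is left out here.
<code bash>
#!/bin/bash
# install_pages SRC PAGES NS REF: copy SRC/*.txt into PAGES/NS and give the new
# folder the mode of REF's folder and the copied pages the mode of the page REF
install_pages() {
  local src=$1 pages=$2 ns=$3 ref=$4 f
  mkdir -p "$pages/$ns"
  chmod --reference="$(dirname "$ref")" "$pages/$ns"
  for f in "$src"/*.txt; do
    [ -e "$f" ] || continue          # the glob matched no .txt files
    cp "$f" "$pages/$ns/"
    chmod --reference="$ref" "$pages/$ns/$(basename "$f")"
  done
}

# Usage: install_pages raw_doku dokuwiki/data/pages archive dokuwiki/data/pages/start.txt
</code>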
Finally, here is the code of the converter {{snipsnap2dokuwiki.pl}}:
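<code perl>
#!/usr/bin/perl
use strict;
use warnings;
# Usage as a filter: snipsnap2dokuwiki.pl < snip > page.txt
my $codeblock = 0;
while (<>) {
  # inside a code block: copy lines unchanged until the closing {code}
  if ($codeblock) {
    if (/\{code\}/) {
      # written in two parts so this listing does not close DokuWiki's code block
      print "</"."code>\n";
      $codeblock = 0;
    } else {
      print $_;
    }
    next;
  }
  # inline code elements: {code:lang}...{code} on one line
  s/\{code:?([^}]*)\}(.*?)\{code\}/<code $1>$2<\/code>/g;
  # start of a code block: {code} or {code:lang}; {code:none} means no language
  if (/\{code:?([^}]*)\}/) {
    my $lang = $1;
    $lang =~ s/none//;
    print "<code $lang>\n";
    $codeblock = 1;
    next;
  }
  # headings (dots escaped, otherwise "1x1" would match as well)
  s/^\s*1\.1\.1 (.*)/===$1===/;
  s/^\s*1\.1 (.*)/====$1====/;
  s/^\s*1 (.*)/=====$1=====/;
  # bold, italics, strike-through
  s/__(.*?)__/**$1**/g;
  s/~~(.*?)~~/\/\/$1\/\//g;
  s/--(.*?)--/<del>$1<\/del>/g;
  # lists; the dot after 1/a/A/i/I must be escaped, otherwise lines starting
  # with words like "In " or "it " would become list items
  s/^\s*[-*] (.*)/  * $1/;
  s/^\s*[1aAiI]\. (.*)/  - $1/;
  s/^\s*[-*]{2} (.*)/    * $1/;
  s/^\s*[1aAiI]{2}\. (.*)/    - $1/;
  # anchors do not exist in DokuWiki, sections get them automatically
  s/\{anchor:[^}]*\}//g;
  # internal links: [text|parent/child], [text|name], [parent/child], [name];
  # the character classes keep each match within one pair of brackets
  s/\[([^\[\]|]*)\|([^\[\]|\/]*)\/([^\[\]|]*)\]/\[\[$2:$3|$1\]\]/g;
  s/\[([^\[\]|:]*)\|([^\[\]]*)\]/\[\[$2|$1\]\]/g;
  s/\[([^\[\]|]*)\/([^\[\]|]*)\]/\[\[$1:$2\]\]/g;
  s/\[([^\[\]|:]*)\]/\[\[$1\]\]/g;
  # external links: {link:text|url} and {link:url}
  s/\{link:([^|}]*)\|([^}]*)\}/\[\[$2|$1\]\]/g;
  s/\{link:([^}]*)\}/$1/g;
  print $_;
}
</code>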
{{tag> web}}


archive/snipsnapconvert.txt · Last modified: 17.01.2009 14:37 (external edit)