Cheap and low-power web server with local Wikipedia
Building a super low-power web server and WiFi access point, with a full 59 GB copy of Wikipedia, for when you have no Internet access.
Portable offline Wikipedia
I have always been amazed by the Wikipedia project, and how it
has managed to capture so much of the world's information. Around 2005, you could get archive files containing the entire text
of Wikipedia that fit onto a 4 GB SD card. These 4 GB SD cards were quite expensive at the time,
but I thought it was amazing that I could download the entire Wikipedia, put it into my
Palm Tungsten T3, and then browse Wikipedia while
flying on a plane far away from any Internet access. I had access to the entire world's information and it was stored locally
in my device ... this was basically the first actual implementation of the Hitchhiker's Guide to the Galaxy.
Kiwix
Nowadays, the full Wikipedia archive from Kiwix including
pictures is 59 GB, and growing every day. The text-only version is 19 GB, which is much larger than the 4 GB image I used over
10 years ago. Most phones don't have enough space to store either of these files, and don't have an SD card slot.
There are nice apps such as Kiwix for Android (open source) that can browse a local
ZIM file (the archive format Kiwix uses), but today's phones just don't have the storage space.
LinkIt Smart 7688 Duo
So I thought perhaps a better solution would be to build a tiny web server that can run off a battery, and then I could
browse it using a phone, laptop, or E-paper device like a Kindle. The Kiwix project actually supports something called
kiwix-serve, which takes a ZIM file and serves it over HTTP. However,
I needed a device that could run off a battery and also be cheap and very small. A Raspberry Pi 3 would do it, but it is physically
large, costs around $50, and can draw a lot of power. The Raspberry Pi Zero is smaller and cheaper, but doesn't include WiFi,
and I didn't want extra adaptors plugged in via USB. At the time I started this project, the Raspberry Pi Zero W did not exist.
However, I found the perfect solution: a
LinkIt Smart 7688 Duo, which uses a
MediaTek MT7688 MIPS processor at 580 MHz with 128 MB of RAM and a Micro SD slot, and runs OpenWRT Linux. It is also the
size of my thumb, and has nothing extra that would make it larger than necessary. It was perfect for what I wanted to do.
The only problem is that kiwix-serve
is quite a large piece of code, the build system is super complicated, and pre-built binaries exist only for ARM and x86. Also,
I was concerned that kiwix-serve would need more than 128 MB of memory, so it might not work even if I ported it.
LinkIt Smart 7688 Duo (SD card slot on back)
zimHttpServer
Instead of trying to use Kiwix, I looked for other options. I was hoping for a web server written in an interpreted language
that was supported by OpenWRT out of the box. The only solution I found which looked like it would work was
zimHttpServer, which
was a really simple test web server written in Perl as part of the Kiwix tools repository.
It turned out that ZIM files were a lot less complicated than I thought. A ZIM file is basically an archive of HTML articles
stored in XZ-compressed clusters, plus a directory sorted by URL, so you can use a binary search to find the exact article you want.
What impressed me most was that it was
only 300 lines of Perl code, and ran on a standard Perl implementation with no extra dependencies! Since this was so small, I
figured that it would be straightforward to understand, and worst case scenario I could rewrite it from scratch in another
language if I really needed to. Every time I had touched Perl before, it had been a horrible experience, but I figured that I should be able
to deal with all its craziness for this project. What could possibly go wrong?
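The binary search mentioned above is small enough to sketch. This Python is not zimHttpServer's Perl. It models the directory as an in-memory list of hypothetical (namespace, url, cluster, blob) tuples sorted by namespace and URL. A real reader would instead seek to each probed entry through the file's URL pointer list. Either way, a lookup needs only about log2(n) probes, and the matching entry tells you which cluster to decompress.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a ZIM directory: (namespace, url, cluster_number, blob_number),
# kept sorted by (namespace, url) the way the URL pointer list orders entries.
Entry = Tuple[str, str, int, int]


def find_entry(directory: List[Entry], namespace: str, url: str) -> Optional[Entry]:
    """Binary search (lower bound) for an exact (namespace, url) match; None if absent."""
    target = (namespace, url)
    lo, hi = 0, len(directory)
    while lo < hi:
        mid = (lo + hi) // 2
        # In a real file, each probe is one seek plus one read of a directory entry.
        key = directory[mid][:2]
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(directory) and directory[lo][:2] == target:
        return directory[lo]
    return None
```

For example, `find_entry(directory, "A", "Banana")` returns the entry whose cluster and blob numbers point at that article. A missing title falls through to `None`.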
64-bit seek on 32-bit Perl
zimHttpServer worked perfectly
at serving the 59 GB ZIM file I had on my x86 Linux and OS X machines, so that was a good start. The code
dated from 2012 and needed a few cleanups and bug fixes, and I also added some extra debugging so I knew what was going
on. However, when I copied it over to my MT7688 device, I ran into a big problem. It turns out the MT7688 is a 32-bit processor,
and the Perl build on it doesn't support 64-bit integers or 64-bit seek operations. The ZIM header and article lookups use 64-bit offsets, so I changed the
parser to read each of these offsets as two separate 32-bit values. What remained was doing a 64-bit seek, but 32-bit Perl only
supports 32-bit offsets via seek(), and there is no 64-bit llseek() support. So you need to break each seek operation down
into smaller chunks of 0x40000000 bytes (1 GB), because 0x80000000 (2 GB) is a negative number when treated as a signed 32-bit integer. So you do an
absolute seek to 0x0, then relative seeks of 0x40000000, and then a final relative seek for the remaining offset. It can take
50 seek() calls to reach the 50th gigabyte of a file, but this has little impact on the overall speed of fetching an article.
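The seek workaround is easier to follow in code. This is a Python sketch of the arithmetic, not the modified Perl. Python's own os.lseek() accepts 64-bit offsets directly, so the chunking here only mirrors what a 32-bit Perl has to do. The names CHUNK, u64_from_halves, seek_plan, seek_plan_from_halves and chunked_seek are all hypothetical.

```python
import os
import struct
from typing import List, Tuple

# 1 GB step: small enough that every relative offset stays positive
# as a signed 32-bit integer (0x80000000 would not).
CHUNK = 0x40000000

Step = Tuple[int, int]  # (offset argument, whence) for one seek call


def u64_from_halves(raw: bytes) -> int:
    """Read a little-endian 64-bit offset as two unsigned 32-bit halves.

    Python can recombine them directly; a 32-bit Perl has to keep them apart.
    """
    low, high = struct.unpack("<II", raw[:8])
    return (high << 32) | low


def seek_plan(offset: int, chunk: int = CHUNK) -> List[Step]:
    """Absolute seek to 0, then relative seeks of `chunk`, then the remainder."""
    plan: List[Step] = [(0, os.SEEK_SET)]
    remaining = offset
    while remaining >= chunk:
        plan.append((chunk, os.SEEK_CUR))
        remaining -= chunk
    if remaining:
        plan.append((remaining, os.SEEK_CUR))
    return plan


def seek_plan_from_halves(high: int, low: int) -> List[Step]:
    """Same plan built from the two 32-bit halves, never forming a 64-bit value."""
    full_chunks = high * 4 + (low >> 30)  # 2**32 / 2**30 = 4 chunks per unit of `high`
    remainder = low & (CHUNK - 1)         # bottom 30 bits, always < 2**30
    plan: List[Step] = [(0, os.SEEK_SET)] + [(CHUNK, os.SEEK_CUR)] * full_chunks
    if remainder:
        plan.append((remainder, os.SEEK_CUR))
    return plan


def chunked_seek(fd: int, offset: int, chunk: int = CHUNK) -> int:
    """Apply seek_plan() to a raw file descriptor and return the final position."""
    pos = 0
    for amount, whence in seek_plan(offset, chunk):
        pos = os.lseek(fd, amount, whence)
    return pos
```

For example, seek_plan(50 * CHUNK + 123) is one absolute seek, 50 relative 1 GB seeks and a final relative seek of 123 bytes, for 52 calls in total. seek_plan_from_halves() shows that the same plan can be derived from the high and low words alone. This is the property that matters on a machine without 64-bit integers.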
The next problem was that zimHttpServer was using Perl's buffered I/O layer, which tries to read back the current seek() position, and
that fails. So I needed to convert from open() to sysopen(), which uses raw file I/O and supports passing O_LARGEFILE to the kernel. There
were a few places where the code used fancy Perl I/O operations that also needed to be rewritten to work with this. After
making these changes, I was able to get everything working with the large files on the MT7688.
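In Python terms, moving from buffered open() to sysopen() is roughly like moving from open() to os.open(), with reads done through os.lseek() and os.read(). The sketch below assumes Linux, where os.O_LARGEFILE is defined. On other platforms it falls back to 0. The names open_raw and read_at are hypothetical. On the 32-bit device, the single lseek() here would be replaced by the chunked relative seeks described above.

```python
import os

# O_LARGEFILE only exists on some platforms (e.g. Linux); use 0 elsewhere.
O_LARGEFILE = getattr(os, "O_LARGEFILE", 0)


def open_raw(path: str) -> int:
    """Open read-only and unbuffered, roughly like Perl's sysopen() with O_RDONLY|O_LARGEFILE."""
    return os.open(path, os.O_RDONLY | O_LARGEFILE)


def read_at(fd: int, offset: int, length: int) -> bytes:
    """Seek with raw lseek() and read up to `length` bytes, looping over short reads."""
    os.lseek(fd, offset, os.SEEK_SET)
    parts = []
    while length > 0:
        chunk = os.read(fd, length)
        if not chunk:  # end of file
            break
        parts.append(chunk)
        length -= len(chunk)
    return b"".join(parts)
```

Because nothing sits between these calls and the kernel, no library layer has to report the file position back through a 32-bit integer.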
Extra MIPS binaries from OpenWRT
Since the MT7688 runs a simplified OpenWRT distribution, there were a few things missing. zimHttpServer relies on an external xz
executable to decompress the article data, and xz was not included. So I found the OpenWRT build system
and configured it to build the xz binaries, then copied them over to run on the device.
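The following Python sketch shows this kind of external dependency. It is not zimHttpServer's Perl. It pipes compressed data through a separate xz process, which is the step that breaks when the binary is missing. The function xz_decompress and its xz_path parameter are hypothetical. The --decompress and --stdout flags are standard xz options.

```python
import subprocess


def xz_decompress(data: bytes, xz_path: str = "xz") -> bytes:
    """Pipe an XZ stream through an external xz binary and return the output.

    Raises FileNotFoundError if the binary is missing (as on the stock image)
    and subprocess.CalledProcessError if xz rejects the input.
    """
    result = subprocess.run(
        [xz_path, "--decompress", "--stdout"],
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

On the device, xz_path would point to wherever the cross-compiled binary ends up. Python's built-in lzma module could decompress in-process, but this sketch deliberately mirrors the external-binary approach.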
Indexing
zimHttpServer.pl includes a primitive indexing technique: on first run it traverses the entire ZIM file and generates a single text file
with the names of all the articles. However, this single file is hundreds of megabytes and takes a long time to search through. I modified the code
so that it generates a separate index file for each letter of the alphabet, and you can only search by the first word. This reduces search
time to only a few seconds, and doesn't require any extra memory.
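Here is a rough Python sketch of a per-letter index. It assumes one plain-text file per bucket, named a.txt through z.txt, plus other.txt for titles starting with a digit or a non-ASCII character. The file naming, the function names and the prefix match on the query's first word are my assumptions about how such an index could work. They are not a copy of the modified code. Search streams a single bucket file line by line, so memory use stays flat however large that bucket is.

```python
import os
from typing import Dict, Iterable, List


def bucket_for(text: str) -> str:
    """Map a title or query to its index bucket: 'a'..'z', else 'other'."""
    first = text[:1]
    return first.lower() if first.isascii() and first.isalpha() else "other"


def build_index(titles: Iterable[str], index_dir: str) -> None:
    """Write one newline-separated title list per bucket (e.g. index_dir/a.txt)."""
    buckets: Dict[str, List[str]] = {}
    for title in titles:
        buckets.setdefault(bucket_for(title), []).append(title)
    os.makedirs(index_dir, exist_ok=True)
    for name, entries in buckets.items():
        with open(os.path.join(index_dir, name + ".txt"), "w", encoding="utf-8") as f:
            f.write("\n".join(entries) + "\n")


def search(query: str, index_dir: str, limit: int = 20) -> List[str]:
    """Scan only the bucket for the query's first word, matching titles by prefix."""
    words = query.split()
    if not words:
        return []
    first_word = words[0].lower()
    path = os.path.join(index_dir, bucket_for(first_word) + ".txt")
    matches: List[str] = []
    if not os.path.exists(path):
        return matches
    with open(path, encoding="utf-8") as f:
        for line in f:  # streamed, never loaded whole
            title = line.rstrip("\n")
            if title.lower().startswith(first_word):
                matches.append(title)
                if len(matches) >= limit:
                    break
    return matches
```

For example, after indexing "Apple", "Apricot" and "apple pie", search("ap", index_dir) reads only a.txt and returns all three.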
Web server and WiFi
After making these changes, I now have a version of zimHttpServer.pl that runs on 32-bit machines with only 128 MB of RAM, and can serve the full 59 GB ZIM file
from Wikipedia, including all the images! I configured the LinkIt 7688 to start up its own local WiFi network, and configured /etc/hosts so that
the name "wikipedia" maps to the local IP address of the device (OpenWRT's dnsmasq DNS server answers client queries from /etc/hosts). On your laptop, phone, or e-reader, you can connect to this WiFi network and use any browser
to visit http://wikipedia:8080, and it just works!
Packaging and power
The device is so tiny, only 63 mm long and 26 mm wide and weighing basically nothing, that I didn't want to make it bigger with a plastic housing. Instead, I wrapped
Kapton tape around it to insulate the electronics. I'm not using this for anything else, so the
only port available is the Micro USB port, where you plug in the power supply. The power usage is incredibly low, peaking at only 300 mA at 5 V under heavy I/O.
You can run this off a tiny battery pack, carry it around in your jacket, and broadcast Wikipedia to all! I like the packaging of this
device, where the Micro USB port is on the end, so the cables are inline. Everything was really cheap, with the LinkIt 7688 costing only US$15, and a 64 GB
SD card costing less than US$20. These prices will only continue to fall over time.
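The following rough worked estimate puts the 300 mA figure in perspective. The 10,000 mAh, 3.7 V battery pack and the 85% boost-converter efficiency η are assumptions for illustration, not measurements from this build. Because 300 mA is the peak draw, the runtime at average load should be longer than this figure.

```latex
\begin{aligned}
P_{\text{peak}} &= 5\,\text{V} \times 0.3\,\text{A} = 1.5\,\text{W} \\
E_{\text{battery}} &\approx 10\,\text{Ah} \times 3.7\,\text{V} = 37\,\text{Wh} \\
t &\approx \frac{\eta \, E_{\text{battery}}}{P_{\text{peak}}}
   = \frac{0.85 \times 37\,\text{Wh}}{1.5\,\text{W}} \approx 21\,\text{h}
\end{aligned}
```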
LinkIt 7688 wrapped in Kapton tape for protection, compared with a US 25-cent coin. Micro SD card holder visible on rear side.
Source code
You can download the modified code from my GitHub repository,
which is based on the original zimHttpServer.pl code.
It was originally licensed under GPL v3 as part of the Kiwix project.
If you want to see how I created the xz binaries using OpenWRT, I have a
GitHub repository with the build, and also a fork of the
repository from MediaTek.