GNU libextractor - a simple library for keyword extraction.

GNU libextractor is a library used to extract meta data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. libextractor is a GNU package. Our official GNU website can be found at http://www.gnu.org/software/libextractor/. libextractor can be downloaded from this site or the GNU mirrors.

The goal is to provide developers of file-sharing networks, browsers or WWW-indexing bots with a universal library to obtain simple keywords and meta data to match against queries and to show to users instead of only relying on filenames. libextractor contains a shell command extract that, similar to the well-known file command, can extract meta data from a file and print the results to stdout.
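As an illustration of how an indexing tool might consume the output of the extract command, here is a minimal Python sketch. It assumes that extract prints one "type - value" pair per line after a "Keywords for file ..." header line; that layout is an assumption and may differ between versions. The helper names parse_extract_output and extract_metadata are hypothetical and are not part of libextractor.

```python
import shutil
import subprocess


def parse_extract_output(text):
    """Parse lines of the assumed form 'type - value' into (type, value) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed per-file header line.
        if not line or line.startswith("Keywords for file"):
            continue
        # Split on the first ' - ' only, so values may themselves contain ' - '.
        key, sep, value = line.partition(" - ")
        if sep:
            pairs.append((key.strip(), value.strip()))
    return pairs


def extract_metadata(path):
    """Run the `extract` command on `path` and return the parsed pairs."""
    if shutil.which("extract") is None:
        raise FileNotFoundError("the 'extract' command is not on PATH")
    result = subprocess.run(
        ["extract", path], capture_output=True, text=True, check=True
    )
    return parse_extract_output(result.stdout)
```

Calling extract_metadata("example.pdf") would then return a list of (type, value) tuples that can be matched against a query or shown to a user. Keeping the parser separate from the subprocess call makes it easy to adapt if the actual output format differs from the one assumed here.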

Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF.

Also, various additional MIME types are detected.

libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.


Subversion access.

You can access the current development version of libextractor using

$ svn checkout https://ng.gnunet.org/svn/Extractor

A Java binding for libextractor is in

$ svn checkout https://ng.gnunet.org/svn/Extractor-java

A Mono binding for libextractor is in

$ svn checkout https://ng.gnunet.org/svn/Extractor-mono

A Python binding can be found under

$ svn checkout https://ng.gnunet.org/svn/Extractor-python

A source package is here. This binding has been packaged as a Python egg, available here. A second Python binding that includes a binding for doodle can be found here.

A Perl binding is on CPAN. The latest version of the Perl binding is available using

$ git clone git://git.perldition.org/File-Extractor.git/

A Ruby binding has been published here (mirror). Another Ruby binding has been published here (mirror).

An initial draft of a PHP binding can be found under

$ svn checkout https://gnunet.org/svn/Extractor-php

Debian .deb package.

The Debian package can be downloaded from the official Debian archive. The extract package can be found under Utilities and the library under Libraries. The respective packages are extract for the command-line tool, libextractor for the library and libextractor-dev for development. Backports for Debian Stable are also available.

Tar Package.

The latest version can be found on GNU mirrors. If the mirror does not work, you should be able to find it on the main FTP server at ftp://ftp.gnu.org/libextractor/.
Latest release is libextractor-0.6.2.tar.gz.
Latest Java-binding is libextractor-java-0.6.0.tar.gz.
Latest Mono-binding is libextractor-mono-0.5.23.tar.gz.
Latest Python-binding is libextractor-python-0.5.tar.gz.

RPM Package.

RPMs for SuSE 9.3 can be found here (i386, x86_64, SRPM).

The GNU libextractor Reference Manual.

