Markus Schoder has contributed finddupes.cpp, GPL'ed source code for a C++-based version of my horribly slow compare routine. In his testing on a directory of 35,000 images, it was about 300 times faster than findimagedupes' Perl implementation. It's included here for everyone who has experienced the speed problem. I'll probably integrate it into the next release somehow.
You can compile it by running
g++ -O3 finddupes.cpp -o finddupes
(or download this gzipped executable, built on Mandrake 7.2) and run it like so:
finddupes .95 <imagedupes-db.txt
Version 0.1.3 released with fixes and performance enhancements from Paul Cassella and Max Stekelenburg, as well as bugfixes to make it work with Linux-Mandrake 7.2 and a new "GUI mode" (not an actual GUI, but it produces output that should be easier for a GUI front end to parse).
[2000/10/01 15:30] Performs a rough "visual diff" on two or more images.
This command line program will scan two pictures (or a whole tree of pictures) and determine if there are any that look alike. It uses a simple algorithm, hopefully documented well in the code, to reduce every picture to a 16x16x1 bitmap, and counts the bits that differ between each pair. It's something like 98% accurate when used on typical image subjects. Text or other graffiti added to pictures will usually not confuse the program, but if you take a lot of very similar pictures (like sunsets or webcam grabs) they will probably turn up as false positives.
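The comparison step described above can be sketched in a few lines of C++. This is only an illustration, not the code from findimagedupes or finddupes.cpp: the `Fingerprint` type, its 32-byte layout, and the function names are all assumptions made for the example, and whether the real programs treat the threshold as inclusive is also a guess.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// A 16x16x1 bitmap is 256 bits. Storing it as 32 bytes is an assumption
// made for this sketch, not the layout used by the real programs.
using Fingerprint = std::array<std::uint8_t, 32>;

// Count the bits that differ between two fingerprints (Hamming distance).
int differing_bits(const Fingerprint& a, const Fingerprint& b) {
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 wherever the two bytes disagree.
        diff += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return diff;
}

// Fraction of the 256 bits that match, from 0.0 (none) to 1.0 (all).
double similarity(const Fingerprint& a, const Fingerprint& b) {
    return 1.0 - differing_bits(a, b) / 256.0;
}

// Hypothetical helper: treat a pair as duplicates when the similarity
// reaches the threshold (0.90 mirrors the documented default of 90%).
// The inclusive comparison is an assumption.
bool is_duplicate(const Fingerprint& a, const Fingerprint& b,
                  double threshold = 0.90) {
    return similarity(a, b) >= threshold;
}
```

For example, two fingerprints that differ in 12 of their 256 bits have a similarity of 1 - 12/256 = 0.953125, so in this sketch they would count as duplicates at a threshold of .95 but not at .96.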
Download findimagedupes 0.1.3.
Download findimagedupes 0.1.2.
Download updated Debian Sid package (0.1.3-1) kindly contributed by Guenter Bechly.
findimagedupes [options] [<file1> <file2>]
Options:
  -rescan           = rescan fingerprints of all files in directory
  -f <file>         = use <file> as image fingerprint database
  -d <dir>          = scan <dir> instead of current directory
  -t <num>          = use <num> as threshold% of similarity (default 90)
  -v <program>      = launch <program> (in bg) to view each set of dupes
  -c <file>         = create GQView collection <file>.gqv of duplicates
  <file1> <file2>   = diff just those two files, using -v if present
                      (other options ignored if files are specified)
  -p                = only valid when files specified; prints the hex of the
                      actual fingerprint of each file.
  -g                = GUI mode: produce only machine-friendly output.
- perl - as with everything on this page
- ImageMagick - library for manipulating images
- PerlMagick (Image::Magick) - Perl interface to above
- pwd, find, sort, tput (curses), file (i.e. if this works right under NT I'd be surprised)
- A bunch of pictures of which you've totally lost control
- (optional) GQView - to manage collections of duplicate images visually