Findimagedupes is a command-line utility which performs a rough "visual diff" on two images, or on a whole directory tree of images, to detect any that look similar. It can produce output suitable for driving GUI front-ends. It can also export a GQView-compatible collection file, so you can deal with the duplicates visually. On common image types, findimagedupes seems to be around 98% accurate.
Markus Schoder has contributed finddupes.cpp, GPL'ed source code for a C++ based version of my horribly slow compare routine. In his testing on a directory of 35,000 images, it was about 300 times faster than findimagedupes' perl implementation. It's included here for everyone who has experienced the speed problem. I'll probably integrate it into the next release somehow.
You can compile it with
g++ -O3 finddupes.cpp -o finddupes
(or download this gzipped executable, built on Mandrake 7.2) and run it like so:
finddupes .95 <imagedupes-db.txt
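For a feel of what a compare pass like this involves, here is a minimal sketch of an all-pairs scan with a similarity cutoff such as the .95 above. It is not taken from finddupes.cpp, and since the layout of imagedupes-db.txt isn't described here, it works on fingerprints already held in memory. The names Fingerprint, similarity and find_similar_pairs are made up for illustration, and the 256-bit fingerprint size is assumed from the 16x16x1 bitmap described further down.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical in-memory fingerprint: 256 bits stored as four 64-bit words.
using Fingerprint = std::array<std::uint64_t, 4>;

// Fraction of matching bits between two fingerprints (1.0 means identical).
double similarity(const Fingerprint& a, const Fingerprint& b) {
    std::size_t differing = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // XOR leaves a 1 wherever the two words disagree; count those bits.
        differing += std::bitset<64>(a[i] ^ b[i]).count();
    }
    return 1.0 - static_cast<double>(differing) / 256.0;
}

// Compare every pair once and collect the index pairs at or above the threshold.
std::vector<std::pair<std::size_t, std::size_t>>
find_similar_pairs(const std::vector<Fingerprint>& prints, double threshold) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t i = 0; i < prints.size(); ++i) {
        for (std::size_t j = i + 1; j < prints.size(); ++j) {
            if (similarity(prints[i], prints[j]) >= threshold) {
                matches.emplace_back(i, j);
            }
        }
    }
    return matches;
}
```

The scan is quadratic in the number of images, so on tens of thousands of files the cost of each individual comparison matters a great deal.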
Version 0.1.3 has been released with fixes and performance enhancements from Paul Cassella and Max Stekelenburg, as well as bugfixes to make it work with Linux-Mandrake 7.2 and a new "GUI mode" (not an actual GUI, but it produces output that should be easier for a GUI front-end to parse).
[2000/10/01 15:30] Performs a rough "visual diff" on two or more images.
This command-line program will scan two pictures (or a whole tree of pictures) and determine if there are any that look alike. It uses a simple algorithm, hopefully documented well in the code, to reduce every picture to a 16x16x1 bitmap (256 bits), and counts the bits that differ between each pair. It's something like 98% accurate when used on typical image subjects. Text or other graffiti added to pictures will usually not confuse the program, but if you take a lot of very similar pictures (like sunsets or webcam grabs) they will probably turn up as false positives.
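The idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code findimagedupes actually runs: it starts from a 16x16 grayscale thumbnail that is already in memory, and it assumes a pixel becomes a 1 bit when it is brighter than the thumbnail's mean brightness (the real reduction is done through ImageMagick and may well differ). The names Gray16, Bitmap, to_bitmap, differing_bits and percent_similar are invented for the example.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <numeric>

constexpr std::size_t kSide = 16;
constexpr std::size_t kBits = kSide * kSide;  // 256 bits per picture

// Hypothetical 16x16 grayscale thumbnail, stored row by row.
using Gray16 = std::array<std::uint8_t, kBits>;
using Bitmap = std::bitset<kBits>;

// Assumption: a pixel maps to 1 when it is brighter than the mean brightness.
Bitmap to_bitmap(const Gray16& pixels) {
    const unsigned total = std::accumulate(pixels.begin(), pixels.end(), 0u);
    const double mean = static_cast<double>(total) / kBits;
    Bitmap bits;
    for (std::size_t i = 0; i < kBits; ++i) {
        bits[i] = pixels[i] > mean;
    }
    return bits;
}

// Number of bit positions where the two bitmaps disagree.
std::size_t differing_bits(const Bitmap& a, const Bitmap& b) {
    return (a ^ b).count();
}

// Share of matching bits as a percentage, comparable to a threshold like 90.
double percent_similar(const Bitmap& a, const Bitmap& b) {
    return 100.0 * (kBits - differing_bits(a, b)) / kBits;
}
```

With a measure like this, a caption scribbled across one corner flips only a handful of the 256 bits, while two different sunsets can share most of their bits, which matches the false positives mentioned above.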
Download findimagedupes 0.1.3.
Download findimagedupes 0.1.2.
Download updated Debian Sid package (0.1.3-1) kindly contributed by Guenter Bechly.
Usage:
findimagedupes [options] [<file1> <file2>]

Options:
  -rescan          = rescan fingerprints of all files in directory
  -f <file>        = use <file> as image fingerprint database
  -d <dir>         = scan <dir> instead of current directory
  -t <num>         = use <num> as threshold% of similarity (default 90)
  -v <program>     = launch <program> (in bg) to view each set of dupes
  -c <file>        = create GQView collection <file>.gqv of duplicates
  <file1> <file2>  = diff just those two files, using -v if present
                     (other options ignored if files are specified)
  -p               = only valid when files specified; prints the hex
                     of the actual fingerprint of each file.
  -g               = GUI mode: produce only machine-friendly output.
Requirements:
- perl - as with everything on this page
- ImageMagick - library for manipulating images
- PerlMagick (Image::Magick) - Perl interface to above
- pwd, find, sort, tput (curses), file (i.e. if this works right under NT I'd be surprised)
- A bunch of pictures of which you've totally lost control
- (optional) GQView - to manage collections of duplicate images visually