Searching for images by color
I’m trying to search for images based on color. In theory this is fairly simple: when images are uploaded to the service, I create a histogram of RGB values from each image. When the user then starts a search for a certain colour, I go through all the histograms, look up the values for r, g and b, and check whether each one is greater than the threshold.
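A minimal Python sketch of the approach described above. It assumes the pixels are already decoded into (r, g, b) tuples, and that each histogram bin holds the fraction of pixels with that value (the question does not say whether raw counts or fractions are used). The function names are hypothetical. Three channels of 256 bins give the 768 values mentioned in the next paragraph.

```python
from typing import Dict, Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Histogram = List[List[float]]  # 3 channels x 256 bins


def rgb_histogram(pixels: Iterable[Pixel]) -> Histogram:
    """Build one 256-bin histogram per channel, normalised to fractions of all pixels."""
    counts = [[0] * 256 for _ in range(3)]
    total = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            counts[channel][value] += 1
        total += 1
    if total == 0:
        return [[0.0] * 256 for _ in range(3)]
    return [[c / total for c in channel] for channel in counts]


def matches_colour(hist: Histogram, colour: Pixel, threshold: float) -> bool:
    """True if the bins for the requested r, g and b values all exceed the threshold."""
    return all(hist[channel][value] > threshold for channel, value in enumerate(colour))


def search(histograms: Dict[str, Histogram], colour: Pixel, threshold: float) -> List[str]:
    """Linear scan over every stored histogram, returning the matching image ids."""
    return [image_id for image_id, hist in histograms.items()
            if matches_colour(hist, colour, threshold)]
```

Because the three channels are checked independently, an image can match when its r, g and b values are each common, even if they never occur together in the same pixel.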
Now, how the heck do I translate this into a MySQL structure? I certainly don’t want a separate column for every value of r, g and b (768 columns, huh), but a BLOB is not (quickly) searchable. Has anyone tried anything like this before, i.e. storing a fairly large amount of data per row that is still searchable in MySQL? Any help would be greatly appreciated.

If you’re using the MyISAM engine, you could store the values in something like comma-delimited text fields (MyISAM supports FULLTEXT indexes on text columns) and use FULLTEXT search on them. Maintaining them could be a pain, though.
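A rough sketch of the text-field idea, assuming the threshold is applied when the image is stored, so that only the dominant bins end up in the indexed column. The token prefixes (`red`, `grn`, `blu`) and the function names are made up for illustration. The tokens are six characters long because MyISAM's default minimum FULLTEXT word length (`ft_min_word_len`) is 4, and shorter words would not be indexed.

```python
from typing import List, Sequence, Tuple

# Hypothetical prefixes; six-character words clear MyISAM's default ft_min_word_len of 4.
CHANNEL_PREFIXES = ("red", "grn", "blu")


def histogram_to_tokens(hist: Sequence[Sequence[float]], threshold: float) -> str:
    """Encode each bin above `threshold` as a word like 'red200' and join them with commas."""
    tokens: List[str] = []
    for prefix, channel in zip(CHANNEL_PREFIXES, hist):
        for value, fraction in enumerate(channel):
            if fraction > threshold:
                tokens.append(f"{prefix}{value:03d}")
    return ",".join(tokens)


def boolean_mode_query(colour: Tuple[int, int, int]) -> str:
    """Search string for MATCH ... AGAINST (... IN BOOLEAN MODE); '+' marks each word as required."""
    return " ".join(f"+{prefix}{value:03d}" for prefix, value in zip(CHANNEL_PREFIXES, colour))
```

The resulting string would go into a TEXT column with a FULLTEXT index (say, a hypothetical `hist_tokens`), and a search could then use something like `WHERE MATCH(hist_tokens) AGAINST('+red200 +grn030 +blu010' IN BOOLEAN MODE)`. Fixing the threshold at insert time means it cannot be changed per search without rebuilding the token strings.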
If you’re using InnoDB, you will probably have to use several VARCHAR columns to do the same thing.
I’m not sure how fast these solutions really are, since you will end up with a pretty large amount of data, especially if you have several hundred images. But it may be worth trying.
Thanks, using FULLTEXT came to mind, but that solution would really blow up the amount of data in the table. And since the application relies on really quick searches, I’m keeping the table in memory instead of on disk. But I think I’ll have to give it a try anyhow and see how it performs.
One way to reduce the amount of data to store is to calculate the average colours of, say, 10×10 px blocks, and then store only the averages.
I’m not sure how accurate the values need to be for your users, but the general colour scheme could be evaluated using larger pixel blocks instead of individual pixels.
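A sketch of the block-averaging suggestion, assuming the image is a row-major list of rows of (r, g, b) tuples. The 10 px block size comes from the comment above; the function name and the handling of partial tiles at the right and bottom edges are assumptions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def block_averages(image: Sequence[Sequence[Pixel]], block: int = 10) -> List[List[Pixel]]:
    """Average the colour of each block x block tile of a row-major image.

    Tiles at the right/bottom edge that do not fill a whole block are
    averaged over the pixels they actually contain.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result: List[List[Pixel]] = []
    for top in range(0, height, block):
        row: List[Pixel] = []
        for left in range(0, width, block):
            sums = [0, 0, 0]
            count = 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    for c in range(3):
                        sums[c] += image[y][x][c]
                    count += 1
            row.append(tuple(round(s / count) for s in sums))
        result.append(row)
    return result
```

Only the averaged tiles would then be stored (or histogrammed) instead of every pixel.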
I had to do something similar in a project recently and did exactly what Unkulunkulu mentioned: broke the images into smaller blocks and calculated average colors for them. Not sure if that is accurate enough for your needs, but it worked for me.
Back in the days when I was working for MediaTeam, I was lightly involved in CBIR (content-based image retrieval) projects. There are lots of good papers on CBIR technologies online; the downside is that they get very technical and theoretical very quickly. The guys at MediaTeam and my old boss Timo might be able to give you some good pointers…
http://www.mediateam.oulu.fi/research/ivp/?lang=en
Thanks for the input.
How about simply discarding most of the data? You could pick out just the most significant peak(s) and store its (their) hue and saturation value(s) in the database table.
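One possible way to find the peak(s), sketched with Python's standard `colorsys` module: build a hue histogram, skip nearly grey pixels (whose hue is meaningless), and return the fullest bins together with the mean saturation of the pixels in them. The bin count (36, i.e. 10° per bin), the saturation cut-off and the use of bin centres as peak hues are all assumptions, not part of the suggestion above.

```python
import colorsys
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def hue_peaks(pixels: Iterable[Pixel], bins: int = 36, top: int = 1,
              min_saturation: float = 0.1) -> List[Tuple[float, float]]:
    """Return up to `top` (hue_degrees, mean_saturation) pairs for the fullest hue bins.

    Pixels with saturation below `min_saturation` are skipped because their hue is not meaningful.
    """
    counts = [0] * bins
    sat_sums = [0.0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # all in 0..1
        if s < min_saturation:
            continue
        i = min(int(h * bins), bins - 1)
        counts[i] += 1
        sat_sums[i] += s
    # Non-empty bins, most populated first.
    ranked = sorted((i for i in range(bins) if counts[i]), key=lambda i: counts[i], reverse=True)
    # Report each peak at its bin centre, in degrees.
    return [((i + 0.5) * 360 / bins, sat_sums[i] / counts[i]) for i in ranked[:top]]
```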
Later on it should be reasonably easy to match images with colors. Instead of calculating the correlation between the full color spectrum requested by the user and that of each image, you calculate the (weighted sum of) difference(s) between the hue peak(s) and the color the user asked for. In the simplest case, with a single peak hue, you can probably do the matching with a single query.
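A sketch of the matching step, assuming each image stores a list of (hue in degrees, saturation) peaks plus one weight per peak. Hue differences are measured around the colour wheel, so 350° and 10° are 20° apart. The `sat_weight` factor, which trades a saturation difference against degrees of hue, is a hypothetical tuning parameter. Lower scores mean better matches.

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (hue in degrees, saturation 0..1)


def hue_distance(a: float, b: float) -> float:
    """Smallest angle between two hues on the 0-360 degree colour wheel."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def match_score(peaks: Sequence[Peak], weights: Sequence[float], query_hue: float,
                query_sat: float, sat_weight: float = 90.0) -> float:
    """Weighted sum of differences between each stored peak and the requested colour."""
    total = 0.0
    for (hue, sat), w in zip(peaks, weights):
        total += w * (hue_distance(hue, query_hue) + sat_weight * abs(sat - query_sat))
    return total


def rank_images(images: Dict[str, Tuple[Sequence[Peak], Sequence[float]]],
                query_hue: float, query_sat: float) -> List[str]:
    """Image ids sorted from best (lowest score) to worst match."""
    def score(image_id: str) -> float:
        peaks, weights = images[image_id]
        return match_score(peaks, weights, query_hue, query_sat)
    return sorted(images, key=score)
```

With a single peak per image, the same hue distance can be computed in MySQL as `LEAST(ABS(peak_hue - ?), 360 - ABS(peak_hue - ?))` in an `ORDER BY`, where `peak_hue` is a hypothetical column holding hues in [0, 360).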
I believe something along those lines would provide good enough results with an insignificant amount of maths and storage.