Graphic to Text Interpretation
Page 1 of 1

Author:  scsu_13 [ Thu Apr 19, 2012 6:52 pm ]
Post subject:  Graphic to Text Interpretation

Hello Everyone.
Since this is my first post, I'm not sure whether this topic has been discussed before, but here goes.

I am trying to create a scanner out of the light sensor, and I am able to place an image on the LCD screen using the nxtSetPixel command. I want the robot to recognize that a number has been scanned, and perhaps beep its numeric value or otherwise understand that it has scanned a 3. That way, the program would be able to scan numbers and solve problems like addition, subtraction, or maybe a Sudoku puzzle.

If anyone can help, it will be greatly appreciated.

Author:  magicode [ Thu Apr 19, 2012 8:59 pm ]
Post subject:  Re: Graphic to Text Interpretation

Edit: Looks like the sensor is much more accurate than I thought. Someone already seems to have done what you want.

Author:  DiMastero [ Thu Apr 19, 2012 9:13 pm ]
Post subject:  Re: Graphic to Text Interpretation

Magicode: I think he wants to read it from a piece of paper or something, and then display it on the NXT screen, like in tiltedtwister's Sudoku solver.

EDIT: I think the idea is that, once it's on the screen, he'll be able to use the number in a program (i.e. have it in a variable, not just "random" pixels on the screen). The NXT needs to read the individual pixels, place them on the screen, and then figure out what the number is, I think.

Scsu_13: If that's the case, I think it's certainly possible to do what you want. Some things to consider, though, are whether the numbers will always be the same size (recognizing numbers at different scales will be much more difficult), and how you want to move the light sensor over the number to "read" it: a car like in the video, or something else?

Author:  miki [ Fri Apr 20, 2012 2:55 pm ]
Post subject:  Re: Graphic to Text Interpretation

scsu_13 wrote:
I want the robot to recognize that a number
Hi scsu_13!
There is a lot of documentation about OCR (optical character recognition) on the web. However, most of it is heavy on the maths and maybe a bit hard for an average level (like mine ;-) )

Given the small number of glyphs you have to decode (nine), their location (in the cell center), and their orientation (horizontally aligned), a simpler approach may interest you: Bayes' theorem (or fuzzy logic).

The principle is to obtain a (sufficiently high) probability that your glyph is, say, a '5' rather than a '9'.
Bayes' theorem tells you how probable an event is in a given context, based on statistics previously gathered for that event in the same context.
It is used, for instance, in photography, where the camera should recognize the type of photo (portrait, landscape, or macro). Given statistics (metrics on focal length, light distribution, contrast, etc.) from hundreds of landscape, portrait, and macro pictures, the camera feeds the Bayesian algorithm with the current metrics and obtains the most probable type for the current photo. The beauty of this algorithm is how naive it is (a 'landscape' means nothing to it) and how accurate its predictions are.

The same algorithm is used in anti-spam software. It does not understand the email content either, but it can predict (and learn from the user) what spam looks like.
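To make the idea concrete, here is a minimal sketch of that Bayesian scoring, written in Python because it is easy to test on a PC (the same logic can be ported to ROBOTC on the NXT). The digits, features, and training data are invented for illustration: each glyph is described by a couple of yes/no features, and we pick the digit whose feature statistics best match a new scan.

```python
# Hypothetical naive-Bayes-style digit scorer (illustrative only).
# Each training sample is (digit, feature_tuple) with 0/1 features,
# e.g. (symmetric, ink_mostly_right).

def train(samples):
    """Collect per-digit feature counts from (digit, features) samples."""
    stats = {}
    for digit, features in samples:
        counts, total = stats.setdefault(digit, [[0] * len(features), 0])
        for i, f in enumerate(features):
            counts[i] += f
        stats[digit][1] = total + 1  # number of samples seen for this digit
    return stats

def classify(stats, features):
    """Return the digit with the highest naive-Bayes score for `features`."""
    best_digit, best_score = None, -1.0
    for digit, (counts, total) in stats.items():
        score = 1.0
        for i, f in enumerate(features):
            # P(feature_i is on | digit), with add-one smoothing so a
            # never-seen feature does not zero out the whole score
            p_on = (counts[i] + 1) / (total + 2)
            score *= p_on if f else (1 - p_on)
        if score > best_score:
            best_digit, best_score = digit, score
    return best_digit
```

For example, if '8' was always seen as (symmetric, not-right-heavy) and '3' as (not symmetric, right-heavy), then `classify(stats, (1, 0))` picks 8. The point is exactly what I said above: the algorithm has no idea what an '8' looks like, it only compares statistics.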

I hope this can inspire you:
You could gather some metrics on the values scanned for a Sudoku cell (say, a 12x12 array of samples with values from 0 to 255), such as the average value, symmetry, center of gravity, etc., and use the results to find the most probable digit.
E.g.: if there is vertical symmetry, it's probably a '0' or an '8';
or if most of the dark dots are on the right, it's probably a '3' or a '9'.
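A sketch of how such metrics could be computed (again in Python for easy testing; the 12x12 grid size and the darkness threshold of 128 are assumptions, not tuned values):

```python
# Illustrative cell metrics for a square grid of light-sensor readings,
# where 0 = white and 255 = black. Grid size and threshold are assumed.

def cell_metrics(cell, dark=128):
    """Return (average, vertical_symmetry, gravity_col) for a square cell."""
    n = len(cell)
    average = sum(sum(row) for row in cell) / (n * n)

    matches = dark_pixels = col_weighted = 0
    for row in cell:
        for x, v in enumerate(row):
            if v >= dark:
                dark_pixels += 1
                col_weighted += x
                # does the left/right mirror pixel also look dark?
                if row[n - 1 - x] >= dark:
                    matches += 1

    # fraction of dark pixels whose mirror is also dark (1.0 = symmetric)
    symmetry = matches / dark_pixels if dark_pixels else 0.0
    # horizontal "center of gravity" (0 = far left, n - 1 = far right)
    gravity_col = col_weighted / dark_pixels if dark_pixels else (n - 1) / 2
    return average, symmetry, gravity_col
```

A cell with two mirrored dark columns gives symmetry 1.0 and a centered gravity; a glyph whose ink sits mostly in the right-hand columns gives a high `gravity_col`, which is the "probably a 3 or 9" hint above. These numbers are exactly the kind of metrics you would feed the Bayesian scorer.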

Anyway, you have a very interesting challenge :-)
Keep us informed of your solution (and problems ;-) ).

Best regards
