Is vision a mandatory requirement of intelligence?
I don’t believe so, because representation is relative.
Consider an artificial intelligence program that can only read textual content. It analyzes properties such as word length, vowel/consonant distribution, position of the word in the sentence, adjacent words, etc.
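To make this concrete, here is a minimal sketch of the kind of surface features such a program might extract. The `word_features` helper, its feature names, and the sample sentence are my own illustration under that assumption, not a reference to any actual system.

```python
import re

VOWELS = set("aeiou")

def word_features(sentence: str):
    """Extract simple surface features for each word in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    features = []
    for position, word in enumerate(words):
        features.append({
            "word": word,
            "length": len(word),
            "vowels": sum(ch in VOWELS for ch in word),
            "consonants": sum(ch.isalpha() and ch not in VOWELS for ch in word),
            "position": position,  # index of the word in the sentence
            "previous": words[position - 1] if position > 0 else None,  # left neighbor
            "next": words[position + 1] if position + 1 < len(words) else None,  # right neighbor
        })
    return features

print(word_features("The program reads the table of contents."))
```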
Such a program’s perception of the word “table” is relative to the textual context of each occurrence it encounters. The word “table” is spotted not far from the words “chair” and “dinner”. Given a significant amount of data, the program has a pretty clear idea of what a table is, in a contextual sense.
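A toy co-occurrence counter shows how far context alone can go. The `cooccurrence` function, the window size, and the three-sentence corpus below are assumptions made up for this sketch, not the program described above.

```python
import re
from collections import Counter, defaultdict

def cooccurrence(corpus, window=2):
    """Count, for every word, which other words appear within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][words[j]] += 1
    return counts

corpus = [
    "We sat at the table for dinner",
    "She set the table and chair for dinner",
    "Dinner was served on the table",
]

# The "meaning" of "table", to this program, is nothing more than this context profile.
print(cooccurrence(corpus)["table"].most_common())
```

Even this crude count already associates “table” with “chair” and “dinner”, which is all the contextual sense the argument needs.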
Will it matter if such a program has never seen a table? In a Turing test scenario, it could theoretically talk extensively about a table, its relation to space, what color it could be, etc., based on text analysis alone.
Human understanding of a table is relative to the light packets our eyes have received, the electric impulses our nerve endings have transmitted, and the sound waves, interpreted as speech, that mentioned a table. Such an understanding is well adapted to our interpretation of reality.
Theoretically, a program and a human being interacting on a text-based terminal could understand each other, even if their inner representations of the information differ.
A machine’s environment is textual. Each day, around the world (and the clock), computers read through billions of lines of text. A well-adapted program intended to demonstrate signs of intelligence should probably focus on text analysis at this point.
Although adding visual and audio information to an intelligent program couldn’t hurt, if only to help it assist humans in their interpretation of reality (driving, etc.), I don’t believe it is a mandatory requirement for exhibiting intelligent behavior. In the end, a machine will still think of that extra layer of information as a string of text.