*But what we seemingly lose in value here, we gain back by introducing the “hidden state” that links one input to the next.*

And in cases like speech recognition, waiting till an entire sentence is spoken might make for a less compelling use case.

For NLP tasks, on the other hand, the inputs tend to be available up front, so we can likely consider entire sentences all at once.

In summary, in a vanilla neural network, a fixed-size input vector is transformed into a fixed-size output vector.

Such a network becomes “recurrent” when you repeatedly apply the same transformation to a series of inputs and produce a series of output vectors.

There is no pre-set limit on the length of the series.

And, in addition to generating the output, which is a function of the input and the hidden state, we update the hidden state itself based on the input and use it when processing the next input.
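To make that loop concrete, here is a minimal sketch of one common formulation of a vanilla RNN, in plain Python. The names `rnn_step`, `run_rnn`, `W_xh`, `W_hh` and `W_hy`, the `tanh` activation, and the toy weight values are all assumptions made for illustration; nothing here is learnt, and real implementations differ in the details.

```python
import math

def rnn_step(x, h, W_xh, W_hh, W_hy):
    """One recurrent step: returns (output vector, updated hidden state)."""
    # The new hidden state mixes the current input with the previous hidden state.
    h_new = [
        math.tanh(
            sum(W_xh[i][j] * x[j] for j in range(len(x)))
            + sum(W_hh[i][k] * h[k] for k in range(len(h)))
        )
        for i in range(len(h))
    ]
    # The output is read off the updated hidden state.
    y = [
        sum(W_hy[o][i] * h_new[i] for i in range(len(h_new)))
        for o in range(len(W_hy))
    ]
    return y, h_new

def run_rnn(inputs, h0, W_xh, W_hh, W_hy):
    """Apply the same step to every input in the series, carrying h along."""
    h = h0
    outputs = []
    for x in inputs:
        y, h = rnn_step(x, h, W_xh, W_hh, W_hy)
        outputs.append(y)
    return outputs, h

# Toy, hand-picked weights (assumed): 2 inputs, 2 hidden units, 1 output.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [-0.4, 0.6]]
W_hy = [[1.0, -1.0]]

series = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, h_final = run_rnn(series, [0.0, 0.0], W_xh, W_hh, W_hy)
print(outputs, h_final)
```

Note that `run_rnn` never looks at how long `inputs` is: the same three weight matrices are reused at every step, and only the hidden state `h` changes from one input to the next.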

RNNs learn during training much as regular networks do, but in addition they remember things learnt from prior input(s) while generating output(s). RNNs can take one or more input vectors and produce one or more output vectors, and the output(s) are influenced not just by the weights applied to the inputs, as in a regular NN, but also by a “hidden” state vector representing the context based on prior input(s)/output(s).

So, the same input could produce a different output depending on previous inputs in the series.
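A tiny experiment shows this. The sketch below uses a deliberately simplified, scalar recurrent unit; the function names `step` and `last_output` and the weights `w_x` and `w_h` are made up for illustration. Both series end with the same input, `1.0`, yet the final outputs differ because the hidden state carried over from the first input is different.

```python
import math

def step(x, h, w_x=0.8, w_h=0.5):
    """Scalar recurrent step; in this toy the output is just the new hidden state."""
    h_new = math.tanh(w_x * x + w_h * h)
    return h_new, h_new

def last_output(series):
    """Feed the series through the unit and return the output for its final input."""
    h = 0.0  # start with an empty context
    y = None
    for x in series:
        y, h = step(x, h)
    return y

a = last_output([1.0, 1.0])   # final input 1.0, preceded by 1.0
b = last_output([-1.0, 1.0])  # final input 1.0, preceded by -1.0
print(a, b, a != b)
```

A regular feed-forward network with fixed weights would return the same value for that last `1.0` in both cases, since it has no state to carry over.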

I am sure you are quick to point out that we are kinda comparing apples and oranges here.

The first figure deals with “a” single input whereas the second figure represents multiple inputs from a series.

