SimpleTokeniserInPython2


The aim here is to emulate the way bash splits a command line into words. If we have, e.g.,

the "quick brown" fox' jumps over"'" the" lazy dog.

this should become

["the", "quick brown", 'fox jumps over" the', "lazy", "dog."]
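As a cross-check (this is not part of the approach described here), Python's standard `shlex.split` performs POSIX-shell-style word splitting and should give the same result for this input, which makes it a handy test oracle for a hand-written tokeniser:

```python
import shlex

# The example input from above, written as a Python string literal.
line = 'the "quick brown" fox\' jumps over"\'" the" lazy dog.'

# shlex.split works in POSIX mode by default and, like bash, joins
# adjacent quoted and unquoted pieces into a single word.
words = shlex.split(line)
print(words)
# → ['the', 'quick brown', 'fox jumps over" the', 'lazy', 'dog.']
```

Note how `fox`, `' jumps over"'` and `" the"` end up as one word: a quote does not end a word, only unquoted whitespace does.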

We use a simple state machine with four states: whitespace, nonwhitespace, singlequote and doublequote. The logic:

From state 'whitespace':
If we are in 'whitespace' and we see a single quote, consume the quote, start a new (possibly empty) token and move to state 'singlequote'.
If we are in 'whitespace' and we see a double quote, consume the quote, start a new (possibly empty) token and move to state 'doublequote'.
If we are in 'whitespace' and we see non-whitespace, start a new token, add the character to it, move to state 'nonwhitespace'.
If we are in 'whitespace' and we see whitespace, consume the whitespace and remain in state 'whitespace'.
...
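Only the transitions out of 'whitespace' are spelled out above. The following is a minimal Python sketch of the whole four-state machine; the transitions out of 'nonwhitespace', 'singlequote' and 'doublequote' are my assumptions, chosen so that the example input produces the expected list: unquoted whitespace ends a word, a quote inside a word switches to the matching quote state without ending the word, and a closing quote returns to 'nonwhitespace' so the word carries on. Backslash escapes, `$` expansion inside double quotes and other bash features are deliberately ignored. The function name `tokenise` is hypothetical.

```python
# Hypothetical sketch of the four-state tokeniser.
# Transitions out of NONWHITESPACE, SINGLEQUOTE and DOUBLEQUOTE are
# assumptions; backslash escapes and expansions are not handled.

WHITESPACE, NONWHITESPACE, SINGLEQUOTE, DOUBLEQUOTE = range(4)


def tokenise(text):
    """Split text into words roughly the way bash does (quotes only)."""
    tokens = []
    current = None  # characters of the token being built, or None
    state = WHITESPACE
    for ch in text:
        if state == WHITESPACE:
            if ch.isspace():
                continue  # skip runs of whitespace between tokens
            current = []  # anything else starts a new (possibly empty) token
            if ch == "'":
                state = SINGLEQUOTE
            elif ch == '"':
                state = DOUBLEQUOTE
            else:
                current.append(ch)
                state = NONWHITESPACE
        elif state == NONWHITESPACE:  # assumed transitions
            if ch.isspace():
                tokens.append("".join(current))  # whitespace ends the token
                current = None
                state = WHITESPACE
            elif ch == "'":
                state = SINGLEQUOTE  # quote inside a word: same token continues
            elif ch == '"':
                state = DOUBLEQUOTE
            else:
                current.append(ch)
        elif state == SINGLEQUOTE:  # assumed: literal until the next '
            if ch == "'":
                state = NONWHITESPACE
            else:
                current.append(ch)
        else:  # DOUBLEQUOTE, assumed: literal until the next "
            if ch == '"':
                state = NONWHITESPACE
            else:
                current.append(ch)
    if state in (SINGLEQUOTE, DOUBLEQUOTE):
        raise ValueError("unterminated quote")
    if current is not None:
        tokens.append("".join(current))  # flush the final token
    return tokens


line = 'the "quick brown" fox\' jumps over"\'" the" lazy dog.'
print(tokenise(line))
```

Because a closing quote always leads back to 'nonwhitespace' rather than 'whitespace', an empty quoted string such as `""` still yields an empty token, as it does in bash, and text glued onto either side of a quoted part stays in the same word.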

Source: