SimpleTokeniserInPython2
The aim here is to emulate how bash splits a command line into words. If we have e.g.
the "quick brown" fox' jumps over"'" the" lazy dog.
this should become
["the","quick brown",'fox jumps over" the', "lazy", "dog."]
We use a simple state machine with four states: whitespace, nonwhitespace, singlequote and doublequote. The logic:
From state 'whitespace'
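If we are in 'whitespace' and we see a single quote, start a new (empty) token, consume the quote without adding it to the token, and move to state 'singlequote'.
If we are in 'whitespace' and we see a double quote, start a new (empty) token, consume the quote without adding it to the token, and move to state 'doublequote'.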
If we are in 'whitespace' and we see non-whitespace, start a new token, add the character to it, move to state 'nonwhitespace'.
If we are in 'whitespace' and we see whitespace, consume the whitespace and remain in state 'whitespace'.
...
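A minimal sketch of such a state machine follows. Only the transitions out of 'whitespace' are spelled out above; the transitions out of 'nonwhitespace', 'singlequote' and 'doublequote' are assumptions chosen so that the example input produces the expected tokens. The function name tokenise and the ValueError on an unterminated quote are likewise hypothetical choices, and backslash escapes and variable expansion are not handled.

```python
def tokenise(line):
    """Split line into tokens, honouring single and double quotes."""
    tokens = []
    current = None            # token being built; None while between tokens
    state = 'whitespace'
    for ch in line:
        if state == 'whitespace':
            if ch == "'":
                current = ''              # start a token, drop the quote
                state = 'singlequote'
            elif ch == '"':
                current = ''
                state = 'doublequote'
            elif ch.isspace():
                pass                      # consume, stay in 'whitespace'
            else:
                current = ch              # start a token with this character
                state = 'nonwhitespace'
        elif state == 'nonwhitespace':    # assumed transitions from here on
            if ch == "'":
                state = 'singlequote'     # quote continues the same token
            elif ch == '"':
                state = 'doublequote'
            elif ch.isspace():
                tokens.append(current)    # whitespace ends the token
                current = None
                state = 'whitespace'
            else:
                current += ch
        elif state == 'singlequote':
            if ch == "'":
                state = 'nonwhitespace'   # closing quote, token may continue
            else:
                current += ch             # everything else is literal
        else:                             # state == 'doublequote'
            if ch == '"':
                state = 'nonwhitespace'
            else:
                current += ch
    if state in ('singlequote', 'doublequote'):
        raise ValueError('unterminated quote')
    if current is not None:
        tokens.append(current)            # flush the last token
    return tokens


example = 'the "quick brown" fox\' jumps over"\'" the" lazy dog.'
print(tokenise(example))
```

Returning to 'nonwhitespace' after a closing quote, rather than to 'whitespace', is what lets fox' jumps over"'" the" come out as a single token: quoted and unquoted runs that touch each other are joined.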
Source: