INVISIBLE PROMPT INJECTIONS

WHAT IS IT?
Prompt manipulation that uses Unicode characters that are NOT visible on a user interface but still interpretable by an LLM
Once interpreted, the LLM can respond to them
HOW DOES IT HAPPEN?
Unicode tag set ranges from E0000 to E007F
English letters, digits, and common punctuation marks can correspond to a “tagged” version by adding E0000 to an original Unicode point, making it to generate a malicious prompt - this can happen via a direct prompt or even if the LLM absorbs knowledge base files with invisible characters
Some LLMs split tag Unicode characters into identifiable tokens and if they interpret the original message prior to interpreting the tagged prompt, the LLM becomes susceptible to the invisible prompt injection
WHAT TO DO ABOUT IT?
Check if an LLM can respond to invisible Unicode characters
Check for invisible characters in documents before pasting them into a prompt or absorbing them into the knowledge base
Use AI protection solutions and run risk assessments for quality checks, eg. Saif Check's Data and LLM risk assessments