Intermittent context truncation / distillation

As I understand Venice, the full context of an inference request is sent to a given GPU inference provider, which returns a response. On subsequent queries within the same “chat”, the whole context is sent again, possibly to a different inference provider, and so on.
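A minimal sketch of this stateless pattern, assuming a generic OpenAI-style chat API (the `call_provider` function and message shape are illustrative assumptions, not Venice's actual internals):

```python
# Each request is stateless: the ENTIRE accumulated history is re-sent
# to whichever inference provider happens to handle that request.
history = []

def send(user_message, call_provider):
    """Append the new message, then ship the full history to a provider."""
    history.append({"role": "user", "content": user_message})
    # Full context -- including any earlier PDF contents -- goes out every time.
    reply = call_provider(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```

This is why anything placed in the chat early on keeps travelling with every later request.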

If the initial prompt contains, say, a medical test results PDF that I ask the model to extract results from, and that document also contains my other personal information, is it safe to say that the PDF contents are re-sent to whichever inference provider handles each subsequent request?

It would be useful to be able to click a button that hides all context above that point from future requests. Additionally, another mode for that button could be “distillation”, where the model is prompted to produce a scrubbed summary of the context above, so that subsequent requests include only the distilled context.

The purpose of this is to limit the distribution of identifying information across different inference providers.
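The two proposed button modes could be sketched roughly as follows (function names and the `summarize` callback are hypothetical, not an existing Venice API):

```python
def truncate(history, cutoff_index):
    """'Hide above' mode: drop everything before the cutoff so it is
    never included in future requests."""
    return history[cutoff_index:]

def distill(history, cutoff_index, summarize):
    """'Distillation' mode: replace the hidden prefix with a scrubbed,
    model-produced summary, so later providers see only the distillate."""
    summary = summarize(history[:cutoff_index])  # e.g. "blood panel: all values normal"
    return [{"role": "system", "content": summary}] + history[cutoff_index:]
```

Either way, the original PDF contents would stop propagating to subsequent providers; distillation additionally preserves the medically relevant facts the conversation still needs.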

Status: Backlog
Board: 💡 Feature Requests
Tags: Privacy
Date: About 1 year ago
Author: Justin Martin
