What do the AI chatbots know about us and who do they share it with?
How is our privacy protected now that AI bots like ChatGPT and Bard are being trained on huge amounts of data from the internet?
The latest generations of chatbots, with OpenAI’s ChatGPT and Google’s Bard at the forefront, can do far more than previous versions, and that’s not always something positive.
The latest explosion in AI development has already raised concerns about misinformation, fake news, plagiarism, and machine-generated malware. But what problems can generative AI create for the privacy of the ordinary user? Engadget has looked into this.
The answer, according to the experts Engadget spoke to, largely comes down to how the bots are trained and how much we choose to share when interacting with them.
User privacy depends on how the chatbot is trained
In order to mimic human interaction, AI bots are trained on large amounts of data, much of it coming from repositories such as Common Crawl. As the name suggests, Common Crawl has spent years crawling the open web, and it now holds petabytes of scraped data.
Although ChatGPT and Bard use what their makers call a "filtered" portion of Common Crawl's data, the sheer size of the training set makes it impossible for anyone to examine and validate everything it contains.
Whether through your own carelessness or a third party's poor security practices, your data may already be sitting in some unsavory corner of the internet. Even if it is hard for the average user to find, it is not impossible that the information has been scraped into a training set, only to be revealed later by a chatbot.
Private Information May Be Exposed by ChatGPT and Bard
That a chatbot can reveal a person’s contact information is unfortunately not a theoretical concern.
Bloomberg writer Dave Lee noted in a post on Twitter that ChatGPT handed out his phone number when asked how to chat with him on the encrypted messaging platform Signal.
This kind of interaction is probably an extreme case, but it’s still worth thinking about what information the learning models have access to.
“OpenAI is unlikely to collect specific information like health data and attribute it to individuals to train its models,” SANS Institute security fellow David Hoelzer told Engadget. “But could it inadvertently be in there? Absolutely.”
ChatGPT: I Am Programmed to Follow Ethical and Legal Standards
OpenAI, the company behind ChatGPT, did not respond to Engadget’s request to explain what measures it takes to protect data privacy, or how it handles personally identifiable information that may have ended up in its training sets.
So Engadget asked ChatGPT itself instead. It said it is “programmed to follow ethical and legal standards that protect user privacy and personal information” and that it “does not have access to personal information unless it is given to me.”
Google told Engadget that it has programmed similar guardrails into Bard to prevent it from sharing personally identifiable information in conversations. Such assurances are worth taking with a grain of salt, however, since bots of this kind can often be prompted into circumventing their own safeguards.
Fittingly, ChatGPT itself pointed to the other area where generative AI can threaten privacy: the use of the software itself, whether through information shared directly in chat logs or through device and user data the service collects during use.
OpenAI Warns That Conversations May Be Reviewed by Humans
Although ChatGPT has a “delete conversations” option, it doesn’t actually delete the user’s data, according to OpenAI’s FAQ page. Nor can the company delete specific prompts. OpenAI discourages users from sharing sensitive information, but the only way to remove personally identifiable information you’ve shared with ChatGPT is to delete your entire account.
If you do this, OpenAI promises to permanently remove all associated data.
ChatGPT went offline briefly in March after a bug exposed information about users’ chat histories. At this early stage of the service’s development, it remains to be seen whether chat logs from this kind of AI become attractive targets for cybercriminals.