If you’re reading this, you likely own, manage or work for a company that handles sensitive data. Chances are there are people in your company finding completely valid uses for cloud-based Large Language Models like Gemini (formerly Bard), Claude and ChatGPT.
Most of us have heard the spiel about protecting our information when using these services, but you probably haven’t considered the scale of information that is gathered automatically.
What’s Really Being Collected
If your company is using a tool like ChatGPT, be aware that the following info is likely recorded (depending on the service) and may be tokenized for the purpose of training future models:
- Everything you enter into the chat
- All Responses
- Geolocation
- Public IP
- Contact Information
- Account Information
- Device and Browser Cookies
Now don’t get me wrong; there are ways around this. You just need to be informed.
Privacy Settings, the Little-Known Option
Let’s take OpenAI and ChatGPT, for example. With over 200 million weekly users, it’s probably a safe bet that people in your company are using it.
In response to the mountain of people complaining about data security, OpenAI has implemented the ability to opt out of data harvesting on their free and premium tiers. Not only that, but their new Enterprise subscription tier opts you out by default. These are huge moves that show the company is taking this problem seriously, especially when the protection is included as a default feature.
Though completely understandable, it’s unfortunate that so many people choose not to utilize this technology due to privacy concerns when there are more secure options available. It’s important to explore all the avenues and make sure you have the proper information before making the decision to completely rule it out – especially considering the benefits of AI.
Don’t just take it from me. Read how you can opt out of data harvesting from the two companies that actually offer the option:
What They Do and Don’t Do with Your Data
The last thing I want to cover before we move on is a misunderstanding about what happens to data that is collected. OpenAI isn’t selling your secrets. They aren’t using your data for advertising and they’re not purposefully training their models to try to sell you products.
The real threat to your company comes when your data is included as training material, because it may show up in GPT’s responses when future models are released. This goes for all models, not just OpenAI’s. This type of leak can cause competitive disadvantages, product disclosures and even compliance violations that put your business at risk.
Real Events Highlight AI Risks
Let’s cover a real-world example. In 2023, it was widely reported that a significant leak occurred at Samsung Semiconductor when employees (rightly knowing the benefits of using GPT) wrongly uploaded source code so the LLM could fix bugs, and used their phones and the GPT app to record meeting notes and turn them into an automated presentation. That protected data was stored and likely utilized by OpenAI to train their next model.
Once a new model comes out and users begin to test out new prompts, there’s a chance that information like this may appear in GPT’s responses. Other users developing similar products may be fed this information, and Samsung would suffer from the leak. Of course, if a model has the information, it will provide it! This heavily simplifies the situation, but it happened, it worked, and it could happen to your company too.
Since then, Samsung has banned any and all internal use of “generative AI”. Not just ChatGPT, but all models and modalities. What some do not report, however, is that Samsung understands the massive benefits that generative AI brings to the table. So much so that they are currently developing their own model that will function on their internal network without sharing any data with a third party.
What Should Your Company Do
Examples like the one we just covered should be enough for your company to take this information seriously. Take a few minutes and review the links above to understand what the companies are doing with your data, and what the privacy settings cover. Figure out the best way for your users to tackle privacy concerns and opt out of the data collection when using these services for anything even remotely related to work. If your company is adopting AI in a major way, think about using an offline model that guarantees that your data is secure. On top of that, do legitimate training with your staff on the proper handling of sensitive information in general!
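As one small illustration of handling sensitive information before it ever reaches a cloud service, here is a minimal Python sketch that masks email addresses and IPv4 addresses in a prompt. The `redact` function and the `PATTERNS` table are hypothetical names made up for this example, and the two patterns are assumptions chosen for illustration. A real deployment would need a far broader, company-specific list, and this is no substitute for staff training or the opt-out settings discussed above.

```python
import re

# Illustrative patterns only (assumed for this sketch). They will not catch
# source code, customer names, or other company-specific secrets.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(prompt: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com from 10.0.0.12"))
```

Running the text through a step like this before pasting it into a chat window reduces what gets shared, but it only catches what the patterns describe, so treat it as a safety net rather than a guarantee.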
We’re calling it now. Leaks like this will revolutionize future phishing attacks, but we’ll have to dive into that in a future article.
How CCB Can Help
Whether you’re looking to streamline your IT operations, embark on new IT projects, or need assistance with procurement, our team is ready to elevate your business with cutting-edge solutions. Don’t let the potential of AI and digital transformation pass you by.
Contact us today to discover how we can help your business thrive securely in the era of AI.