
Unforeseen consequences: A look into data storage and usage

We are in the midst of the information revolution. Where oil and coal once propelled industrial advancement, data is now the leading resource. Unlike oil and coal, however, data is not used up. It can be reused any number of times and held by any number of people. We have no way of knowing what data will be used for in the future or how it will be linked with other data, and no clear way of deciding which data should be publicly accessible and which should not. Partly as a result, few restrictions govern what data can be gathered and how it can be used. Largely unimpeded, companies around the world gather and store the data of millions of people.

According to a 2017 study conducted by Ghostery, a company that maintains a database of web trackers, approximately 79 percent of websites track users’ data. Through cookies (small pieces of data that a website stores in a user’s browser so it can recognize that browser on later visits) and other trackers, websites gather information about a user’s behavior and interests. User data can take almost any form, from Wordle remembering a player’s streak to Netflix calling out the 53 people who watched a Christmas movie 18 days in a row. Sites use this data to measure their performance and to confirm that ads reach their intended audience. Some websites, however, also store this data and share it with other companies, often to build personalized advertisements. For example, someone who has been looking at baby clothes is likely to start seeing ads for diapers. The practice is common: 10 percent of websites with trackers send user data to 10 or more other companies. Five years later, after a pandemic moved a significant portion of daily life online, that share has likely only grown.

Unfortunately, data tracking doesn’t stop at websites. “Smart” devices collect user information every day as well. Roomba, for example, maps its users’ homes to clean more efficiently. Robot vacuums use sensors to record the layout of a home: where the walls are, where the furniture sits, and which areas need more cleaning, among other things. While this data might seem to serve only the Roomba’s own function, it could improve other companies’ products as well. Access to home floor plans could help speaker manufacturers tune audio performance to a room’s acoustics. Similarly, a room’s spatial map could help air conditioner makers improve airflow and cooling. Home layout data could even be used, once again, for advertising: a company could show furniture ads to a user whose map reveals an empty room.

The primary problem with this data collection is that most people don’t know what information is being collected. Although the terms and conditions for apps are easy to find, most people do not take the time to read what they are agreeing to. That habit becomes riskier with physical products, because buying the product is often treated as accepting its terms and conditions. As a result, people may not realize that their in-home smart devices could be keeping closer tabs on them than the devices’ functions would suggest. In the worst case, people have no idea what information about them could be published or stolen in a company data breach. Once that data is exposed, they become more vulnerable to increasingly sophisticated phishing campaigns built from information they never knew they were sharing.

Currently, consumers have few protections against data gathering. The most prominent U.S. law governing what companies track, store, and sell about consumers is the California Consumer Privacy Act (CCPA). The law requires companies that do business in California and meet certain size or data-volume thresholds to post privacy notices, honor certain browser-based opt-out signals, clearly explain what information they collect and how it is used, and allow consumers to opt out of the sale of their personal information. The CCPA protects California residents but does not extend to consumers in the rest of the United States, let alone the rest of the world.

Without regulation, responsibility for ethical data practices falls to companies themselves, and many will not act ethically without outside pressure. That makes increasing public awareness of how personal data is collected and used essential to keeping consumers safe. Knowing what information can be gathered about you encourages safer internet habits and helps you understand what information about you is being made available.

Big Data (2018). DOI: 10.1089/big.2018.0083