- Toshendra Kumar Sharma
- January 11, 2019
The advent of the internet and its subsequent popularization in the 1990s led to the democratization of information. Blockchain has been called the most important technological breakthrough since the internet and has been credited with facilitating the democratization of data. Its architecture lets people establish trust over the internet without relying on a central intermediary, which has opened up numerous new fields of inquiry. With blockchain, people can be in control of their own digital assets and data, and this can create entire marketplaces for data. Here’s a look at some of the key ways in which blockchain will impact big data in the coming years.
Monopolization of Data
As the internet’s popularity has grown over the last 20 years, the scale of premier internet companies like Facebook, Google and Microsoft has also grown tremendously. These companies have access to a large amount of user data, such as browsing patterns, birthdays, geolocation data and pictures. This data is very valuable to advertisers and retailers because it helps them market their products more effectively. As it stands now, users get little say in how their private information is sold and used. Internet companies derive immense value from their users’ personal data because it can be mined for valuable information. Even the value of the social networks themselves comes largely from the users who share posts and pictures on their platforms. Blockchain can help us create a Web 3.0 in which everyone is in charge of their personal information and can use it however they wish.
How Can Blockchain Help?
Blockchains are shared ledgers that are designed to be tamper-resistant: once a record has been added, changing it without detection is extremely difficult. They also provide a way to identify users over the internet through cryptographic key pairs, because a signature made with a private key cannot feasibly be forged by anyone who does not hold that key. This allows people’s personal data to be linked to their keys, so that they can be consulted directly any time that information is requested.
Big Data refers to data sets that are too large and complex to be processed by traditional software programs. Machine learning and data mining are two techniques that can analyze these large data sets to extract the underlying information. Big Data can be used to learn about browsing patterns, build language processing systems and even help train self-driving cars. The models used to glean this information from Big Data sets rely heavily on the quality and authenticity of the data they receive. This is where blockchain can greatly benefit Big Data processing:
- Data Validation – A lot of resources are spent on manually validating data when it’s acquired from a third party. Blockchain can greatly reduce this overhead and cut down on fraudulent data by automating data validation using smart contract technology. For instance, Lenovo has begun using blockchain to validate physical documents by embedding a digital signature in them.
- Data Storage – Decentralized file storage is one of the most exciting uses of blockchain by far. By utilizing the unused storage space on people’s devices across the world, projects like Filecoin and Sia are looking to disrupt the cloud storage industry. As Big Data processing evolves, ever-increasing amounts of storage are going to be required. Leading cloud storage services like Amazon S3 charge more than 10 times what Sia does: at the time of writing, storing 1TB of files on the Sia network costs about $2 per month, compared to a whopping $23 on Amazon S3!
- Data Privacy & Security – Centralized data storage is decidedly insecure, as is evident from the numerous reports of data breaches at top companies like Equifax and Facebook. A centralized data store suffers from a single point of failure and is susceptible to disgruntled employees and malicious hackers on the internet. Blockchain services like Civic allow data to be stored on the blockchain and let users approve every request for their data through a mobile app. Data privacy using Civic, combined with services like IOTA’s big data marketplace, would allow this data to be traded with the complete knowledge of the user and made available directly to the companies that want to examine it.
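To make the tamper-resistance and data validation ideas above more concrete, here is a minimal sketch in Python of a hash-chained ledger. It is a toy under stated assumptions, not how any of the projects named above work: there is no network, no consensus and no smart contract, and every function name (`fingerprint`, `append_block`, `chain_is_valid`, `record_is_authentic`) is hypothetical. It only illustrates why a data set whose fingerprint has been recorded in a chain of linked hashes can later be checked automatically.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Return the SHA-256 hex digest of a record, serialized deterministically."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, record: dict) -> None:
    """Append a block that commits to the record and to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "index": len(chain),
        "prev_hash": prev_hash,
        "data_hash": fingerprint(record),
    }
    # The block's own hash covers its index, its parent and the data fingerprint.
    chain.append({**body, "hash": fingerprint(body)})


def chain_is_valid(chain: list) -> bool:
    """Recompute every hash and check that each block points at its parent."""
    prev_hash = "0" * 64
    for position, block in enumerate(chain):
        body = {
            "index": block["index"],
            "prev_hash": block["prev_hash"],
            "data_hash": block["data_hash"],
        }
        if (
            block["index"] != position
            or block["prev_hash"] != prev_hash
            or block["hash"] != fingerprint(body)
        ):
            return False
        prev_hash = block["hash"]
    return True


def record_is_authentic(chain: list, index: int, record: dict) -> bool:
    """Check a record received from a third party against the ledger entry."""
    return 0 <= index < len(chain) and chain[index]["data_hash"] == fingerprint(record)


# Illustrative data only: two made-up sensor readings.
ledger = []
reading = {"sensor": "A1", "value": 21.5}
append_block(ledger, reading)
append_block(ledger, {"sensor": "A2", "value": 19.0})

print(chain_is_valid(ledger))                                             # True
print(record_is_authentic(ledger, 0, reading))                            # True
print(record_is_authentic(ledger, 0, {"sensor": "A1", "value": 99.9}))    # False
```

Only the fingerprint of each record is stored in the chain, so the data itself can live elsewhere. A real blockchain adds digital signatures and a consensus mechanism on top of this structure, which is what prevents someone from simply rebuilding the whole chain after altering a record.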