I begin with an explanation of what's meant by 'big data' and the technological challenges it poses, challenges that are starting to be addressed by Apache's Hadoop framework and startups like Cloudera and Hortonworks. Then I discuss why there's so much interest in capturing our personal information and the concerns this raises over privacy, before considering the likely effect on personal computing.
Big data is a hot topic this year. There's been a surge in cloud storage offerings from the likes of Apple, Microsoft, Amazon and now Google. There has also been plenty of media coverage of the personal privacy issues surrounding the UK Government's push for GCHQ access to social networking and website data.
Here are a couple of extracts from the article:
Acquiring masses of data is pointless unless its hidden secrets can be revealed. This means uncovering the patterns and trends within masses of unstructured data, and creating human-understandable reports and charts. Before this happens the data needs to be extracted, filtered, manipulated and subjected to deep statistical analysis.
Existing technology isn't well suited to big data challenges. Spreading the processing load efficiently can be a hard problem to solve. Relational databases, such as those from Oracle and MySQL, store, manage and retrieve information in carefully designed tables. However, table-based storage isn't suited to the dynamic, unstructured nature of big data information. And SQL is too inflexible to cope with big data analysis.
What's needed is a new breed of data storage and transaction processing technology. Oracle and other relational database organisations are already working hard to offer big data solutions that supplement their relational offerings. But Apache has rather stolen the initial limelight with their open source Hadoop project. Even Microsoft has Hadoop at the centre of its big data strategy.
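To give a flavour of how the processing load can be spread, here is a minimal sketch of the map/reduce pattern that Hadoop's processing model is built on. It is plain single-process Python, not Hadoop's actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are my own assumptions for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for one unstructured text record."""
    for raw in record.lower().split():
        word = raw.strip(".,!?")
        if word:
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over a list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample records standing in for unstructured input
records = ["Big data is big", "Data about data"]
counts = word_count(records)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, a real framework can hand those pieces of work to different machines. This sketch simply runs them one after another.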
However, there is another and somewhat darker side to big data. With much of the focus being on acquiring and storing social information there are obvious concerns about the protection of our right to privacy and anonymity.
We are already very close to a situation where every message you send, every call you make, every website you visit and every item you buy is logged and stored on some anonymous far-flung server. These logs reveal patterns of our daily lives, including electronic communications, geographical movements, social interactions, personal preferences and regular habits.
And these logs contain other secrets. Think you have deleted some emails? The log will have recorded any previous email contact history. Removed some website images from your website or Facebook? They could well still exist in a backup on some Internet-connected server.