By Ravi Shankar
Data is the fuel of the digital economy, and data-centric organizations realize a differentiated competitive advantage. Every organization therefore needs a data management strategy that enables it to effectively ingest, store, organize, and analyze the data generated by its various business functions.
However, with increased regulation, enterprises and organizations in India need to make sure that everything they do in terms of data protection and data governance adheres to the new regulations and standards. In particular, the Bureau of Indian Standards (BIS) has released the new IS 17428 standard in two parts: Part 1 sets out the "Requirements", whereas Part 2 sets out "Guidelines". An India-specific standard matters because the EU GDPR was designed to work across borders, and not all of its contents make sense for a sovereign state regulating data within its own territory.
We have also seen the Indian government place the PDP (Personal Data Protection) Bill before parliament. It is currently in draft form but is expected to pass into law in late 2021 or early 2022. Put simply, the aim of the PDP Bill is to provide for the protection of individuals' privacy in relation to their personal data.
The goal of an organization's data management strategy should be to ensure that the data in its digital systems is accurate, protected, and accessible to authorized consumers. However, creating a future-proof data management strategy is a complex task, given the rapid advancement of emerging technologies such as cloud and big data systems and fast-changing business requirements that demand real-time data rather than yesterday's data.
To adhere to the new standards and proposed legislation, some data inevitably needs to be anonymized, and 'the right to be forgotten' needs to be implemented. Business users and regulators increasingly require that the entire 'factory' that delivers their data become more transparent, which implies more up-to-date data catalogues and metadata. Let us examine in detail some of the critical challenges that must be considered when creating a holistic data management strategy and a data architecture that complements it.
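To make the two obligations concrete, the sketch below shows pseudonymization of a direct identifier and erasure of a data subject's records. It is a minimal illustration only: the `SECRET_KEY`, field names, and helper functions are hypothetical, not a reference implementation of IS 17428 or the PDP Bill.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would live in a key vault, so
# destroying the key irreversibly breaks the link back to the identity.
SECRET_KEY = b"example-only-key"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed hash.

    Unlike a plain hash, an HMAC with a secret key resists dictionary
    attacks on low-entropy identifiers such as phone numbers.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def forget(records: list[dict], subject_id: str) -> list[dict]:
    """Honour a 'right to be forgotten' request by dropping the
    subject's records entirely (erasure, not just masking)."""
    return [r for r in records if r["subject_id"] != subject_id]

records = [
    {"subject_id": "u1", "email": pseudonymize("alice@example.com")},
    {"subject_id": "u2", "email": pseudonymize("bob@example.com")},
]
remaining = forget(records, "u1")
```

Note that pseudonymized data may still count as personal data under most privacy regimes; full anonymization requires that re-identification be impossible.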
Real-time data access – Organizations need access to real-time data to adapt quickly to market changes and to support real-time analytics use cases such as monitoring consumer behaviour, ad optimization, product recommendations and more. This means that data must be available for analysis right after it has been produced. However, the data architecture in most organizations is not designed to support real-time analytics. The most common approach to business intelligence and analytics involves replicating data from source systems to intermediate stores such as data warehouses and data lakes using a series of ETL processes. While this approach is suitable for regular business reporting, it does not support real-time use cases. Organizations must therefore adopt an alternative approach that supports both traditional forms of business reporting and advanced analytics such as real-time and streaming analytics.
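The contrast with batch ETL can be sketched with a toy per-event aggregation: rather than waiting for replicated data to land in a warehouse, each event updates a metric the moment it arrives. The event values and window size below are invented for illustration.

```python
from collections import deque

def rolling_average(events, window=3):
    """Yield a fresh aggregate per incoming event, instead of waiting
    for a nightly batch ETL run to land in the warehouse."""
    buf = deque(maxlen=window)  # keeps only the last `window` events
    for value in events:
        buf.append(value)
        yield sum(buf) / len(buf)

# In production the iterable would be an unbounded stream (e.g. a
# message queue consumer); a list stands in for it here.
clicks_per_second = [10, 20, 30, 40]
live_averages = list(rolling_average(clicks_per_second))
```

Because the function is a generator, it consumes the stream lazily and can run indefinitely over an unbounded source.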
Big Data – To carry out advanced analytics, organizations need to store and analyze a wide variety of big data. This variety includes, but is not limited to, text (e.g. contracts and social media messages), voice (e.g. conversations between air traffic controllers and pilots), images (e.g. car damage from accidents), and video (e.g. from security cameras at airports and retail stores). Organizations also store data from monitoring new business programs, which produces huge volumes. There is also streaming data that must be pushed from source to real-time streaming applications; data from wearable devices, in-game player activity, and telemetry from connected devices falls into this category. Irrespective of the type of analytics a company wants to carry out, the volume and variety of big data will directly shape the technology choices in the data architecture.
Cloud Platform Interoperability – Cloud computing technology is growing more quickly than ever before. Applications are becoming more portable, enabling compute cycles to support workloads in real time, and data integration platforms are streamlining connectivity and crossing platform boundaries, making hybrid and multi-cloud architecture the de facto standard. Therefore, a new data architecture strategy should support cloud platform interoperability. This also makes it possible to carry out reporting and analysis for business cases that require pulling data from multiple cloud platforms.
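One common way to achieve such interoperability is a thin abstraction layer that presents every platform behind the same interface, so a report can query several clouds at once. The classes below are hypothetical stand-ins, not real cloud SDK calls.

```python
from abc import ABC, abstractmethod

class CloudSource(ABC):
    """Uniform interface over heterogeneous cloud data stores, so that
    cross-platform reports need no per-cloud query code."""

    @abstractmethod
    def fetch(self, query: str) -> list[dict]:
        ...

class CloudASource(CloudSource):
    # Hypothetical adapter; a real one would call that platform's SDK.
    def fetch(self, query: str) -> list[dict]:
        return [{"platform": "cloud-a", "query": query}]

class CloudBSource(CloudSource):
    # Hypothetical adapter for a second platform.
    def fetch(self, query: str) -> list[dict]:
        return [{"platform": "cloud-b", "query": query}]

def federated_fetch(sources: list[CloudSource], query: str) -> list[dict]:
    """Run one logical query against every registered platform and
    combine the results into a single cross-cloud result set."""
    results: list[dict] = []
    for source in sources:
        results.extend(source.fetch(query))
    return results

combined = federated_fetch([CloudASource(), CloudBSource()], "daily_sales")
```

Adding a new platform then means writing one adapter class, with no change to the reporting code that consumes `federated_fetch`.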
Data science – Data science enables organizations to find hidden patterns in data by creating analytical models, using techniques such as statistics, machine learning, deep learning and AI. However, several studies have shown that data scientists often spend 80% of their time on data preparation tasks such as data cleansing and data exploration, and only 20% on creating predictive models. A modern data architecture plan should therefore include tools that allow data scientists to focus on their core skills.
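The 80/20 split is easy to see in miniature: even a toy dataset needs several cleansing steps before any modelling can start. The record layout and field names below are invented purely for illustration.

```python
def prepare(rows: list[dict]) -> list[dict]:
    """Routine data-preparation chores: drop incomplete records,
    normalize types, and deduplicate by id."""
    seen = set()
    cleaned = []
    for row in rows:
        if not row.get("id") or row.get("amount") is None:
            continue  # drop incomplete records
        if row["id"] in seen:
            continue  # drop duplicate ids
        seen.add(row["id"])
        # normalize amounts (arriving as strings or ints) to float
        cleaned.append({"id": row["id"], "amount": float(row["amount"])})
    return cleaned

raw = [
    {"id": "a1", "amount": "10.5"},
    {"id": "a1", "amount": "10.5"},   # duplicate
    {"id": "a2", "amount": None},     # incomplete
    {"id": "a3", "amount": 7},        # wrong type
]
model_ready = prepare(raw)
```

Tooling that automates this kind of cleansing and exploration is what frees data scientists to spend their time on the models themselves.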
Given the pace at which the world is changing, businesses need an equally agile data management strategy. The need of the hour, therefore, is a logical architecture that is flexible enough to incorporate new sources of any type with minimal reconfiguration and to serve a multitude of users and consuming applications.
(The author is senior vice president and chief marketing officer, Denodo. Views are personal.)