Three ways to turn old files into Hadoop data sets in a data lake

One reason Hadoop systems are being integrated with data warehouses is to move cold, infrequently accessed data out of a warehouse database and into Hive tables running on top of the Hadoop Distributed File System (HDFS). This mingling of conventional databases with Hadoop is often a first step in the data modernization process, and it opens up a range of new options for creating useful Hadoop data sets.

A particularly promising option is migrating the massive volumes of historical data hidden away in many data warehouses to big data environments, where the information is more accessible for analysis. In many cases, that data is stored in mainframe formats, such as VSAM files, IMS databases and flat files described by COBOL copybooks. When planning a legacy data migration to a data lake, you have to weigh the alternatives for the target format against the anticipated use cases for the data.
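
To give a feel for what reading such mainframe data involves, here is a minimal Python sketch that decodes one fixed-width record. The record layout (a 10-byte name followed by a 4-byte packed-decimal balance), the field names and the use of EBCDIC code page 037 are all assumptions made for illustration; a real migration would take the layout from the file's COBOL copybook and the code page from the source system.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two decimal digits, except the last byte,
    which holds one digit and a sign nibble (0xD means negative).
    """
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F

    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:
        value = -value
    # Shift the implied decimal point left by `scale` places.
    return Decimal(value).scaleb(-scale)


def decode_record(raw: bytes) -> dict:
    """Decode one record of the hypothetical layout:
    NAME    PIC X(10)          -> bytes 0-9, EBCDIC text
    BALANCE PIC S9(5)V99 COMP-3 -> bytes 10-13, packed decimal
    """
    name = raw[0:10].decode("cp037").rstrip()  # cp037 is an assumed code page
    balance = unpack_comp3(raw[10:14], scale=2)
    return {"name": name, "balance": balance}


# Build a sample record instead of reading a real mainframe extract.
sample = "ACME CORP ".encode("cp037") + bytes([0x00, 0x12, 0x34, 0x5C])
record = decode_record(sample)
print(record)
```

Once records are decoded into plain names and values like this, they can be written out in whichever target format suits the anticipated use cases.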

