Extracting Data from SAP DSO in Azure Data Factory
--
Extracting Data from SAP DSO in Azure Data Factory
There are many types of data stores available in SAP. DSO is one such data stores and has a unique feature to maintain data. DataStore Object (DSO) is used to store data at the most granular form, i.e. at document level. This data can be used for reporting as well as for passing on to other data targets like InfoCubes.
Managing data using DSO involve three components which are mentioned below. Key advantage of DSO is, it creates two separate views of final data and change on data.
1. Active Data:
Data stored in the active data component will always be current state of data. i.e. for a sales order DSO, a sales amount has been updated from 100 to 200, active data will have 200 as value.
2. Change Log:
It will contain all the changes and version of records. It can also be called as audit log table for active data.
3. Staging Engine:
It will be used to process data between change and maintaining active data and change log data.
Data extraction from DSO in SAP to Azure Data Factory:
There could be two scenarios to extract data from DSO
1. Master data extraction
Master data is slow changing data. Size of master data tables is relatively smaller as compared to transaction data. Therefore, approach will be to extract entire table and recreate in source system.
Figure 2: Data extraction using Query or Table Name
Data from SAP can be extracted by providing Table Name or by providing Query. Using query provides more flexibility as compared to using Table. Hence, it is preferred to use query to extract data.
Once data is extracted and stored in a blob location of ADLS, it can be further processed to recreate final table.
2. Transaction Data
Transaction data grows at rapid speed. Size of transactional data is significantly large. Therefore, extracting entire set of data from source to destination will consume significantly high time. Therefore, extraction from transaction table is a delta extraction approach.
Figure 3: Data extraction using Query or Table Name
Data from SAP can be extracted by providing Table Name or by providing Query. Using query provides more flexibility as compared to using Table. Hence, it is preferred to use query to extract data.
To extract data from DSO, Data is not extracted from active table but from change log table which will keep all the records which with key time stamps of created datetime and created by for all types of changes i.e. new record, updated records, deleted record.
In order to source incremental data, a master extraction table is maintained which will store pointer / reference to last extracted run. And every time, job is run, it will extract data from the last extraction date.
Figure 4: Query data for incremental load
Once incremental data has been sourced from the source system, based on the primary keys/unique keys, records will be upsert.
In this approach, data transfer is kept at minimum and job processing is optimized as well.