Oracle 10g: Exploring Data Pump

Sreeram Surapaneni, svsreeram@yahoo.com

Introduction:

Oracle 10g offers several new features, one of which is Data Pump technology for fast data movement between databases. Most Oracle shops still use their traditional export and import utility scripts rather than this new technology. Data Pump is an entirely different technology from the export/import utilities, although it has a similar look and feel. Data Pump runs inside the database as a job, which means the job is largely independent of the client process that started the export or import. Another advantage is that other DBAs can log in to the database and check the status of the job. The advantages of Data Pump, along with Oracle's plan to deprecate the traditional export/import utilities down the road, make Data Pump a worthwhile topic for discussion.

Oracle claims that Data Pump transfers data and metadata at up to twice the speed of the export utility and twenty to thirty times the speed of the import utility that DBAs have been using for years. Data Pump achieves this speed with multiple parallel streams of data for maximum throughput. Note that Data Pump does not work with database releases older than 10g Release 1.

Data Pump consists of two components: the Data Pump export utility, “expdp,” which exports objects from a database, and the Data Pump import utility, “impdp,” which loads objects into a database. Just as with the traditional export and import utilities, the DBA can control these jobs with several parameters.

For example:

$expdp username/password (other parameters here)
$impdp username/password (other parameters here)

We can get a quick summary of all parameters and commands by simply issuing

$expdp help=y
$impdp help=y

Like the traditional export and import utilities, the Data Pump export and import utilities are extremely useful for migrating large databases between platforms and operating systems in a short amount of time.

The Oracle-supplied package DBMS_DATAPUMP implements the API through which you can access the Data Pump export and import utilities programmatically. In other words, if you have hundreds of databases to manage, you can build a much more powerful, custom Data Pump utility on top of this technology.

One of the interesting points is how Data Pump initiates the export session. With the traditional export utility, the user process writes the exported data, which it requests from the server process, to disk as part of a regular session. With Data Pump, the expdp user process launches a server-side process, or job, that writes data to disks on the server node, and this process runs independently of the session established by the expdp client. However, as with the traditional export utility, Data Pump writes the data into dump files in an Oracle proprietary format that only the Data Pump import utility can understand.
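
One way to see this separation is to query the data dictionary while a job is running. The sketch below joins DBA_DATAPUMP_SESSIONS to V$SESSION to list the server sessions attached to Data Pump jobs; it is only an illustration of one monitoring approach, run from any suitably privileged session.

SQL> SELECT s.sid, s.serial#, s.username, d.job_name
       FROM v$session s, dba_datapump_sessions d
      WHERE s.saddr = d.saddr;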

New Features of Data Pump that improve the performance of data movement:

Below are some of the features that differentiate Data Pump from the traditional export and import utilities. These features not only enhance the speed of data transfer but are also handy for the DBA to assess how a job will run before actually running it.

  • Parallel threads: The PARALLEL parameter specifies the maximum number of threads of active execution operating on behalf of the export job. This execution set consists of a combination of worker processes and parallel I/O server processes. Because each active worker process or I/O server process works on one active file at a time, the DBA must specify a sufficient number of files; the value given for this parameter should therefore be less than or equal to the number of files in the dump file set. This important parameter helps the DBA make a trade-off between resource consumption and elapsed time.
  • Ability to restart a job: The ability to restart a job is extremely useful when the DBA is moving large amounts of data. A Data Pump job can be restarted without data loss or corruption after an unexpected failure, or after the DBA has stopped it with the STOP_JOB command.
  • Ability to detach from and reattach to a job: This allows DBAs to monitor jobs from multiple locations. A Data Pump export or import client can be attached to only one job at a time, but multiple clients can be attached to the same job.
  • Support for export and import operations over the network: The NETWORK_LINK parameter initiates an export using a database link. This means the system to which expdp is connected contacts the source database referenced by the database link, retrieves data from it, and writes the data to a dump file set on the connected system.
  • Ability to rename source datafiles: On import, the DBA can change the name of a source datafile to a different name in all DDL statements where the source datafile is referenced (the REMAP_DATAFILE parameter).
  • Support for filtering metadata: The DBA can filter metadata using the EXCLUDE and INCLUDE options. If an object is excluded, all of its dependent objects are also excluded. For example, EXCLUDE=CONSTRAINT excludes all constraints except NOT NULL constraints and any constraints required for successful table creation, while INCLUDE=TABLE:"IN('EMPLOYEES','DEPARTMENTS')" limits the job to just those two tables.

  • Space estimate: The DBA can estimate how much space an export job will consume, without actually performing the export.
  • QUERY parameter: The DBA can filter data during the export by supplying a clause that is applied to the SELECT statement for a table.
  • CONTENT parameter: The DBA can specify what is exported or imported: metadata only, data only, or both. (Example commands for these options follow this list.)
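
The following command-line sketch shows how a few of these options can be combined. It assumes the dpuser schema and dpump_dir1 directory object that are created in the "Getting Started" section below, as well as a DEPTNO column on the EMP table; the names and values are illustrative only.

$ expdp dpuser/dpuser@TDB10G schemas=dpuser estimate_only=y directory=dpump_dir1 logfile=estimate.log
$ expdp dpuser/dpuser@TDB10G tables=emp query=emp:\"WHERE deptno=10\" directory=dpump_dir1 dumpfile=emp_dept10.dmp
$ expdp dpuser/dpuser@TDB10G schemas=dpuser content=metadata_only directory=dpump_dir1 dumpfile=dpuser_ddl.dmp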

Init.ora parameters that affect the performance of Data Pump:

Oracle recommends the following settings to improve performance.

DISK_ASYNCH_IO=TRUE
DB_BLOCK_CHECKING=FALSE
DB_BLOCK_CHECKSUM=FALSE

Additionally, the PROCESSES and SESSIONS initialization parameters must be set high enough to allow for maximum parallelism.
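
The following SQL*Plus sketch shows one way to review and adjust these settings; the numeric values are illustrative only, and DB_BLOCK_CHECKING and DB_BLOCK_CHECKSUM also provide corruption detection, so that trade-off should be weighed before disabling them. DISK_ASYNCH_IO, PROCESSES, and SESSIONS are static parameters, so changes to them take effect only after a restart.

SQL> show parameter disk_asynch_io
SQL> show parameter db_block_check
SQL> alter system set db_block_checking=false;
SQL> alter system set db_block_checksum=false;
SQL> alter system set disk_asynch_io=true scope=spfile;
SQL> alter system set processes=300 scope=spfile;
SQL> alter system set sessions=335 scope=spfile;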

How Data Pump loads and unloads data:

Oracle has provided a direct path for unload (export) operations since Oracle 7.3. This method has been very useful for DBAs who want a quick export of the database, and it has been further enhanced in Data Pump. Oracle uses the direct path method for loading (impdp) and unloading (expdp) when the structure of the tables allows it. If a table is part of a cluster, or it has a global index on a partitioned table, then Data Pump accesses the data using a different method, called the external table access method. Both the direct path and external table methods support the same external data representation, so data unloaded with the external table method can be loaded with direct path, and vice versa.

Getting Started...

As stated earlier, Data Pump is a server-based utility, rather than client-based; dump files, log files, and SQL files are accessed relative to server-based directory paths. Data Pump requires you to specify directory paths as directory objects. A directory object maps a name to a directory path on the file system.

1. The following SQL statements create a user, create a directory object named dpump_dir1, and grant the necessary permissions to the user.

$ sqlplus system/manager@TDB10G as sysdba 
 
SQL> create user dpuser identified by dpuser;
 
SQL> grant connect, resource to dpuser;
 
SQL> CREATE DIRECTORY dpump_dir1 AS '/opt/app/oracle';
 
SQL> grant read, write on directory dpump_dir1 to dpuser;
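
To confirm that the directory object exists and points to the intended path, one can query the DBA_DIRECTORIES view (a quick sanity check, not a required step):

SQL> SELECT directory_name, directory_path FROM dba_directories
      WHERE directory_name = 'DPUMP_DIR1';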
 

2. Let us see how the INCLUDE and EXCLUDE parameters can be used to limit which objects are loaded and unloaded. When the INCLUDE parameter is used, only the objects specified by it are included in the export. When the EXCLUDE parameter is used, all objects except those specified by it are included in the export. Assume dpuser owns the tables EMP, EMP_DETAILS, and DEPT.

$ expdp dpuser/dpuser@TDB10G schemas=dpuser 
include=TABLE:\"IN (\'EMP\', \'DEPT\')\" 
  directory=dpump_dir1 dumpfile=dpuser.dmp logfile=dpuser.log
 
$expdp dpuser/dpuser@TDB10G schemas=dpuser 
exclude=TABLE:\"= \'EMP_DETAILS\'\" 
  directory=dpump_dir1 dumpfile=dpuser2.dmp logfile=dpuser.log

As stated earlier, Data Pump performance can be significantly improved by using the PARALLEL parameter. This should be used in conjunction with the "%U" wildcard in the DUMPFILE parameter, which allows multiple dump files to be created or read:

$expdp dpuser/dpuser@TDB10G schemas=dpuser 
   directory=dpump_dir1 parallel=4 dumpfile=dpuser_%U.dmp logfile=dpuser.log
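
Because the job runs on the server, a second session can attach to it while it is running and use the interactive commands. The sketch below assumes the default job name Oracle assigns to the first schema-mode export (SYS_EXPORT_SCHEMA_01); the actual name can be found in the DBA_DATAPUMP_JOBS view.

$ expdp dpuser/dpuser@TDB10G attach=SYS_EXPORT_SCHEMA_01

Export> status
Export> stop_job=immediate

A stopped job retains its master table, so it can be restarted later by attaching to it again and issuing START_JOB at the Export> prompt.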

Data Pump API:

The Data Pump API, DBMS_DATAPUMP, provides a high-speed mechanism for moving data from one database to another. In fact, the Data Pump export and import utilities are themselves built on the Data Pump API. The structure used in the client interface of this API is a job handle, which can be created using the OPEN or ATTACH function of the DBMS_DATAPUMP package. Other DBA sessions can attach to a job to monitor and control its progress, so a remote DBA can monitor a job that was scheduled by an on-site DBA.

The following steps list the basic activities involved in using the Data Pump API.

1. Call the DBMS_DATAPUMP.OPEN function to create a job and obtain a job handle.
2. Define parameters for the job, such as adding files and filters.
3. Start the job.
4. Optionally, monitor the job until it completes.
5. Optionally, detach from the job and attach to it at a later time.
6. Optionally, stop the job.
7. Restart a job that was stopped.

Example of the above steps:

 
DECLARE
  p_handle    NUMBER;        -- Data Pump job handle
  p_job_state VARCHAR2(45);  -- To keep track of the job state
  p_status    ku$_Status;    -- The status object returned by GET_STATUS
BEGIN
  -- Create a schema-mode export job named EXAMPLE.
  p_handle := DBMS_DATAPUMP.OPEN ('EXPORT', 'SCHEMA', NULL, 'EXAMPLE', 'LATEST');

  -- Specify a single dump file for the job (using the handle just returned)
  -- and a directory object, which must already be defined and accessible
  -- to the user running this procedure.
  DBMS_DATAPUMP.ADD_FILE (p_handle, 'example.dmp', 'DMPDIR');

  -- A metadata filter is used to specify the schema that will be exported.
  DBMS_DATAPUMP.METADATA_FILTER (p_handle, 'SCHEMA_EXPR', 'IN (''DPUSER'')');

  -- Start the job. An exception is raised if something is not set up properly.
  DBMS_DATAPUMP.START_JOB (p_handle);

  -- The export job is now running as a server-side job. Poll its state until
  -- it completes or is stopped, then detach from it.
  p_job_state := 'UNDEFINED';
  WHILE p_job_state NOT IN ('COMPLETED', 'STOPPED') LOOP
    DBMS_DATAPUMP.GET_STATUS (p_handle,
                              DBMS_DATAPUMP.KU$_STATUS_JOB_STATUS,
                              -1, p_job_state, p_status);
  END LOOP;

  DBMS_DATAPUMP.DETACH (p_handle);
END;
/

The status of the job can be checked by writing a separate procedure that captures errors and status until the job completes. Overall job status can also be obtained by querying the DBA_DATAPUMP_JOBS view.
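
For example, a query along these lines lists each job's owner, name, operation, mode, and current state:

SQL> SELECT owner_name, job_name, operation, job_mode, state
       FROM dba_datapump_jobs;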

Conclusion:

Oracle Data Pump is a great tool for the fast movement of data between databases, and much of its performance improvement is derived from parallelism via the PARALLEL parameter. Even when Oracle's transportable tablespace feature is used to move self-contained data between databases, Data Pump is still required to extract and re-create the metadata for that tablespace. Whenever possible, Data Pump maximizes performance by using the direct path driver; otherwise, it accesses the data using an external table access driver. Data Pump also provides flexibility through parameters such as INCLUDE, EXCLUDE, QUERY, and TRANSFORM, which give the DBA more control over the data and objects being loaded and unloaded. With all of these features, Data Pump is a welcome addition to the DBA's toolkit in a world that constantly redefines the size of the “large database”.