Saturday, January 7, 2017

A comparative analysis of SQL*Loader and the UTL_FILE utility.




EXECUTIVE SUMMARY
In implementing new systems we come across the problem of importing "alien" data, which may come from a legacy system or an on-going system. This data is transported via extract files from the legacy system to the Oracle system. The gateway to Oracle for this data is SQL*Loader, which loads the data into tables as directed by a control file. Typically, older systems do not have very normalized data, nor have they been operating with fully implemented database constraints. The lack of constraints over the years in a legacy system allows bad data to creep in. Therefore, while bringing external data into the Oracle system we need a refined set of checks and balances to ensure that we get good data. This requires a lot of programmatic control in the data-loading process. The approach applied in the case of SQL*Loader is as follows:
1. Load the data into temporary tables via SQL*Loader using a control file, making the data native to Oracle.
2. Write a PL/SQL program to do the processing.
3. Load the data into live tables.
This approach has a lot of dependencies, and the steps are poorly integrated and offer little programmatic control. To overcome this, we have analyzed another facility that has been released from Oracle 7.3.x onwards: the UTL_FILE package. With some creative use of this package we can achieve whatever SQL*Loader offers and, in addition, perform high-level validation and complex data loading. In the following discussion the two tools are studied, with an example case for UTL_FILE.

A BRIEF OVERVIEW OF SQL*Loader
SQL*Loader is an Oracle utility for loading data from external data files into an Oracle database. Its basic advantages are simplicity and speed for straightforward loads. It can load data in myriad data formats, perform elementary filtering, load data into multiple tables, and create one logical record from one or more physical records. It creates a detailed log file, a bad file that contains the rejected records, and a discard file to hold the records that are selectively not loaded. The tool is executed from the command line; a username, a password, and the control file name and location are required to run it.
(See Figure 1)

Typical syntax of a control file for a SQL*Loader data load.

1. LOAD DATA
2. INFILE    file_name  /  *
3. INSERT/APPEND/REPLACE/TRUNCATE
4. CONTINUEIF THIS (1) = 'character'
5. INTO TABLE EMP
6. ( empid      SEQUENCE (MAX,1),
7.   first_name POSITION (01:30) CHAR,
8.   last_name  POSITION (31:60) CHAR,
9.   hire_date  POSITION (61:72) DATE,
10.  emp_no     POSITION (73:85) CHAR )
11. INTO TABLE DEPT
12. WHEN dept_no != '^Z'
13.  ( emp_no   POSITION (73:85) CHAR,
14.    dept_no  POSITION (86:95) INTEGER EXTERNAL )
15. BEGINDATA     -- only if you are using * on line 2

Table A.

Line #   Explanation
1        This line is the starting syntax for the control file.
2        This line gives the location of the input file. If the '*' option is chosen, the data is included in the control file itself.
3        This line contains the DML operation we want to perform with SQL*Loader.
4        The character (for instance, '+' or '*') specifies that the next physical record should be appended until another such character is found, so that they form a single logical record.
5 & 11   These lines provide the table names into which the records are to be inserted.
12       This line filters (does not load) the records where dept_no = '^Z'.
15       This line is used only if the '*' option is chosen on line 2.


 




OUTPUT FILES.
There are three types of files written during the execution of a SQL*Loader process. They are –

LOG FILE – The log file gives the status of affairs as the SQL*Loader program executes and contains a summary of the data load execution. It also holds table load and summary statistics.

BAD FILE – The bad file contains records that fail the criteria in the control file or are rejected by the database. The bad file is written in the same format as the data file, so that rejected records can be reloaded with the same control file.

DISCARD FILE – The discard file is created only when specified or needed. It holds records that failed the filter criteria; these records are discarded and not inserted. They are written in the same format as the data file.



EXECUTION – The command to execute SQL*Loader in a UNIX shell script is as follows. The file names are supplied as shell script variables, which makes the script more generic.

sqlldr userid=${LOGIN_CONVS} \
       control=${CTRL_FILE} \
       log=${LOG_FILE} \
       discard=${DIS_FILE} \
       bad=${BAD_FILE} \
       data=${DAT_FILE} \
       errors=1000000

# check that the sqlldr invocation succeeded
if [ $? -ne 0 ]
then
    echo "Error! The SQL*Loader call failed"
    exit 1
fi



For a complex load we can use trigger logic to get PL/SQL programmatic control.  A point to note here is that if you want a very high-speed load you can use the direct path method, which is faster than the conventional path method.  However, the clean-up time that the direct path method requires later on offsets the benefits obtained by the accelerated load speed.


UTL_FILE -- PL/SQL does not have text file input/output capabilities per se but acquires them via the UTL_FILE package.  The package provides a rudimentary utility for reading (as well as writing) files from within a PL/SQL program.  The general outline of the process of reading a file is shown in Figure 2.  The lines in the file are read sequentially, which affects the performance of the program.  The UTL_FILE package can be wrapped in a PL/SQL program, and since the package is integrated with PL/SQL it gives us tremendous ability to flex our "programming muscles." (See Figure 3)  Procedures and functions can be added to this wrapper program to serve as a handy "tool kit" for routine file reading operations.  With this approach we can achieve whatever SQL*Loader can do, and much more.  The security mechanism for UTL_FILE is controlled by a parameter in the INIT<SID>.ora file called utl_file_dir.  The directories that UTL_FILE reads from and writes to must be accessible to the Oracle instance owner and to the user running the package.
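As a quick illustration of the read cycle outlined in Figure 2, here is a minimal sketch. The directory and file name are hypothetical, and the directory must be listed in utl_file_dir; the full wrapper approach is developed in the case study below.

DECLARE
  fh   UTL_FILE.FILE_TYPE;
  line VARCHAR2(1023);
BEGIN
  -- open the file for reading; the directory must appear in utl_file_dir
  fh := UTL_FILE.FOPEN('/home/oracle/inbound', 'sample.dat', 'R');
  LOOP
    BEGIN
      UTL_FILE.GET_LINE(fh, line);   -- read the next line
      DBMS_OUTPUT.PUT_LINE(line);    -- process the line (here it is simply echoed)
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;  -- end of file reached
    END;
  END LOOP;
  UTL_FILE.FCLOSE(fh);
END;
/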





FIGURE 3    UTL_FILE wrapper (tool programs for processing records and inserting into the database).


A NOTE ON utl_file_dir
This is a very important parameter that must be set before you can use the UTL_FILE package, and it controls the security of the package.  The parameter is added to the init<SID>.ora file.  If one wants to write to or read from numerous directories, all of them should be listed in this parameter, separated by commas or spaces.
For example: utl_file_dir = /home/oracle/inbound
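If, say, both an inbound and an outbound directory are needed (the paths here are only illustrative), they are simply listed together:
utl_file_dir = /home/oracle/inbound, /home/oracle/outbound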
IMPORTANT – The Oracle instance must be shut down and restarted for changes to the init<SID>.ora file to take effect.

CASE STUDY OF UTL_FILE
A case study is developed on UTL_FILE to perform functions similar to those performed by the SQL*Loader tool.  A set of PL/SQL procedures and functions is developed that can perform the actions SQL*Loader performs, and much more.  The emphasis here is on providing technical insight into a schematic approach for loading data via UTL_FILE (Figure 4) rather than on writing the PL/SQL package per se.



o-empnum        pic x(11)
o-name          pic x(60)
o-sex           pic x(1)
o-ssn           pic x(11)
o-hiredate      pic x(9)
o-termdate      pic x(9)
o-addressline1  pic x(35)
o-addressline2  pic x(35)
o-city          pic x(35)
o-county        pic x(30)
o-state         pic x(2)
o-zipcode       pic x(10)
o-dept          pic x(3)







FILE FORMAT   (FIXED)
 


00000000001ALAN BAXTER, K                     M 228967890821011          701 Kentucky  Avenue         Toledo                     Lucas                            OH43606    123
...

FIGURE 4     FILE LAYOUTS AND FORMAT


CASE PROBLEM/OPPORTUNITY CONSTRUCT WITH A SOLUTION STRATEGY
The case problem is a “snapshot” of a typical conversion operation where you have to load data into tables.  A data extract file is taken as input; the layout of the file and the positioning of the records are shown in Figure 5.  On the database side this data needs to be loaded into two tables.  In addition, there is a table used for lookups and an error table that holds the disqualified records.  The mapping between the two systems (the file and the database) is shown in Figure 6.  The tool programs and the main program are developed calling the UTL_FILE package.  An approach for handling and re-loading the error records is also included.  These programs appear in the following discussion, and the complete picture of the approach is provided after the code for these functions (see Figure 7).  In places the code contains statements beginning with ‘#’ that outline the action to be performed, for instance "#-- Insert records".  This shorthand is used instead of writing out the entire insert statement; it is self-explanatory and keeps the focus on the important points.
The approach can be summarized in the following steps –
1. Analysis of the data file and its structure.
2. Mapping of the data elements.
3. Description of the data objects storing the transferred data.
4. Data validation and data massaging tool programs.
5. Main program using the tool programs to load the data and write erring records to the error table.
6. Error correction and reload of the data.


CREATE TABLE EMP_DETAILS (
  empno            VARCHAR2(11),
  fullname         VARCHAR2(150),
  firstname        VARCHAR2(30),
  lastname         VARCHAR2(30),
  middlenames      VARCHAR2(30),
  hiredate         DATE,
  terminationdate  DATE,
  sex              VARCHAR2(5),
  ssn              VARCHAR2(30),
  costcenter       VARCHAR2(10));



CREATE TABLE EMP_ADDRESSES (
  empno         VARCHAR2(11),
  addressline1  VARCHAR2(50),
  addressline2  VARCHAR2(50),
  city          VARCHAR2(50),
  county        VARCHAR2(50),
  state         VARCHAR2(30),
  zipcode       VARCHAR2(15),
  country       VARCHAR2(50));


CREATE TABLE EMP_DEPT_LOOKUPS (
value_id    NUMBER,
dept_no     NUMBER,
costcenter  VARCHAR2(20));


CREATE TABLE EMP_ERRORS (
  empno            VARCHAR2(11),
  fullname         VARCHAR2(150),
  firstname        VARCHAR2(30),
  lastname         VARCHAR2(30),
  middlenames      VARCHAR2(30),
  hiredate         DATE,
  terminationdate  DATE,
  sex              VARCHAR2(5),
  ssn              VARCHAR2(30),
  costcenter       VARCHAR2(10),
  addressline1     VARCHAR2(50),
  addressline2     VARCHAR2(50),
  city             VARCHAR2(50),
  county           VARCHAR2(50),
  state            VARCHAR2(30),
  zipcode          VARCHAR2(15),
  country          VARCHAR2(50),
  error_reason     VARCHAR2(200),
  status           VARCHAR2(15),
  tabname          VARCHAR2(20));

FIGURE 5.  DESCRIPTION OF TABLES  INVOLVED











EMP_DETAILS
  Database column                     Extract field
  Empno                               EMP_NUMBER
  Firstname, Lastname, Middlenames    NAME
  Sex                                 SEX
  Ssn                                 SSN
  Hiredate                            HIREDATE
  Terminationdate                     TERMINATIONDATE
  Costcenter                          DEPTNO

EMP_ADDRESSES
  Database column                     Extract field
  Empno                               EMP_NUMBER
  Addressline1                        ADDRESSLINE1
  Addressline2                        ADDRESSLINE2
  City                                CITY
  County                              COUNTY
  State                               STATE
  Zipcode                             ZIPCODE

FIGURE 6.  MAPPING OF THE DATABASE COLUMNS TO THE EXTRACT FILE FIELDS



TOOL PROGRAMS
--/** This function returns the extract date string converted to an Oracle DATE **/--
Function GET_ORACLE_DATE( ps_date in VARCHAR2)
RETURN DATE IS
oracle_formatted_date Date;
ps_date_format varchar2(8):='YYYYMMDD'; --Peoplesoft date format
BEGIN
oracle_formatted_date:=TO_DATE(TO_NUMBER(LTRIM(RTRIM(ps_date))),ps_date_format);
return oracle_formatted_Date;
EXCEPTION
When others then
#-- Error handler insert the record in the error table.
end;
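A quick sanity check of this function (assuming it has been compiled and server output is enabled; the date value is only an example) could be:

BEGIN
  -- '20170107' is in the YYYYMMDD format the function expects
  DBMS_OUTPUT.PUT_LINE(TO_CHAR(GET_ORACLE_DATE('20170107'), 'DD-MON-YYYY'));  -- prints 07-JAN-2017
END;
/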

/** Procedure to set the employee name.  This procedure breaks the name into first name, middle name and last name.  The format of the incoming name is lastname,firstname middlename, e.g. Shrivastava, Anunaya K.  The parts are returned through OUT variables and can be read inside the calling program. **/

PROCEDURE SET_EMP_NAME
(name_in in varchar2,firstname_out in out Varchar2,
middlename_out in out varchar2,
lastname_out in out varchar2)
is
begin
lastname_out:= rtrim(substr(name_in,1,instr(name_in,',',1,1)-1));
middlename_out:=rtrim(substr(name_in,instr(name_in,',',1,1)+instr(substr(name_in,instr(name_in,',',1,1)+1,50),' ',1,1)+1));
Firstname_out:=rtrim(substr(name_in,instr(name_in,',',1,1)+1,instr(substr(name_in,instr(name_in,',',1,1)+1,50),' ',1,1)));
EXCEPTION
When others Then
#-- Error handler insert the record in the error table.
End ;

/* Function to get the cost center.  This function does a dynamic lookup and returns the corresponding cost center for the dept# */

FUNCTION GET_COST_CENTER (in_dept_no in VARCHAR2)
RETURN varchar2 is
l_cost_center Varchar2(10);
BEGIN
SELECT costcenter into l_cost_center
FROM EMP_DEPT_LOOKUPS
WHERE dept_no = in_dept_no;
RETURN l_cost_center;
EXCEPTION
When others then
#-- Error handler - insert the rec in the error table.
end;

-------/** The error handler inserts the record into the error table with a reason and the source table name so that it can be reloaded after correction **/-----------------------

Procedure error_handler ( err_emp_rec emp_details%rowtype, err_add_rec in emp_addresses%rowtype,
err_msg in emp_errors.error_reason%type,  err_table in emp_errors.tabname%type)
IS

Begin
 
   #--Insert_into_emp_errors---



end;
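For reference, a minimal sketch of what the "#--Insert_into_emp_errors" shorthand might expand to is given below, using the EMP_ERRORS columns from Figure 5. Only a representative subset of the columns is shown, and the initial status value 'ERROR' is an assumption (the reload program later looks for 'CORRECTED').

INSERT INTO emp_errors
  (empno, fullname, addressline1, city,
   error_reason, status, tabname)
VALUES
  (err_emp_rec.empno, err_emp_rec.fullname,
   err_add_rec.addressline1, err_add_rec.city,
   err_msg, 'ERROR', err_table);  -- 'ERROR' is a hypothetical initial status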

-----/** This procedure parses the input string into various records and fields**/--
PROCEDURE PARSE_REC (v_rec_line in Varchar2, v_rec  in out emp_details%rowtype, d_rec in out emp_addresses%rowtype)
IS
begin
--*** parse for details table **----------------------------------------------------------------
v_rec.empno:= SUBSTR(v_rec_line,1,11);
v_rec.fullname:=SUBSTR(v_rec_line,12,60);
v_rec.sex:=SUBSTR(v_rec_line,73,1);
v_rec.ssn:=SUBSTR(v_rec_line,74,11);

-------------------Translate the datatype and Get it into Oracle table use the tool function-------

v_rec.hiredate:=GET_ORACLE_DATE(SUBSTR(v_rec_line,86,9));
v_rec.terminationdate:=GET_ORACLE_DATE(SUBSTR(v_rec_line,96,9));

----GET the firstname, lastname and middle name (data translation with functions)---------------

set_emp_name(v_rec.fullname, v_rec.firstname, v_rec.middlenames,v_rec.lastname);

------------------GET the cost center------------------------------------------------------------

v_rec.costcenter:=GET_COST_CENTER(substr(v_rec_line,262,3));


-----***parse for add table ***-------------------------------------------------------------------
d_rec.empno:=SUBSTR(v_rec_line,1,11);
d_rec.addressline1:=SUBSTR(v_rec_line,106,35);
d_rec.addressline2:=SUBSTR(v_rec_line,142,35);
d_rec.city :=SUBSTR(v_rec_line,178,35);
d_rec.county:=SUBSTR(v_rec_line,214,30);
d_rec.state:=SUBSTR(v_rec_line,245,2);
d_rec.zipcode:=SUBSTR(v_rec_line,248,10);
d_rec.Country:=SUBSTR(v_rec_line,259,3);

EXCEPTION
When others then
--# call the Error_handler procedure for the erring record and insert it into the error table

end;

/* Insert into tables - only one sample program is shown here; the same logic applies to the inserts into the other tables such as emp_addresses and emp_errors */
Procedure insert_emp_details ( grec in emp_details%rowtype) As
begin
INSERT INTO EMP_DETAILS
(empno,
fullname,
firstname,
lastname,
middlenames,
hiredate,
terminationdate,
Sex,
ssn,
Costcenter)
VALUES
(grec.empno,
grec.fullname,
grec.firstname,
grec.lastname,
grec.middlenames,
grec.hiredate,
grec.terminationdate,
grec.sex,
grec.ssn,
grec.costcenter);
end;


MAIN PROGRAM
PROCEDURE LOAD_DATA ( loc in varchar2, file in varchar2)
IS
emp_det_rec emp_details%rowtype;
emp_add_rec emp_addresses%rowtype;


------UTL_FILE related variables----

file_handle UTL_FILE.FILE_TYPE;
data_line   Varchar2(1023);

-----------------------------------

BEGIN
-- open the file in a read mode-------
file_handle := UTL_FILE.FOPEN(loc,file,'R');
---GEt the lines in the loop and do the processing--

 LOOP
   begin
                 
                  UTL_FILE.GET_LINE(file_handle, data_line);
                  data_line:= rtrim(ltrim(data_line));
                 
  ----parse the records and load them into out rec variables( tool program proc parse_rec)---
                 
                  parse_rec(data_line,emp_det_rec,emp_add_rec);
    
  ----Load the data into respective tables ( tool program of inserting into emp tables)---
                                    ----Insert into emp details------
      insert_emp_details(emp_det_rec);
        ---------------------------------------------------------
                                    ---Insert into emp address-------
       insert_emp_addresses(emp_add_rec);

                  ---------------------------------------------------------
    EXCEPTION
                  WHEN NO_DATA_FOUND then EXIT;
    end;
  END LOOP;
      ---UTL_FILE CLOSE----
     UTL_FILE.FCLOSE(file_handle);

  EXCEPTION

   WHEN UTL_FILE.INVALID_PATH then
                  UTL_FILE.FCLOSE(file_handle);
    dbms_output.put_line('Invalid path for the file');

  WHEN UTL_FILE.INVALID_MODE then
        UTL_FILE.FCLOSE(file_handle);
    dbms_output.put_line('Invalid mode for the file');

  WHEN UTL_FILE.INVALID_FILEHANDLE then
         UTL_FILE.FCLOSE(file_handle);
   dbms_output.put_line('Invalid file handle ');
 
  WHEN UTL_FILE.READ_ERROR then
         UTL_FILE.FCLOSE(File_handle);
   dbms_output.put_line('Read error for the file');

  WHEN OTHERS then
        UTL_FILE.FCLOSE(file_handle);
   dbms_output.put_line('Error in Load data procedure');
 
  END;
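A hypothetical invocation of the main program, assuming it has been compiled (standalone or inside the wrapper package) and that /home/oracle/inbound is listed in utl_file_dir and holds the extract file emp_extract.dat, could be:

BEGIN
  -- the directory and file name are examples only
  load_data('/home/oracle/inbound', 'emp_extract.dat');
END;
/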

RELOAD PROGRAM
/* This procedure reloads the records after correction.  The correction can be done through a small user interface developed in Developer/2000 Forms 4.5, from which the status of the corrected records is changed */
Procedure Reload_correct_recs  IS
Cursor correct_rec is
SELECT *
FROM emp_errors
WHERE status ='CORRECTED'
For Update of Status;

BEGIN
  For i in correct_rec LOOP
     If UPPER(i.tabname) = 'EMP_DETAILS'
       then
           #insert the records in Emp_details with appropriate datatypes
     elsif UPPER(i.tabname) = 'EMP_ADDRESSES'
       then
                     # insert the records in Emp_addresses with appropriate datatypes
     elsif UPPER(i.tabname) ='BOTH'
        then
          #insert the records in Emp_details with appropriate datatypes
          #insert the records in Emp_addresses with appropriate datatypes

      End if;

    Update emp_errors
    set status = 'RELOADED'
    WHERE current of correct_rec;

  END LOOP;

end;
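To illustrate the correction cycle, an erring record could be fixed and flagged (directly, or through the Forms front end mentioned above) and then reloaded. The employee number and the corrected value below are purely examples.

-- correct the record and mark it so that the reload picks it up
UPDATE emp_errors
   SET hiredate = TO_DATE('19900101', 'YYYYMMDD'),
       status   = 'CORRECTED'
 WHERE empno = '00000000001';
COMMIT;

-- reload all corrected records
BEGIN
  reload_correct_recs;
END;
/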




TABLE B

Parameter: Backward compatibility
  SQL*Loader: High. Mature and stable product.
  UTL_FILE:   Low. Available only from Oracle 7.3 onwards.

Parameter: Security
  SQL*Loader: High. Database-level security.
  UTL_FILE:   Low. Has to be achieved via the utl_file_dir parameter.

Parameter: Integration with PL/SQL
  SQL*Loader: Very low. Very difficult to integrate.
  UTL_FILE:   Very high. Seamless integration with PL/SQL.

Parameter: Performance
  SQL*Loader: High-speed loading. Good for very high volumes of data, as in a conversion.
  UTL_FILE:   Low-speed loading. Good for low volumes of data, as in an interface.

Parameter: Updates while loading
  SQL*Loader: Cannot perform updates.
  UTL_FILE:   Can perform updates.

Parameter: Enforcing referential integrity
  SQL*Loader: Low. Very hard to implement.
  UTL_FILE:   Easy to implement.

Parameter: Data transformation
  SQL*Loader: Hard to do a conditional transformation.
  UTL_FILE:   Easy to do a data transformation.

Parameter: Complex data load conditions
  SQL*Loader: Difficult to handle.
  UTL_FILE:   Easy to handle with PL/SQL programmatic control.

Parameter: Error correction
  SQL*Loader: The bad file needs re-editing through an editor or programming effort via sed, awk or Perl.
  UTL_FILE:   An easy user interface can be developed in Forms 4.5 to edit the records from the error tables.

Parameter: Record length
  SQL*Loader: The length of a record line can be more than 1022 characters.
  UTL_FILE:   The length of a record line cannot be more than 1022 characters.

Parameter: File operations
  SQL*Loader: Has no direct file operation capabilities, but these can be handled with a UNIX shell script.
  UTL_FILE:   Integrated with PL/SQL and has limited file operation capabilities.


CONCLUSIONS
The comparative analysis of SQL*Loader and UTL_FILE reveals that the suitability of each tool depends on the conditions and needs of your environment.  If the data load is complex (as is often the case in relational databases), UTL_FILE seems to be the tool of choice.  It does require a programmatic effort to write a wrapper package, but the subsequent effort is greatly reduced once the initial tool kit is built for your environment.  UTL_FILE tightly integrates file input with programmatic control and data manipulation inside a single PL/SQL unit.  There are disadvantages in loading speed with UTL_FILE, but these are offset by the programmatic control it offers and the integration it brings.  Thus we find that UTL_FILE bridges the gap left by SQL*Loader for complex data loads.







