Data Manipulation Using Java in Oracle

Steve Callan, stevencallan@hotmail.com

 

Is Oracle always the best tool or application to manipulate data? The answer lies within what you mean by manipulate. Data can be transformed via data manipulation language, and it can also be transformed via string or numerical manipulation. To distinguish between the types of manipulation, let’s say that stored data is in the purview of DML, and output or displayed data is what or how stored data is presented to the user. A simple example of this is a date. What is stored can be forced to display in numerous ways. March 8th, 2007 is just as valid an output as 03/08/2007; it just depends on what you or your users want to see.

The point of this article concerns how you get from what is stored to what is displayed. The construct for this point revolves around a string/numeric manipulation problem. Suppose you have an application that supports surveys or offers a coupon to users via the Internet. After clicking Submit on an online survey form, it is very common for your identifier, whether it is a survey ID or an account number, to be passed back as one of many name-value pairs in a URL.

Another use of this “pass back data” transport mechanism is to validate or limit your use of a promotional coupon. Coupons, whether paper or electronic in form, are typically represented in a numeric format, and quite often, as very large numbers. The very large number aspect makes it hard to counterfeit or fake numbers since the density of numbers can be made to be sparse. Issue one million coupons with barcode numbers ranging from 1,000,000 to 1,999,999 and you have a density of one. Using a range of 77,000,000,000 to 78,000,000,000 and finding a valid coupon number just became the equivalent of searching for a needle in the haystack.

The problem or constraint, however, is that passing back a 22-digit barcode number consumes valuable space or length in the URL (using the 255 or 256 character limit). What we need is a way to shorten up the big number into something a bit smaller/shorter. The method is what leads us into the “how” of how are we going to accomplish this.

One method is to change the base of the number. If you take a base-10 number and convert it into a base-36 number, you save some space. Using Oracle, how would you perform the conversion?

A PL/SQL approach

Let’s look at a PL/SQL approach first. The code has been optimized a bit, in that the factorization has a jump-start in terms of position or iterations already taken into account. It is also based on the input being close to at least 22 digits long.

CREATE OR REPLACE FUNCTION gen_barcode36 (i_coupon_number NUMBER) 
RETURN VARCHAR2 IS
     v_number    number;
     v_curval    number;
     v_curinc    number := power(36,14);
     v_true      number := 0 ;
     v_pos       number := 15;
     v_dec_count number := 0 ;
     v_pos_val   number := 0 ;
     v_cur_num   number := 0 ;
     v_test_num  number := 0 ;
     v_new_num   number := 0 ;
     v_cur_digit char := '';
     v_new_val   varchar2(15) := '';
   BEGIN
     v_number := i_coupon_number;
     while ( v_true = 0 ) loop
         v_curinc := v_curinc * 36;
         v_curval := v_number / v_curinc;
         if ( v_curval < 1 ) then
            v_true := 1;
         else
            v_pos := v_pos + 1;
         end if;
     end loop;
     v_dec_count := v_pos;
     v_new_val := NULL;
     v_cur_num := v_number;
     WHILE ( v_dec_count > 0) LOOP
        v_pos_val := power(36,v_dec_count - 1);
        v_test_num := trunc(v_cur_num/v_pos_val);
        select decode(v_test_num,35,'z', 34,'y', 33,'x',
                                 32,'w', 31,'v', 30,'u',
                                 29,'t', 28,'s', 27,'r',
                                 26,'q', 25,'p', 24,'o',
                                 23,'n', 22,'m', 21,'l',
                                 20,'k', 19,'j', 18,'i',
                                 17,'h', 16,'g', 15,'f',
                                 14,'e', 13,'d', 12,'c',
                                 11,'b', 10,'a',  9,'9',
                                  8,'8',  7,'7',  6,'6',
                                  5,'5',  4,'4',  3,'3',
                                  2,'2',  1,'1',    '0')
               INTO v_cur_digit FROM DUAL;
        IF ( v_new_val IS NOT NULL ) THEN
            v_new_val := v_new_val || v_cur_digit;
        ELSE
            v_new_val := v_cur_digit;
        END IF;
        v_cur_num := v_cur_num - (v_pos_val * v_test_num);
        v_dec_count := v_dec_count - 1;
     END LOOP;
     RETURN v_new_val;
   END gen_barcode36;
/

A different version of this has some output attached to it so you can see how the number is reduced.

SQL> select gen_barcode36(7700000000000000000000) from dual;
 
GEN_BARCODE36(7700000000000000000000)
-----------------------------------------------------
1950zn8fqxjygow
 
Starting with 7700000000000000000000
Position: 6140942214464815497216 current number: 1
Position 14 current value: 1
Position: 170581728179578208256 current number: 9
Position 13 current value: 19
Position: 4738381338321616896 current number: 5
Position 12 current value: 195
Position: 131621703842267136 current number: 0
Position 11 current value: 1950
Position: 3656158440062976 current number: 35
Position 10 current value: 1950z
Position: 101559956668416 current number: 23
Position 9 current value: 1950zn
Position: 2821109907456 current number: 8
Position 8 current value: 1950zn8
Position: 78364164096 current number: 15
Position 7 current value: 1950zn8f
Position: 2176782336 current number: 26
Position 6 current value: 1950zn8fq
Position: 60466176 current number: 33
Position 5 current value: 1950zn8fqx
Position: 1679616 current number: 19
Position 4 current value: 1950zn8fqxj
Position: 46656 current number: 34
Position 3 current value: 1950zn8fqxjy
Position: 1296 current number: 16
Position 2 current value: 1950zn8fqxjyg
Position: 36 current number: 24
Position 1 current value: 1950zn8fqxjygo
Position: 1 current number: 32
Position 0 current value: 1950zn8fqxjygow
Final value: 1950zn8fqxjygow
 

This isn’t meant to be security through obscurity, although a relatively strange looking string will discourage most users from trying to figure out what it represents.

One at a time, the data manipulation performance is okay, but how long would it take to generate a million converted values? Let’s make a table with three columns. The first is the base-10 number, the second is the base-36 converted/manipulated value using the PL/SQL code above, and the third is reserved for the output using a different means to manipulate the numbers. Create the table and populate it with a million records. Turn timing on and see how long it takes to generate the base-36 values.

SQL> create table base36
  2  (barcode number,
  3   plsql_ver varchar2(15),
  4   other_ver varchar2(15));
 
Table created.
 
SQL> create sequence barcode_seq start with 7700000000000000000000;
 
Sequence created.
 
SQL> begin
  2    for i in 1..1000000 loop
  3      insert into base36 (barcode) values (barcode_seq.nextval);
  4    end loop;
  5  end;
  6  /
 
PL/SQL procedure successfully completed.
 
SQL> commit;
 
Commit complete.
 
SQL> set timing on
SQL> update base36 set plsql_ver = gen_barcode36(barcode);
 
1000000 rows updated.
 
Elapsed: 00:33:09.01
 

The rate works out to be just over 500 updates per second.

Might there be a faster way outside of PL/SQL to accomplish this task? This problem is a good example of when to use a better language, so to speak, than PL/SQL to transform data. A better language or approach in this case is to manipulate the string using Java and a built-in method (toString).

How do you get Java code into a database?

First, we need to have a java file with the applicable Java code in it. Second, is to upload the code (either the source or compiled version) into the database. Third is to compile the code, and then the last step is to create a wrapper function or procedure around the Java code. The function calls the Oracle-named object, which in turn calls the Java-named object.

The source file/Java code is pretty straightforward.

import java.math.BigInteger;
import java.lang.String;
 
public class BCUtils
{
  public static String getBarcode36(String barcode10)
  {
    String value = new String(barcode10);
    BigInteger bigI = new BigInteger(value.toString());
    StringBuffer result =  new StringBuffer(bigI.toString(36));
    return result.toString();
  }
}
 

Load the source file and compile it.

Create a wrapper function/publish the class.

SQL> create or replace function get_bc36(bc_10 varchar2) 
  2  return varchar2
  3  as language java name 
  4  'BCUtils.getBarcode36(java.lang.String) return java.lang.String';
  5  /
 
Function created.
 

Now we’re ready to test the difference.

SQL> update base36 set other_ver = get_bc36(barcode);
 
1000000 rows updated.
 
Elapsed: 00:13:56.22
 

The rate above works out to almost 1200 per second, well more than twice as fast as the PL/SQL version. Why is that? Oracle acknowledges that PL/SQL isn’t the fastest language in the world or the best at string manipulation.

A more formal list of steps is shown in the section titled “Java Stored Procedure Steps” in the Java Developer’s Guide.

In Closing

PL/SQL can do many things well, but other languages can do some things much better, with “better” being measured in terms of speed. If you find that stored data must be manipulated for display type output, don’t be afraid to try other languages supported by Oracle.