The Oracle Instructor

Uwe Hesse about Oracle Core Technology

Posts Tagged ‘Data Warehouse

Result Cache: Another brilliant 11g New Feature

with 11 comments

I have just finished an Oracle Database 11g New Features course in Vienna. There are many amazing New Features in the 11g version, one of them is the possibility to cache the result sets of statements, that access large tables but return relatively few rows. Think of it like automagically created materialized views inside the SGA. Example:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';

 MB
----------
 560
SQL> alter table sales result_cache (mode force);
Table altered.

The ALTER TABLE statement is a new feature of 11g Release 2. In Release 1, you can control the feature only with the following parameters:

SQL> show parameter result_cache_m
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
result_cache_max_result              integer     5
result_cache_max_size                big integer 2080K
result_cache_mode                    string      MANUAL

Another possibility, already introduced in 11g R1 is the RESULT_CACHE hint. Now let’s look at the effect of the ALTER TABLE statement:

 

SQL> set timing on
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        421541477
 4        219308832
 3        926004170
 9       4438820.16
Elapsed: 00:00:03.72

This was the first time, the sales table was accessed after the ALTER TABLE above. The runtime signalizes we have got a full table scan here (there are no indexes on the table anyway). Of course, blocks of the table are now cached in the database buffer cache – as in previous versions. But now, also the result set is cached!

SQL> select channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        421541477
 4        219308832
 3        926004170
 9       4438820.16
Elapsed: 00:00:00.01

That is obvious by runtime already, so I ommit AUTOTRACE here. If the table gets changed, the result set gets “stale”, similar like a materialized view would:

SQL> update sales set amount_sold=1 where rownum<2;
1 row updated.
Elapsed: 00:00:00.02
SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        421541477
 4        219308832
 3        926002938
 9       4438820.16
 
Elapsed: 00:00:03.08

Second access after the DML will again use the (newly cached) result set – even if the statement is slightly different, in the same way as materialized views can be used for query rewrite, even if the SELECT differs from the query that built the materialized view. That gives me the opportunity to introduce the new SUPERFAST hint :-)

SQL> select /*+ superfast */ channel_id,sum(amount_sold) from sales group by channel_id;
CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        421541477
 4        219308832
 3        926002938
 9       4438820.16
Elapsed: 00:00:00.00

The “hint” is just a strange commentar for the engine and is simply ignored, of course. The speed comes from using the cached result set, stored in the SGA by the previous SELECT with the 3 seconds runtime.

Written by Uwe Hesse

November 27, 2009 at 15:19

Automatic DOP in 11gR2

with 4 comments

We have a probably very needful new feature introduced in 11g Release 2, related to parallel query: Automatically determined Degree of Parallelism (DOP). In earlier versions of the Oracle Database, we had to determine the DOP more or less manually, either with a parallel hint or by setting a parallel degree with alter table:

select /*+ parallel (sales,2) */ * from sales;

or

alter table sales parallel 2;

There was an automatic computation of the DOP available, derived from the simple formula CPU_COUNT * PARALLEL_THREADS_PER_CPU. That is what’s done internally if we would have said

select /*+ parallel (sales) */ * from sales;

or

alter table sales parallel;

The drawback with these approaches was always, that we could hardly be sure, whether the DOP is appropriate or not for the table, the statement and the hardware, we are running on. It was mostly a case of try & error. Especially problematic was the alter table approach, as this leads to the parallelization of each and every following select on those tables, even if totally inappropriate. A popular pitfall is the creation of tables with a parallel clause on OLTP-systems, because those tables inherit the parallel degree of their creation, which leads to parallel query for every statement accessing the tables afterwards – most likely not desirable for OLTP. In so far, parallelization was quite dumb (from a system-internal perspective) in versions before 11gR2. Now to the new feature:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';

 MB
----------
 563

SQL> select degree from user_tables where table_name='SALES';

DEGREE
----------------------------------------
 1

For my tiny machine, this table is huge. I allow automatic determination of the DOP with the following new dynamic parameter:

SQL> alter session set parallel_degree_policy=auto;

Session altered.

SQL> set autotrace on explain

SQL> select sum(amount_sold) from sales

SUM(AMOUNT_SOLD)
----------------
 1571293299

Execution Plan
----------------------------------------------------------
Plan hash value: 3130505568

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 | 10798   (1)| 00:02:10 |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |
|   2 |   PX COORDINATOR       |          |       |       |            |          |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |
|   5 |      PX BLOCK ITERATOR |          |    14M|    70M| 10798   (1)| 00:02:10 |
|   6 |       TABLE ACCESS FULL| SALES    |    14M|    70M| 10798   (1)| 00:02:10 |
-----------------------------------------------------------------------------------

Note
-----
 - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit

I have got a moderate DOP, most likely appropriate for my relatively weak hardware, but still speeding up the query on the relatively big table. I could always override the automatic DOP determination by specifying a parallel hint as in earlier versions. Also, the parameter defaults to manual, so unless we change it, automatic parallelization will not take place. In order to demonstrate the quite intelligent computation of the DOP, compared to ealier versions, i will access another, much smaller table in the same session:

SQL> select count(*) from customers;

 COUNT(*)
----------
 30501

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |   218   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CUSTOMERS | 30501 |   218   (0)| 00:00:03 |
------------------------------------------------------------------------

Note
-----
 - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

As you can see, in this case, the system does not think that parallelization is appropriate for the select. So it is much smarter than the old force parallel query:

SQL> alter session set parallel_degree_policy=manual;

Session altered.

SQL> alter session force parallel query;

Session altered.

SQL>  select count(*) from customers;

 COUNT(*)
----------
 30501

Execution Plan
----------------------------------------------------------
Plan hash value: 1221513835

----------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |     1 |   121   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE        |           |     1 |            |          |
|   2 |   PX COORDINATOR       |           |       |            |          |
|   3 |    PX SEND QC (RANDOM) | :TQ10000  |     1 |            |          |
|   4 |     SORT AGGREGATE     |           |     1 |            |          |
|   5 |      PX BLOCK ITERATOR |           | 30501 |   121   (0)| 00:00:02 |
|   6 |       TABLE ACCESS FULL| CUSTOMERS | 30501 |   121   (0)| 00:00:02 |
----------------------------------------------------------------------------

Also, we have a remedy now against inappropriate parallel degrees on tables:

SQL> alter session enable parallel query -- the default, no force;
Session altered.
SQL> alter table customers parallel -- would cause parallel query before 11gR2;
Table altered.
SQL> set autotrace on explain
SQL> select count(*) from customers;
 COUNT(*)
----------
 30501
Execution Plan
----------------------------------------------------------
Plan hash value: 296924608
------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |   218   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CUSTOMERS | 30501 |   218   (0)| 00:00:03 |
------------------------------------------------------------------------
Note
-----
 - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

Written by Uwe Hesse

November 24, 2009 at 19:22

Brief introduction into Materialized Views

with 5 comments

This week, I am teaching an Oracle Database 11g Data Warehouse Administration course in Munich. One of the focus areas of that course are Materialized Views, and I have developed some examples for that course that I like to share with the Oracle community. Other themes of that four days course are Partitioning, ETL and Parallelization.

Materialized Views have the ability to speed up queries (even dramatically) while being transparent for the queries in a similar way as indexes resp. partitioned tables are. That means that we do not have to modify our queries in order to benefit from these structures. Unlike an ordinary view which is only a stored select statement that runs if we use the view, a materialized view stores the result set of the select statement as a container table. Let me demonstrate the benefit of that. I have a (for my little database) relatively large table and do an aggregation query on it:

SQL> desc sales
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 PROD_ID                                   NOT NULL NUMBER
 CUST_ID                                   NOT NULL NUMBER
 TIME_ID                                   NOT NULL DATE
 CHANNEL_ID                                NOT NULL NUMBER
 PROMO_ID                                  NOT NULL NUMBER
 QUANTITY_SOLD                             NOT NULL NUMBER(10,2)
 AMOUNT_SOLD                               NOT NULL NUMBER(10,2)

SQL> select count(*) from sales;

 COUNT(*)
----------
 7350744

SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';

 MB
----------
 286

SQL> show sga

Total System Global Area  373293056 bytes
Fixed Size                  1219496 bytes
Variable Size             134218840 bytes
Database Buffers          234881024 bytes
Redo Buffers                2973696 bytes

SQL> set timing on
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002085
 9       2219410.08

Elapsed: 00:00:04.51

As you can see, the aggregation takes round 5 seconds. Now I will speed that up using a Materialized View:

SQL> create materialized view mv1 enable query rewrite
 2  as select channel_id,sum(amount_sold) from sales group by channel_id;

Materialized view created.

Elapsed: 00:00:05.69
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002085
 9       2219410.08

Elapsed: 00:00:00.01

The very same statement now takes way less time! Why is that so?

SQL> set autotrace on explain
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002085
 9       2219410.08

Execution Plan
----------------------------------------------------------
Plan hash value: 2958490228
--------------------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     4 |   104 | 3   (0)| 00:00:01
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV1  |     4 |   104 | 3   (0)| 00:00:01
--------------------------------------------------------------------------------

Because the optimizer transparently rewrote the query on the Materialized View! But what if we now change the content of the base table sales?

SQL> update sales set amount_sold=amount_sold+1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

The materialized view is now stale and will no longer be used for query rewrite (as we can already determine by query runtime):

SQL> select mview_name,staleness from user_mviews;

MVIEW_NAME                     STALENESS
------------------------------ -------------------
MV1                            NEEDS_COMPILE

SQL> set timing on
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002086
 9       2219410.08

Elapsed: 00:00:04.52

In order to get statements rewritten against the materialized view again, we must refresh it by some method.

One method is on demand with a procedure call like in this example:

SQL> exec dbms_mview.refresh(list=>'MV1',method=>'C')

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.62

SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
 2        210770739
 4        109654416
 3        463002086
 9       2219410.08

Elapsed: 00:00:00.01

Now the MV1 can get used again, as we see by runtime. That was a complete refresh – which can take a long time. We would like to get this done faster.

One comfortable method to get a fast refreshable materialized view is the usage of the package DBMS_ADVISOR:

SQL> vari t varchar2(50)
SQL> begin
 2   dbms_advisor.tune_mview(task_name=>:t,
 3                           mv_create_stmt=>'create materialized view mv1'
 4                               || ' refresh fast as'
 5                               || ' select channel_id,sum(amount_sold)'
 6                               || ' from sales group by channel_id');
 7  end;
 8/  
PL/SQL procedure successfully completed.

SQL> set long 5000
SQL> select statement from user_tune_mview where task_name=:t order by action_id;

STATEMENT
--------------------------------------------------------------------------------
CREATE MATERIALIZED VIEW LOG ON "ADAM"."SALES" WITH ROWID, SEQUENCE ("CHANNEL_ID
","AMOUNT_SOLD")  INCLUDING NEW VALUES

ALTER MATERIALIZED VIEW LOG FORCE ON "ADAM"."SALES" ADD ROWID, SEQUENCE ("CHANNE
L_ID","AMOUNT_SOLD")  INCLUDING NEW VALUES

CREATE MATERIALIZED VIEW ADAM.MV1   REFRESH FAST WITH ROWID DISABLE QUERY REWRIT
E AS SELECT ADAM.SALES.CHANNEL_ID C1, SUM("ADAM"."SALES"."AMOUNT_SOLD") M1, COUN
T("ADAM"."SALES"."AMOUNT_SOLD") M2, COUNT(*) M3 FROM ADAM.SALES GROUP BY ADAM.SA
LES.CHANNEL_ID

STATEMENT
--------------------------------------------------------------------------------
DROP MATERIALIZED VIEW ADAM.MV1

SQL> exec dbms_advisor.delete_task(:t)
PL/SQL procedure successfully completed.

Usually, we need Materialized View Logs on the base tables and special expressions in the materialized view query to be able to do a fast refresh.

I simply take the coding from above now:

SQL> CREATE MATERIALIZED VIEW LOG ON "ADAM"."SALES"
WITH ROWID, SEQUENCE ("CHANNEL_ID","AMOUNT_SOLD")
INCLUDING NEW VALUES;  2    3

Materialized view log created.

SQL> CREATE MATERIALIZED VIEW ADAM.MV1
 REFRESH FAST WITH ROWID
 enable QUERY REWRITE AS
 SELECT ADAM.SALES.CHANNEL_ID C1,
 SUM("ADAM"."SALES"."AMOUNT_SOLD") M1,
 COUNT("ADAM"."SALES"."AMOUNT_SOLD") M2,
 COUNT(*) M3
 FROM ADAM.SALES
 GROUP BY ADAM.SALES.CHANNEL_ID;

Materialized view created.

After again modifying the base table sales, I will then try a fast refresh of the MV1:

SQL> set timing off
SQL> update sales set amount_sold=amount_sold*1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

SQL> select mview_name,staleness from user_mviews;

MVIEW_NAME                     STALENESS
------------------------------ -------------------
MV1                            NEEDS_COMPILE

SQL> set timing on
SQL> exec dbms_mview.refresh('MV1','F')

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.62

It worked! But do I always have to refresh manually? No, we can also schedule a regular refresh like this:

SQL>  alter materialized view mv1 refresh start with sysdate next sysdate + interval '1' minute;

Materialized view altered.

SQL> alter session set nls_date_format='hh24:mi:ss';

Session altered.

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            FAST     10:40:05

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            FAST     10:41:04

Of course, a one minute refresh interval is extraordinary short. An other possibility to refresh the MVs very comfortably is on commit:

CREATE MATERIALIZED VIEW ADAM.MV1
 REFRESH FAST on commit 
 WITH ROWID
 enable QUERY REWRITE AS
 SELECT ADAM.SALES.CHANNEL_ID C1,
 SUM("ADAM"."SALES"."AMOUNT_SOLD") M1,
 COUNT("ADAM"."SALES"."AMOUNT_SOLD") M2,
 COUNT(*) M3
 FROM ADAM.SALES
 GROUP BY ADAM.SALES.CHANNEL_ID;

Materialized view created.

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            COMPLETE 11:16:28

SQL> update sales set amount_sold=amount_sold*1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

SQL> select MVIEW_NAME,LAST_REFRESH_TYPE,LAST_REFRESH_DATE from user_mviews;

MVIEW_NAME                     LAST_REF LAST_REF
------------------------------ -------- --------
MV1                            FAST     11:19:56

But this may (although done in asynchronous fashion) slow down the DML on the base table, so it requires testing whether the slow down is noticeable or neglectable. So far  for now about Materialized Views. There is of course much more to say, but I think that this is enough for a brief introduction of that theme.

Written by Uwe Hesse

July 8, 2009 at 18:21

Posted in TOI

Tagged with

parallel query & physical reads direct

without comments

After an intermezzo in Munich that left me with no time for posting, I am now in Bucharest (Romania) for an 11g Data Warehouse Administration course. The students are all Oracle Internals, which is almost a guarantee for fun and a nice atmosphere. The most difficult thing about this course was to get into the Oracle building in Bucharest at all – the security staff is very sharp here…

To get technical: Did you know that if you are doing a parallel query, it is processed with direct reads under circumvention of the Database Buffer Cache? Here is how you can investigate this:

sqlplus sh/sh
SQL> select /*+ parallel (sales,2) */ sum(amount_sold) from sales;
SQL> select name,sum(value)
     from v$sesstat
     natural join v$statname
     natural join v$session
     where username='SH'
     and name like '%physical reads%'
     group by name;

Written by Uwe Hesse

April 6, 2009 at 15:10

Posted in TOI

Tagged with