Tuesday, November 27, 2018

Oracle RAC: Using a Second NIC for Interconnect HA

Introduction


One of the most important features of Oracle RAC is high availability: the more resilient your components are, the better your cluster scores on HA. For Oracle Clusterware, the interconnect network plays a big role in the environment; if you lose connectivity on the private network, Oracle will choose some nodes to be evicted. For that reason it is worth taking a closer look at it.

To make the network resilient we use link aggregation, which can be implemented across a variety of hardware components such as NICs and network switches. We can also use OS techniques such as bonding, in which case the system administrator is responsible for guaranteeing the network's resilience. There is no problem with this approach; actually a large number of environments use it. But Oracle has its own approach too.

Oracle High Availability IP


Starting with Oracle 11.2.0.2 we can use HAIP (High Availability IP) instead of the OS method. For this we configure a second network card on a subnet different from the first interconnect, which is what guarantees HA. Keep in mind that you should use the same MTU and the same network interface name on all nodes.
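
One way to confirm that before going ahead is to check the interface name and MTU on every node; a quick sketch (enp0s9 is the interface that will be added later in this post):

[root@srv-ora-rac01 ~]# ip -o link show enp0s9 | grep -o 'mtu [0-9]*'
[root@srv-ora-rac01 ~]# oifcfg iflist -p -n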

You can have up to four active devices for HAIP. You can configure more, but Oracle will only use four; if you lose one, Oracle will pick another configured device to replace it. Even with a single device failure you will not suffer bounces or disconnects, because Oracle relocates the HAIP addresses across the surviving interfaces on all nodes.

Checking the current environment


We can use oifcfg getif to list all the networks used by Oracle Clusterware; the main idea is to add the 192.168.2.0 subnet as a second interconnect network.

[oracle@srv-ora-rac01 ~]$ oifcfg getif

enp0s3  192.168.1.0  global  public

enp0s8  192.168.0.0  global  cluster_interconnect

[oracle@srv-ora-rac01 ~]$ 

Check that the new private IPs are reachable from both nodes:

[root@srv-ora-rac01 ~]# ping srv-ora-rac01-priv2
PING srv-ora-rac01-priv2 (192.168.2.74) 56(84) bytes of data.
64 bytes from srv-ora-rac01-priv2 (192.168.2.74): icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from srv-ora-rac01-priv2 (192.168.2.74): icmp_seq=2 ttl=64 time=0.039 ms
^C
--- srv-ora-rac01-priv2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1053ms
rtt min/avg/max/mdev = 0.039/0.044/0.049/0.005 ms

[root@srv-ora-rac01 ~]# ping srv-ora-rac02-priv2
PING srv-ora-rac02-priv2 (192.168.2.75) 56(84) bytes of data.
64 bytes from srv-ora-rac02-priv2 (192.168.2.75): icmp_seq=1 ttl=64 time=1.98 ms
64 bytes from srv-ora-rac02-priv2 (192.168.2.75): icmp_seq=2 ttl=64 time=0.253 ms
^C
--- srv-ora-rac02-priv2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.253/1.119/1.985/0.866 ms
[root@srv-ora-rac01 ~]# 


Configuring HAIP


Now we can add the second network to our interconnect; for this we only have to run:

[root@srv-ora-rac01 ~]# oifcfg setif -global enp0s9/192.168.2.0:cluster_interconnect
[root@srv-ora-rac01 ~]# 

Now we can check the Clusterware networks and validate that we have two interconnect networks:

[root@srv-ora-rac01 ~]# oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
enp0s9  192.168.2.0  global  cluster_interconnect
[root@srv-ora-rac01 ~]# 

With this we already have an interconnect capable of failover, but we need to restart CRS on all nodes to make full use of HAIP.
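
A minimal sketch of the restart and verification, run as root on one node at a time to keep the cluster available (the HAIP addresses come from the 169.254.0.0/16 link-local range, so after the restart they should show up on both private interfaces):

[root@srv-ora-rac01 ~]# crsctl stop crs
[root@srv-ora-rac01 ~]# crsctl start crs
[root@srv-ora-rac01 ~]# ip addr show | grep 169.254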


POC - Failure of one network interface


With HAIP configured we can deal with the failure of one interconnect path. I'll simulate this by bringing down one interface; the Clusterware should stay up and running:

[root@srv-ora-rac01 ~]# oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
enp0s9  192.168.2.0  global  cluster_interconnect

[root@srv-ora-rac01 ~]# crsctl check cluster -all
**************************************************************
srv-ora-rac01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv-ora-rac02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@srv-ora-rac01 ~]# 

[root@srv-ora-rac01 ~]# ifconfig -a enp0s9
enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.2.74  netmask 255.255.255.0  broadcast 192.168.2.255
        inet6 fe80::117f:3ef7:eb06:9c26  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:e6:c9:32  txqueuelen 1000  (Ethernet)
        RX packets 4120  bytes 2929493 (2.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4645  bytes 3708582 (3.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@srv-ora-rac01 ~]# 

[root@srv-ora-rac01 ~]# ifdown enp0s9
Device 'enp0s9' successfully disconnected.
[root@srv-ora-rac01 ~]# ifconfig -a enp0s9
enp0s9: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 08:00:27:e6:c9:32  txqueuelen 1000  (Ethernet)
        RX packets 5070  bytes 3622826 (3.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5458  bytes 4208014 (4.0 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@srv-ora-rac01 ~]# 


[root@srv-ora-rac01 ~]# oifcfg getif
enp0s3  192.168.1.0  global  public
enp0s8  192.168.0.0  global  cluster_interconnect
enp0s9  192.168.2.0  global  cluster_interconnect
[root@srv-ora-rac01 ~]# crsctl check cluster -all

**************************************************************
srv-ora-rac01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
srv-ora-rac02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************

Nothing happened to the node; all services remained available, as expected.
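
To finish the test, bring the interface back up and confirm it received its address again; HAIP should redistribute the 169.254.x.x addresses over the restored device:

[root@srv-ora-rac01 ~]# ifup enp0s9
[root@srv-ora-rac01 ~]# ifconfig enp0s9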

Hope you enjoy!

Diogo 


Friday, November 23, 2018

Oracle RAC: Migrate SCAN from /etc/hosts to DNS

Introduction


The Oracle SCAN listener and SCAN name resolution are not new features in the Oracle world. They were introduced in Oracle 11g R2, and since then the SCAN has been one of the most important features for high availability. With a SCAN name we can resolve one single name for the entire cluster, and we don't need to worry about maintaining a long list of VIPs for every single application client.

One requirement for using a SCAN name is to have three IPs resolving one single name in round robin on your DNS service, as we can confirm in this Oracle white paper; that is how we get load balancing and high availability properly.
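
On the DNS side this is just three A records for the same name. A minimal sketch, assuming a BIND-style zone file and the addresses used in this post (BIND rotates the answers in round robin by default):

scan-ora-rac    IN A 192.168.1.71
scan-ora-rac    IN A 192.168.1.72
scan-ora-rac    IN A 192.168.1.73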

Problem


Unfortunately there is a workaround for the three-IP recommendation: you can put one single IP in /etc/hosts, install Oracle Clusterware, and run the SCAN with just one IP, leaving some other DBA with the duty of changing it in the near future. This way the DBA who installs the Clusterware doesn't need to wait for the DNS guy to do his job before installing Oracle.

The problem is that one day you will need to fix this. I have already seen a few environments running like this, even in 2018. In this article I'll show how to fix it and what you need to take care of and plan before executing.

Solution


It's very simple to change from /etc/hosts to DNS. First you need to ensure that the DNS guy has entered three records on the DNS, resolving one single name to three IPs in round robin, as we can see below:

[root@srv-ora-rac01 ~]# nslookup scan-ora-rac
Server: 192.168.1.101
Address: 192.168.1.101#53

Name: scan-ora-rac
Address: 192.168.1.72
Name: scan-ora-rac
Address: 192.168.1.73
Name: scan-ora-rac
Address: 192.168.1.71

[root@srv-ora-rac01 ~]# nslookup scan-ora-rac
Server: 192.168.1.101
Address: 192.168.1.101#53

Name: scan-ora-rac
Address: 192.168.1.73
Name: scan-ora-rac
Address: 192.168.1.71
Name: scan-ora-rac
Address: 192.168.1.72

[root@srv-ora-rac01 ~]# nslookup scan-ora-rac
Server: 192.168.1.101
Address: 192.168.1.101#53

Name: scan-ora-rac
Address: 192.168.1.71
Name: scan-ora-rac
Address: 192.168.1.72
Name: scan-ora-rac
Address: 192.168.1.73

[root@srv-ora-rac01 ~]# 


The main problem is that we still don't have all the SCAN listeners up and running, so in the window between the DNS configuration and the listener configuration users can get time-outs. That is why I recommend doing this outside business hours.

[root@srv-ora-rac01 ~]# ping scan-ora-rac
PING scan-ora-rac (192.168.1.72) 56(84) bytes of data.
^C
--- scan-ora-rac ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2033ms

[root@srv-ora-rac01 ~]# 



First let's confirm the status and the number of SCANs:

[root@srv-ora-rac01 ~]# srvctl config scan
SCAN name: scan-ora-rac, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/enp0s3, static
Subnet IPv6: 
SCAN 0 IPv4 VIP: 192.168.1.71
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
[root@srv-ora-rac01 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node srv-ora-rac01
[root@srv-ora-rac01 ~]# 


Before continuing we need to comment out the SCAN name resolution in /etc/hosts:

[root@srv-ora-rac01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

##################
# ORACLE RAC IP  #
##################

# SCAN
# 192.168.1.71    scan-ora-rac

# PUBLIC NETWORK
192.168.1.74    srv-ora-rac01
192.168.1.75    srv-ora-rac02

# VIP
192.168.1.76    srv-ora-rac01-vip
192.168.1.77    srv-ora-rac02-vip

# PRIVATE
192.168.0.74    srv-ora-rac01-priv
192.168.0.75    srv-ora-rac02-priv

[root@srv-ora-rac01 ~]# 


At this point we need to stop the SCAN listener and the SCAN:

[root@srv-ora-rac01 ~]# srvctl stop scan_listener
[root@srv-ora-rac01 ~]# srvctl stop scan
[root@srv-ora-rac01 ~]# srvctl status scan_listener
SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running
[root@srv-ora-rac01 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running
[root@srv-ora-rac01 ~]# 


With the listeners stopped we can modify the SCAN and the SCAN listeners. During the modify, Oracle will query the DNS, resolve the name, and create as many SCAN VIPs and listeners as there are IPs resolved.

[root@srv-ora-rac01 ~]# srvctl modify scan -n scan-ora-rac
[root@srv-ora-rac01 ~]# srvctl modify scan_listener -u

We can check the result as:

[root@srv-ora-rac01 ~]# srvctl config scan
SCAN name: scan-ora-rac, Network: 1
Subnet IPv4: 192.168.1.0/255.255.255.0/enp0s3, static
Subnet IPv6: 
SCAN 0 IPv4 VIP: 192.168.1.73
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 1 IPv4 VIP: 192.168.1.72
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
SCAN 2 IPv4 VIP: 192.168.1.71
SCAN VIP is enabled.
SCAN VIP is individually enabled on nodes: 
SCAN VIP is individually disabled on nodes: 
[root@srv-ora-rac01 ~]# 

[root@srv-ora-rac01 ~]# srvctl start scan
[root@srv-ora-rac01 ~]# srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node srv-ora-rac02
SCAN VIP scan2 is enabled
SCAN VIP scan2 is running on node srv-ora-rac01
SCAN VIP scan3 is enabled
SCAN VIP scan3 is running on node srv-ora-rac01
[root@srv-ora-rac01 ~]# 
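
One detail: the commands above only bring the SCAN VIPs back; the SCAN listeners still need to be started and checked as well:

[root@srv-ora-rac01 ~]# srvctl start scan_listener
[root@srv-ora-rac01 ~]# srvctl status scan_listener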


That's all, hope you enjoy!



Thursday, August 9, 2018

Be Proactive - TFA (Trace File Analyzer)

Murphy's law says that if something can go wrong, it will, and in an Oracle environment, especially with Clusterware, that is more than true. Sooner or later some problem will happen, even if you spent those extra hours configuring bonding or normal redundancy in ASM. The big question is: be prepared!

TFA is a tool that provides real-time health check monitoring, fault detection, and diagnostics in a single command-line tool. TFA can be used to watch for errors in log files, collect diagnostic statistics, and upload files automatically.

However, even though some versions already ship with TFA installed, the bundled build does much less than you would expect; keep an eye on Oracle MOS Doc ID 1513912.1 and always install the most up-to-date version.

Installing TFA


To get the most recent version of TFA, check MOS Doc ID 1513912.1 and download it through the link there. As of this post the current version is 18.3, and it is supported on several platforms such as Red Hat, SUSE, HP-UX, AIX, Windows, and Solaris.

The installation can be:
  • As a daemon, which requires installing as root;
  • Non-daemon, which can be installed by a non-root user.

And the scope can be:
  • Local node - it will only collect on the local node;
  • Cluster - when installing for a clustered environment, it is recommended to have user equivalence (passwordless SSH) set up for root.
The installation is very simple: just download, extract the zip file, and start the installer, ./installTFA-LINUX, which will ask a few questions and soon the installation is complete.


[root@ol7-ora12-srvprd1 sf_hostDownloads]# ./installTFA-LINUX
TFA Installation Log will be written to File : /tmp/tfa_install_4358_2018_08_09-18_45_46.log

Starting TFA installation

TFA Version: 183000 Build Date: 201808081359

Enter a location for installing TFA (/tfa will be appended if not supplied) [/media/sf_hostDownloads/tfa]:
/u01/app/oracle/tfa

Running Auto Setup for TFA as user root...

Would you like to do a [L]ocal only or [C]lusterwide installation ? [L|l|C|c] [C] : L
Installing TFA now...

Discovering Nodes and Oracle resources

Starting Discovery...


Getting list of nodes in cluster . . . . .

....


 Installing TFA on ol7-ora12-srvprd1:
HOST: ol7-ora12-srvprd1    TFA_HOME: /u01/app/oracle/tfa/ol7-ora12-srvprd1/tfa_home

.-------------------------------------------------------------------------------------.
| Host              | Status of TFA | PID  | Port | Version    | Build ID             |
+-------------------+---------------+------+------+------------+----------------------+
| ol7-ora12-srvprd1 | RUNNING       | 4906 | 5000 | 18.3.0.0.0 | 18300020180808135947 |
'-------------------+---------------+------+------+------------+----------------------'

Running Inventory in All Nodes...

Enabling Access for Non-root Users on ol7-ora12-srvprd1...

Adding default users to TFA Access list...

Summary of TFA Installation:
.----------------------------------------------------------------------.
|                           ol7-ora12-srvprd1                          |
+---------------------+------------------------------------------------+
| Parameter           | Value                                          |
+---------------------+------------------------------------------------+
| Install location    | /u01/app/oracle/tfa/ol7-ora12-srvprd1/tfa_home |
| Repository location | /u01/app/oracle/tfa/repository                 |
| Repository usage    | 0 MB out of 10240 MB                           |
'---------------------+------------------------------------------------'

TFA is successfully installed...

Usage : /u01/app/12.1.0/grid/bin/tfactl [options]
    commands:diagcollect|collection|analyze|ips|run|start|stop|enable|disable|status|print|access|purge|directory|host|receiver|set|toolstatus|uninstall|diagnosetfa|syncnodes|setupmos|upload|availability|rest|events|search|changes|isa
For detailed help on each command use:
  /u01/app/12.1.0/grid/bin/tfactl -help

Using TFA


TFA can be used through three types of interface:

  • Command line - tfactl command
  • Interactive - tfactl
  • Menu - tfactl menu
The menu makes life much easier when we are configuring things or are not sure of the command syntax, but for batch routines the command line is certainly the best option.


Enabling access for users


By default the owner users of the ORACLE_HOME and the GRID_HOME already have permission to use tfactl; to add other users just use the commands:

tfactl access enable
tfactl access add user username

Example:

[root@ol7-ora12-srvprd1 bin]# ./tfactl access lsusers
.---------------------------------.
|  TFA Users in ol7-ora12-srvprd1 |
+-----------+-----------+---------+
| User Name | User Type | Status  |
+-----------+-----------+---------+
| oracle    | USER      | Allowed |
'-----------+-----------+---------'

[root@ol7-ora12-srvprd1 bin]# ./tfactl access enable

TFA has already enabled access for Non-root Users.

[root@ol7-ora12-srvprd1 bin]# ./tfactl access add -user dhnomura -local

Successfully added 'dhnomura' to TFA Access list.

.---------------------------------.
|  TFA Users in ol7-ora12-srvprd1 |
+-----------+-----------+---------+
| User Name | User Type | Status  |
+-----------+-----------+---------+
| dhnomura  | USER      | Allowed |
| oracle    | USER      | Allowed |
'-----------+-----------+---------'

[root@ol7-ora12-srvprd1 bin]#




Managing the daemon


It is also possible to manage the daemon itself: start, stop, enable or disable auto-start. For that:

[root@ol7-ora12-srvprd1 bin]# ./tfactl stop
Stopping TFA from the Command Line
Stopped OSWatcher
TFA-00518 Oracle Trace File Analyzer (TFA) is not running (stopped)
TFA Stopped Successfully
. . .
Successfully stopped TFA..
[root@ol7-ora12-srvprd1 bin]# ./tfactl start
Starting TFA..
. . . . .
Successfully started TFA Process..
. . . . .
TFA Started and listening for commands
[root@ol7-ora12-srvprd1 bin]# ./tfactl status

.--------------------------------------------------------------------------------------------------------.
| Host              | Status of TFA | PID  | Port | Version    | Build ID             | Inventory Status |
+-------------------+---------------+------+------+------------+----------------------+------------------+
| ol7-ora12-srvprd1 | RUNNING       | 8629 | 5000 | 18.3.0.0.0 | 18300020180808135947 | COMPLETE         |
'-------------------+---------------+------+------+------------+----------------------+------------------'

[root@ol7-ora12-srvprd1 bin]# ./tfactl disable
TFA autostart has been disabled ..
[root@ol7-ora12-srvprd1 bin]# ./tfactl enable
TFA autostart has been enabled ..
[root@ol7-ora12-srvprd1 bin]#


Managing the repository


TFA stores everything it collects in a repository on the operating system. This repository can be cleaned up according to defined policies, or when it reaches its size limit. Some examples:

[root@ol7-ora12-srvprd1 bin]# ./tfactl set repositorydir=/u01/app/oracle/tfa2
Successfully changed repository
.------------------------------------------------------------.
| Repository Parameter      | Value                          |
+---------------------------+--------------------------------+
| Old Location              | /u01/app/oracle/tfa/repository |
| New Location              | /u01/app/oracle/tfa2           |
| Current Maximum Size (MB) | 10240                          |
| Current Size (MB)         | 0                              |
| Status                    | OPEN                           |
'---------------------------+--------------------------------'
[root@ol7-ora12-srvprd1 bin]# ./tfactl set minagetopurge=48
[root@ol7-ora12-srvprd1 bin]# ./tfactl set autopurge=ON


All of the TFA configuration options can be found in:
Managing and Configuring Oracle Trace File Analyzer

Automatic event collection


TFA watches for errors such as ORA-600 and node evictions; when one occurs, it can:

  • Invoke diagnostic tools and collect logs.
  • Trim the logs so they contain only the necessary information.
  • Consolidate and compress the collected logs from all nodes.
  • Store the logs in a repository.
  • Send e-mail notifications.
  • Upload the collection to Oracle.

To enable automatic collection just run:

[root@ol7-ora12-srvprd1 bin]# ./tfactl set autodiagcollect=ON
Successfully set autodiagcollect=ON



[Figure: Oracle Trace File Analyzer workflow]




Collecting diagnostic information manually


There are several ways to collect: we can filter by specific errors, define a specific time window, filter by component, or collect in real time. The documentation is huge, so I'll show a few examples similar to the ones on the Oracle site. When a Support analyst asks for a collection he will specify exactly what must be collected, and it is good to have everything; without all the logs the troubleshooting may not be the best.

Real-time health check

tfactl summary

All events in the last 18 hours

tfactl analyze -last 18h

All ORA-00600 errors in the last 8 hours

tfactl analyze -search "ora-00600" -last 8h
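
For an actual collection to send to Support, tfactl diagcollect is the entry point; a minimal sketch with an illustrative one-hour window (the Support analyst will normally tell you the exact window and components):

[root@ol7-ora12-srvprd1 bin]# ./tfactl diagcollect -last 1h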

Tools


Besides the collection and health check features of TFA itself, there are other tools bundled into TFA that can be used together with it. Some of them are orachk, oratop, and oswbb, all very useful for diagnostics and troubleshooting. I believe each one deserves a chapter of its own, so that stays for a future post.

To check the status of each one:

[root@ol7-ora12-srvprd1 bin]# ./tfactl toolstatus

.------------------------------------------------------------------.
|              TOOLS STATUS - HOST : ol7-ora12-srvprd1             |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.0.1 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.


References


MOS Doc ID 1513912.1

OTN TFA Overview
Oracle Documentation

Tuesday, May 15, 2018

Oracle 12c - AWR Baseline

Introduction


Baselines are sets of AWR snapshots that represent an important period of time in the database, usually a period of acceptable performance. A baseline allows comparisons between periods of time, supporting analyses that can be reactive, or proactive through thresholds.

Definitions


A few definitions before getting into the subject.

Statistics

Oracle stores and uses many statistics in the database; they can be categorized as:

Types of statistics
- Object statistics, time model statistics, system statistics, SQL statement statistics, session statistics.

A system statistic is nothing more than a value captured at a given instant.

Metrics

A metric is the ratio of a statistic to a common denominator, usually time.

By assigning a denominator to a statistic we obtain a value, a piece of information, which through comparisons makes it possible to analyze periods and spot large divergences that may point to potential problems.

For example, the number of transactions is a statistic; the number of transactions per second is a metric.
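
Current metric values can be read straight from the database. A minimal sketch, assuming the standard metric name 'User Transaction Per Sec' (check V$SYSMETRIC on your version):

-- current value of the transactions-per-second metric
SELECT metric_name, value, metric_unit
  FROM v$sysmetric
 WHERE metric_name = 'User Transaction Per Sec';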

AWR Snapshots

A stored set of database statistics and metrics that reflects a period of time in the database.
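
Snapshots are created automatically (hourly, by default) and can also be taken manually at any moment:

EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;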

AWR Baselines


A baseline is a group of snapshots that represents a significant period in the database. It is useful as a basis of comparison between a good and a bad period, in order to diagnose performance problems by comparing divergences in database usage, consumption profile, statistics, and metrics.

When we use a baseline, the snapshots belonging to its period are not removed by the automatic purge job; they are retained in the database for as long as the baseline exists.

As an example, we can consider the monthly closing job a critical period: we create a baseline for each closing job and keep it indefinitely, so when a problem occurs in the monthly closing we have the history of the other months to make the necessary comparisons and diagnose the cause of the performance drop.

Baseline types

There are three types of baseline: Fixed, Template, and Moving Window.
  • Fixed - created manually; reflects a single period.
  • Template - used as a schedule so that MMON creates the baseline in the future; there are two kinds of template, single and repeating.
    • Single - a single baseline is created from the defined range; this kind of template is useful when we know a specific demand will hit the database and we would like to schedule the baseline creation for that period.
    • Repeating - a chain of baselines is created, repeated N times over a period X; it is useful when we know specific workloads recur in a given window over time.
  • Moving Window - by default it spans all the snapshots within the AWR retention window, i.e. retention window >= moving window baseline. It is used above all for monitoring through adaptive thresholds, giving a more accurate picture of what happens in the database (see the sketch after this list).
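
The moving window baseline already exists by default (SYSTEM_MOVING_WINDOW); what we usually change is its size. A minimal sketch, assuming the AWR retention has already been raised to at least 30 days (the call fails if the window is larger than the retention):

BEGIN
    -- resize the default moving window baseline to 30 days
    DBMS_WORKLOAD_REPOSITORY.MODIFY_BASELINE_WINDOW_SIZE (
        window_size => 30);
END;
/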


Creating baselines


Fixed


BEGIN
    DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE (
        start_snap_id => 145,
        end_snap_id   => 146,
        baseline_name => 'Teste_Fixed_01',
        dbid          => 1491358463,
        expiration    => 30);
END;
/

Creating a template - Single


BEGIN
    DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE_TEMPLATE (
        start_time    => to_date('15/05/2018 19:30','DD/MM/YYYY HH24:MI'),
        end_time      => to_date('15/05/2018 20:00','DD/MM/YYYY HH24:MI'),
        baseline_name => 'UNIQ_BASELINE_TESTE01',
        template_name => 'UNIQ_BASELINE_TESTE01',
        expiration    => NULL);
END;
/

Creating a template - Repeating


BEGIN
    DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE_TEMPLATE (
        day_of_week          => 'ALL',
        hour_in_day          => 20,
        duration             => 1,
        start_time           => to_date('15/05/2018 20:00','DD/MM/YYYY HH24:MI'),
        end_time             => ADD_MONTHS(SYSDATE, 36),
        baseline_name_prefix => 'BASELINE_DIARIO_FECHAMENTO',
        template_name        => 'BASELINE_DIARIO_FECHAMENTO',
        expiration           => NULL,
        dbid                 => NULL);
END;
/

Main views


• DBA_HIST_BASELINE
• DBA_HIST_BASELINE_DETAILS
• DBA_HIST_BASELINE_TEMPLATE
• DBA_HIST_BASELINE_METADATA
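
To see which baselines exist and which snapshots they pin, a quick query against DBA_HIST_BASELINE:

SELECT baseline_name, baseline_type, start_snap_id, end_snap_id
  FROM dba_hist_baseline;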


References