B-Tree

B-Tree는 디스크나 직접 접근 보조 기억 장치에서 잘 동작하도록 설계된 균형이 잡힌 검색 트리이다.

많은 데이터베이스 시스템에서 정보를 저장하기 위해 B-Tree나 B-Tree 변형을 사용한다.

(따라서 B-Tree 성능을 평가할 때, 자기디스크와 메인 랜덤 접근 메모리(Main Random Access Memory가 서로 다르다.)

B-Tree의 자식노드 수(분기인자; branching factor)는 디스크 장치의 특성에 따라 결정된다.

B-Tree height: O(logbn) $O(\log_bn)$ branching factor b $b$

메인 메모리에 데이터가 없어 디스크를 읽는 경우

DiskRead(x) 연산 수행

메인 메모리에서 변경한 내용을 디스크에 업로듸

DiskWrite(x) 연산 수행

컴퓨터 수행시간을 크게 아래와 같이 나누어서 생각할 수 있다.

- 디스크 접근 횟수(page 수에 따라 변함)

- CPU(계산, 연산) 시간

일반적으로 디스크 접근 횟수를 최소화하기 위해 B-Tree 노드의 크기는 디스크 페이지 전체 만큼 크다.

B-Tree의 각 노드는 일반적으로 부수적인 정보를 담고 있다.

B+-Tree는 부수적인 정보를 리프노드가 가지게 함으로서 더욱 많은 노드를 할당할 수 있게 한다.

struct Node{

int n=0, t;

int* key;

bool leaf;

Node** cNs; // child Nodes

Node(int _t=4, bool _leaf=false): t(_t), leaf(_leaf){

if(leaf) cNs=nullptr;

else{

cNs=new Node*[2*t];

}

key=new int[2*t-1];

}

~Node(){

if(!leaf){

//for(int i=0; i<2*t; ++i) delete this->cNs[i];

delete[] this->cNs;

delete this->key;

}

void printKey(void){

for(int i=0; i<this->n; ++i){

std::cout<<key[i]<<' ';

}

std::cout<<std::endl;

}

};

노드가 가질 수 있는 키의 개수에는 상한과 하한이 존재한다.

최소 차수(minimum degree); t\geq 2

가장 단순히 t=2인 경우, 내부 노드가 2, 3 or 4개의 자식 노드를 가지며

이를 2-3-4 트리라고 부른다.

Search & Create & SplitChild & Insert

#include <iostream>

struct Node{

int n=0, t;

int* key;

bool leaf;

Node** cNs; // child Nodes

Node(int _t=4, bool _leaf=false): t(_t), leaf(_leaf){

if(leaf) cNs=nullptr;

else{

cNs=new Node*[2*t];

}

key=new int[2*t-1];

}

~Node(){

if(!leaf){

//for(int i=0; i<2*t; ++i) delete this->cNs[i];

delete[] this->cNs;

delete this->key;

}

void printKey(void){

for(int i=0; i<this->n; ++i){

std::cout<<key[i]<<' ';

}

std::cout<<std::endl;

}

};

struct BTree{

Node* root;

int t;

BTree(int _t): t(_t){}

void Create(void){

Node* x=new Node(this->t, true);

// DISK_WRITE

this->root=x;

}

Node* Search(Node* x, int k){

int i=0;

while(i<x->n && k>x->key[i]) i++;

if(i<x->n && k==x->key[i]) return x;

else if(x->leaf) return nullptr;

else{

//DISK_READ(x->cNs[i]);

return Search(x->cNs[i], k);

}

void SplitChild(Node* x, int i){

Node* y=x->cNs[i];

Node* z1=new Node(this->t, y->leaf);

Node* z2=new Node(this->t, y->leaf);

z1->n=this->t-1;

z2->n=this->t-1;

for(int j=0; j<this->t-1; ++j){

z1->key[j]=y->key[j];

z2->key[j]=y->key[j+this->t];

}

if(!(y->leaf)){

for(int j=0; j<this->t; ++j){

z1->cNs[j]=y->cNs[j];

z2->cNs[j]=y->cNs[j+this->t];

}

for(int j=x->n; j>=i+1; --j){

x->cNs[j+1]=x->cNs[j];

}

x->cNs[i]=z1;

x->cNs[i+1]=z2;

for(int j=x->n-1; j>=i; --j){

x->key[j+1]=x->key[j];

}

x->key[i]=y->key[this->t-1];

x->n++;

// DISK_WRITE(y)

// DISK_WRITE(z);

// DISK_WRITE(X);

delete y;

}

void InsertNonFull(Node* x, int k){

int i=x->n-1;

if(x->leaf){

while(i>=0 && k<x->key[i]){

x->key[i+1]=x->key[i];

i--;

}

x->key[i+1]=k;

x->n++;

// DISK_WRITE(x);

} else {

while(i>=0 && k<x->key[i]) i--;

i++;

// DISK_READ(x->cNs[i]);

if(x->cNs[i]->n==this->t*2-1){

SplitChild(x, i);

if(k>x->key[i]) i++;

}

InsertNonFull(x->cNs[i], k);

}

void Insert(int k){

Node* r=this->root;

if(r->n==this->t*2-1){

Node* s=new Node(this->t, false);

this->root=s;

s->cNs[0]=r;

SplitChild(s, 0);

InsertNonFull(s, k);

} else {

InsertNonFull(r, k);

}

};

int main(void){

BTree* btree=new BTree(2);

int alpha[]={4, 19, 17, 11, 3, 12, 8, 20, 22, 23, 13, 18, 14, 16, 1, 2, 24, 25, 4, 26, 5};

btree->Create();

for(int i=0; i<sizeof(alpha)/sizeof(int); ++i){

btree->Insert(alpha[i]);

Node* ret=btree->Search(btree->root, alpha[i]);

btree->root->printKey();

ret->printKey();

std::cout<<std::endl;

}

return 0;

}

Delete

1. 키 k가 노드 x에 있고 x가 리프 노드면, 키 k를 x에서 삭제한다.

2. 키 k가 노드 x에 있고 x가 내부 노드인 경우

3. 키가 내부 노드에 없고 리프에 있는 경우, 존재하는 루트의 x.c_i를 결정한다.

내부 노드가 최소 차수개의 노드를 가지도록 하기 위한 작업 필요