2023.5.1完成这个为期接近3个月的DB实验,收获还是很多的。今日,对lab2-B+树实现细节回顾,**请不要将实现代码公开,尊重 Andy 和 TAs 的劳动成果!**需要代码的可以联系我~

image-20240907224507776

B+树页面结构

在Lab1中,我们说到page的数据结构

1
2
3
4
5
6
7
8
9
10
/** The actual data that is stored within a page. */
char data_[BUSTUB_PAGE_SIZE]{};
/** The ID of this page. */
page_id_t page_id_ = INVALID_PAGE_ID;
/** The pin count of this page. */
int pin_count_ = 0;
/** True if the page is dirty, i.e. it is different from its corresponding page on disk. */
bool is_dirty_ = false;
/** Page latch. */
ReaderWriterLatch rwlatch_;

图片取自:做个数据库:2022 CMU15-445 Project2 B+Tree Index

image-20240907224835348

BPlusTreePage是另外BPlusTreeLeafPage和BPlusTreeInternalPage的父类

BPlusTreePage

格式

1
2
3
4
5
6
7
8
9
10
11
12
13
/**
* Both internal and leaf page are inherited from this page.
*
* It actually serves as a header part for each B+ tree page and
* contains information shared by both leaf page and internal page.
*
* Header format (size in byte, 24 bytes in total):
* ----------------------------------------------------------------------------
* | PageType (4) | LSN (4) | CurrentSize (4) | MaxSize (4) |
* ----------------------------------------------------------------------------
* | ParentPageId (4) | PageId(4) |
* ----------------------------------------------------------------------------
*/

数据结构

1
2
3
4
5
6
7
// member variable, attributes that both internal and leaf page share
IndexPageType page_type_ __attribute__((__unused__));
lsn_t lsn_ __attribute__((__unused__));
int size_ __attribute__((__unused__));
int max_size_ __attribute__((__unused__));
page_id_t parent_page_id_ __attribute__((__unused__));
page_id_t page_id_ __attribute__((__unused__));

图片取自:做个数据库:2022 CMU15-445 Project2 B+Tree Index

image-20240907225211435

B_PLUS_TREE_INTERNAL_PAGE

格式

1
2
3
4
5
6
7
8
9
10
11
12
13
/**
* Store n indexed keys and n+1 child pointers (page_id) within internal page.
* Pointer PAGE_ID(i) points to a subtree in which all keys K satisfy:
* K(i) <= K < K(i+1).
* NOTE: since the number of keys does not equal to number of child pointers,
* the first key always remains invalid. That is to say, any search/lookup
* should ignore the first key.
*
* Internal page format (keys are stored in increasing order):
* --------------------------------------------------------------------------
* | HEADER | KEY(1)+PAGE_ID(1) | KEY(2)+PAGE_ID(2) | ... | KEY(n)+PAGE_ID(n) |
* --------------------------------------------------------------------------
*/

数据结构

1
2
#define MappingType std::pair<KeyType, ValueType>
MappingType array_[1];

图片取自:做个数据库:2022 CMU15-445 Project2 B+Tree Index

internal page的首个位置没有key

image-20240907225414390

BPlusTreeLeafPage

leaf page 和 internal page 的内存布局基本一样,只是 leaf page 多了一个成员变量 next_page_id,指向下一个 leaf page(用于 range scan)。因此 leaf page 的 header 大小为 28 Byte。

格式

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

/**
* Store indexed key and record id(record id = page id combined with slot id,
* see include/common/rid.h for detailed implementation) together within leaf
* page. Only support unique key.
*
* Leaf page format (keys are stored in order):
* ----------------------------------------------------------------------
* | HEADER | KEY(1) + RID(1) | KEY(2) + RID(2) | ... | KEY(n) + RID(n)
* ----------------------------------------------------------------------
*
* Header format (size in byte, 28 bytes in total):
* ---------------------------------------------------------------------
* | PageType (4) | LSN (4) | CurrentSize (4) | MaxSize (4) |
* ---------------------------------------------------------------------
* -----------------------------------------------
* | ParentPageId (4) | PageId (4) | NextPageId (4)
* -----------------------------------------------
*/

数据结构

1
2
3
page_id_t next_page_id_;
// Flexible array member for page data.
MappingType array_[1];

每一个页面的key和value数量

树上操作

搜索