Data Structures Intro
Reference
September 12, 2017
Lecture day for CS50. This week it’s all about data structures, and we learned a few types:
- Linked Lists (singly-linked & doubly-linked)
- Stacks
- Queues
- Trees (an specifically, binary search trees)
- Hash tables
- Tries (aka reTRIEvals)
And here are some of the related struct
s:
Linked List
A chain of nodes which point to other nodes, held together by a start
pointer at the beginning and a NULL
termination at the end. Nodes are recursive, i.e. call other nodes within the node, until NULL
is reached. Linked lists of nodes are useful because you don’t need to know how much memory to allocate before creating the list, unlike an array of data, where if it turns out you need more memory, you have to do a lot of extra work to copy & redefine the array. With linked lists you can expand more freely.
Linked lists can be singly-linked so that the nodes all call next
in the same direction, or they can be doubly-linked so that each node points both forwards and backwards.
Definition of a singly-linked node struct:
typedef struct node { |
Definition of a doubly-linked node struct:
typedef struct node { |
To search a linked list, go through each node (first the int n
then next
to the next node) until you find the value you’re looking for:
bool search(int n, node *list) |
Stacks
Data is stored vertically so that you can only retrieve the last node added to the stack (LIFO: last in, first out). Stacks can be altered with push
(add data) or pop
(remove data). Top
should be defined as a global variable so that it always lets you reference the last node added to the stack. Definition of a stack struct:
typedef struct { |
Queues
Data is stored in a line and you always retrieve data from the front of the line (FIFO: first in, first out). Queues can be altered when you enqueue
a node at the end of the line or dequeue
a node from the front of the line. A front
variable is maintained to store the current location of the front of the queue. Size
also needs to be kept track of. Definition of a queue struct:
typedef struct { |
Trees
Data is stored cascading from a root node
in children
; any node without children are leaves
. In a binary search tree each node as 0
, 1
, or 2
children; the left child is always smaller than its parent and the right child is always bigger. This way you can easily throw out whole branches of data as you traverse the tree. It’s important for the tree to be balanced (i.e. not to heavy on one side or another) to ensure big O run time optimization. Definitions of a tree struct:
typedef struct { |
For compression:
typedef struct node { |
To search a binary tree:
bool search(int n, node *tree) |
Hash Tables
With good design hash tables can have more efficiency than the above methods in terms of running time. Using a hash table is like splitting the data into buckets in a consistent manner, so that you have to search smaller collections of data when you’re looking for something. Each bucket is a linked list of the inner data.
Tries
An even more efficient style of hash table to reduce the running time greatly. With this recursive system, each bucket contains the same buckets as the parent level, and data are stored by traversing bucket by bucket until reaching an end
marker. For example to store words, each bucket on the parent level would represent a letter A - Z
, and then in each of those buckets are additional buckets A - Z
. To find a word you follow the buckets one inside the next for each letter, until you determine the end of the word.