Tutorial#

This tutorial is intended as an introduction to working with PetDB.

Getting a Database#

The first step when working with petdb is to create a PetDB instance. Doing so is easy:

>>> from petdb import PetDB
>>> db = PetDB.get()

The code above places the data folder in the current working directory. You can also specify the path explicitly:

>>> import os
>>> db = PetDB.get(os.path.join("persistent", "data"))

Getting a Collection#

A collection is a group of documents stored in PetDB, and can be thought of as roughly the equivalent of a table in a relational database. A single instance of PetDB can support multiple independent collections. When working with petdb you access collections using attribute style access on PetDB instances:

>>> col = db.test_collection

If your collection name is such that using attribute style access won’t work (like test-collection), you can use dictionary style access instead:

>>> col = db["test-collection"]

PetDB fully supports type hinting, but modern IDEs don’t infer types through attribute-style access, so we recommend using dictionary-style access or the dedicated collection() method:

>>> col = db.collection("test-collection")
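
The three access styles can be pictured as thin wrappers around a single lookup. The sketch below is a toy stand-in and not PetDB's implementation; ToyDB and its internals are hypothetical names, used only to show why a name such as test-collection needs the dictionary or method form.

```python
class ToyDB:
    """Toy stand-in for a database object; not PetDB's real implementation."""

    def __init__(self):
        self._collections = {}

    def collection(self, name):
        # Create the collection on first access and reuse it afterwards.
        return self._collections.setdefault(name, [])

    def __getitem__(self, name):
        # toy_db["test-collection"]: any string works as a key.
        return self.collection(name)

    def __getattr__(self, name):
        # toy_db.test_collection: only reached when normal attribute lookup
        # fails, and only valid Python identifiers can be written this way.
        if name.startswith("_"):
            raise AttributeError(name)
        return self.collection(name)


toy_db = ToyDB()
same = (
    toy_db.test_collection
    is toy_db["test_collection"]
    is toy_db.collection("test_collection")
)
hyphen_is_identifier = "test-collection".isidentifier()
```

All three forms return the same object here, while a hyphenated name is not a valid identifier and so cannot be reached through attribute access.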

Documents#

Data in PetDB is represented (and stored) using JSON-style documents. In PetDB we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

>>> import datetime
>>> post = {
...     "author": "Mike",
...     "text": "My first blog post!",
...     "tags": ["petdb", "python"],
...     "date": datetime.datetime.now().timestamp(),
... }
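
The post stores its date as a float timestamp rather than a datetime object. One plausible reason, assuming documents end up serialized as JSON, is that the standard json module cannot encode datetime values by default. A minimal sketch of the round trip, using only the standard library:

```python
import datetime
import json

moment = datetime.datetime(2022, 5, 10, 14, 45)

# A datetime object is not JSON serializable by default.
try:
    json.dumps({"date": moment})
    datetime_is_serializable = True
except TypeError:
    datetime_is_serializable = False

# A float timestamp serializes cleanly and converts back when read.
stored = json.dumps({"date": moment.timestamp()})
restored = datetime.datetime.fromtimestamp(json.loads(stored)["date"])
```

Converting with fromtimestamp() after reading a document gives back a datetime that can be formatted or compared as usual.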

Inserting a Document#

To insert a document into a collection we can use the insert() method:

>>> posts = db.posts
>>> inserted = posts.insert(post)
>>> inserted
{'author': 'Mike', 'text': 'My first blog post!', 'tags': ['petdb', 'python'],
'date': 1707761881.616323, '_id': '09727c8d-c188-4b7e-993d-e4107034315c'}

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert() returns the inserted document.
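
The _id values shown in this tutorial look like UUID4 strings. Assuming that is how they are produced, the rule described above can be sketched as follows. ensure_id is a hypothetical helper and not part of PetDB, and whether PetDB copies the document or modifies it in place is not stated here; the sketch copies.

```python
import uuid


def ensure_id(document):
    """Return a copy of the document that is guaranteed to have an "_id" key."""
    result = dict(document)
    # Generate an id only when the caller did not supply one.
    if "_id" not in result:
        result["_id"] = str(uuid.uuid4())
    return result


generated = ensure_id({"author": "Mike"})
preserved = ensure_id({"author": "Mike", "_id": "my-own-id"})
```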

Getting a Single Document With find()#

The most basic type of query that can be performed in PetDB is find(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match. Here we use find() to get the first document from the posts collection:

>>> import pprint
>>> pprint.pprint(posts.find({}))
{'_id': '09727c8d-c188-4b7e-993d-e4107034315c',
 'author': 'Mike',
 'date': 1707761881.616323,
 'tags': ['petdb', 'python'],
 'text': 'My first blog post!'}

The result is a dictionary matching the one that we inserted previously.

Note

The returned document contains an _id, which was automatically added on insert.

find() also supports querying on specific elements that the resulting document must match. To filter our results to a document with author “Mike” we do:

>>> pprint.pprint(posts.find({"author": "Mike"}))
{'_id': '09727c8d-c188-4b7e-993d-e4107034315c',
 'author': 'Mike',
 'date': 1707761881.616323,
 'tags': ['petdb', 'python'],
 'text': 'My first blog post!'}

If we try with a different author, like “Eliot”, we’ll get no result:

>>> posts.find({"author": "Eliot"})
>>> 
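
The behaviour shown above (an empty query matches any document, a field query must match exactly, and no match yields None) can be modelled on a plain list of dictionaries. This is only a sketch of the semantics with hypothetical helper names, not how PetDB evaluates queries internally.

```python
def matches(document, query):
    """True when every key in the query equals the document's value for it."""
    return all(
        key in document and document[key] == value
        for key, value in query.items()
    )


def find(documents, query):
    """Return the first matching document, or None when nothing matches."""
    return next((doc for doc in documents if matches(doc, query)), None)


sample_posts = [
    {"_id": "1", "author": "Mike", "text": "My first blog post!"},
]
first = find(sample_posts, {})                    # empty query matches everything
mike = find(sample_posts, {"author": "Mike"})     # exact match on one field
eliot = find(sample_posts, {"author": "Eliot"})   # no match, so None
```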

Querying By Id#

We can also find a post by its _id, which in our example is “09727c8d-c188-4b7e-993d-e4107034315c”:

>>> pprint.pprint(posts.find({"_id": "09727c8d-c188-4b7e-993d-e4107034315c"}))
{'_id': '09727c8d-c188-4b7e-993d-e4107034315c',
 'author': 'Mike',
 'date': 1707761881.616323,
 'tags': ['petdb', 'python'],
 'text': 'My first blog post!'}

PetDB has a dedicated method for looking up a document by its _id. Use get() for better performance:

>>> pprint.pprint(posts.get("09727c8d-c188-4b7e-993d-e4107034315c"))
{'_id': '09727c8d-c188-4b7e-993d-e4107034315c',
 'author': 'Mike',
 'date': 1707761881.616323,
 'tags': ['petdb', 'python'],
 'text': 'My first blog post!'}

Multiple inserts#

To make querying a little more interesting, let’s insert a few more documents. In addition to inserting a single document, we can insert several documents in one call by passing a list as the first argument to insert_many(). This inserts each document in the list:

>>> new_posts = [
...     {
...         "author": "Mike",
...         "text": "Another post!",
...         "tags": ["multiple", "insert"],
...         "date": datetime.datetime(2020, 11, 12, 11, 14).timestamp(),
...     },
...     {
...         "author": "Eliot",
...         "title": "PetDB is fun",
...         "text": "and pretty easy too!",
...         "date": datetime.datetime(2022, 5, 10, 14, 45).timestamp(),
...     },
... ]
>>> result = posts.insert_many(new_posts)
>>> result
[{'author': 'Mike', 'text': 'Another post!', 'tags': ['multiple', 'insert'], 'date': 1605172440.0, '_id': '6d60c18a-e647-4431-b6f2-7c60cfbba4b2'},
{'author': 'Eliot', 'title': 'PetDB is fun', 'text': 'and pretty easy too!', 'date': 1652183100.0, '_id': 'f294d7c0-795b-4fbe-9436-b8800ec5e845'}]

There are a couple of interesting things to note about this example:

  • insert_many() returns a list of two documents, one for each inserted document.

  • new_posts[1] has a different “shape” than the other posts - there is no "tags" field, and we’ve added a new field, "title". This is what we mean when we say that PetDB is schema-free.

Querying for More Than One Document#

To get more than a single document as the result of a query we use the findall() method. findall() returns a list of documents. For example, we can iterate over every document in the posts collection:

>>> for post in posts.findall({}):
...     pprint.pprint(post)
...
{'_id': '09727c8d-c188-4b7e-993d-e4107034315c',
 'author': 'Mike',
 'date': 1707761881.616323,
 'tags': ['petdb', 'python'],
 'text': 'My first blog post!'}
{'_id': '6d60c18a-e647-4431-b6f2-7c60cfbba4b2',
 'author': 'Mike',
 'date': 1605172440.0,
 'tags': ['multiple', 'insert'],
 'text': 'Another post!'}
{'_id': 'f294d7c0-795b-4fbe-9436-b8800ec5e845',
 'author': 'Eliot',
 'date': 1652183100.0,
 'text': 'and pretty easy too!',
 'title': 'PetDB is fun'}

Just like we did with find(), we can pass a query to findall() to filter the returned results. Here, we get only those documents whose author is “Mike”:

>>> for post in posts.findall({"author": "Mike"}):
...     pprint.pprint(post)
...
{'_id': '09727c8d-c188-4b7e-993d-e4107034315c',
 'author': 'Mike',
 'date': 1707761881.616323,
 'tags': ['petdb', 'python'],
 'text': 'My first blog post!'}
{'_id': '6d60c18a-e647-4431-b6f2-7c60cfbba4b2',
 'author': 'Mike',
 'date': 1605172440.0,
 'tags': ['multiple', 'insert'],
 'text': 'Another post!'}

Counting#

If we just want to know how many documents match a query, we can perform a size() operation instead of a full query. We can get a count of all the documents in the collection:

>>> posts.size()
3

or just of those documents that match a specific query:

>>> posts.size({"author": "Mike"})
2

Range Queries#

PetDB supports many different types of advanced queries. As an example, let’s perform a query where we limit results to posts older than a certain date, and also sort the results by author:

>>> d = datetime.datetime(2023, 11, 12, 12).timestamp()
>>> for post in posts.filter({"date": {"$lt": d}}).sort("author"):
...     pprint.pprint(post)
... 
{'_id': 'f294d7c0-795b-4fbe-9436-b8800ec5e845',
 'author': 'Eliot',
 'date': 1652183100.0,
 'text': 'and pretty easy too!',
 'title': 'PetDB is fun'}
{'_id': '6d60c18a-e647-4431-b6f2-7c60cfbba4b2',
 'author': 'Mike',
 'date': 1605172440.0,
 'tags': ['multiple', 'insert'],
 'text': 'Another post!'}

Here we use the special $lt operator to do a range query, and also call sort() to sort the results by author.
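
In plain Python terms, the query above keeps the documents whose date is less than d and orders what remains by author. The sketch below reproduces that result on ordinary dictionaries. It illustrates the semantics only and says nothing about how PetDB evaluates $lt internally; OPERATORS and matches are hypothetical names, and only the operator that appears in this tutorial is included.

```python
import datetime
import operator

# Hypothetical operator table for this sketch.
OPERATORS = {"$lt": operator.lt}


def matches(document, query):
    """Check each field against a plain value or an {"$op": operand} condition."""
    for field, condition in query.items():
        if field not in document:
            return False
        if isinstance(condition, dict):
            # Every operator in the condition must hold for the field value.
            if not all(
                OPERATORS[op](document[field], operand)
                for op, operand in condition.items()
            ):
                return False
        elif document[field] != condition:
            return False
    return True


sample_posts = [
    {"author": "Mike", "date": 1707761881.616323},
    {"author": "Mike", "date": 1605172440.0},
    {"author": "Eliot", "date": 1652183100.0},
]
cutoff = datetime.datetime(2023, 11, 12, 12).timestamp()
older = sorted(
    (post for post in sample_posts if matches(post, {"date": {"$lt": cutoff}})),
    key=lambda post: post["author"],
)
authors = [post["author"] for post in older]
```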

Mutations chain#

You may have noticed that the last example uses filter() rather than findall(). That is because we need to chain mutations: findall() returns a plain list of documents, whereas filter() returns a PetMutable object that supports further calls such as sort().

PetCollection

Represents an original collection. It can contain only dict documents, is immutable, and supports only basic CRUD operations. Mutation methods that don’t modify the contained documents, such as filter(), return a PetMutable (see details below). Mutation methods that do modify the contained documents, such as pick(), return a PetArray.

PetMutable

Represents a mutated view of an original collection. It can also contain only dict documents, and it keeps a back link to the original collection: the insert, update, and delete methods affect the original collection, and update and delete process only the documents that remain in the mutated collection after all mutations. Any method that modifies the contained documents returns a PetArray instance with no back link to the original collection. Other methods return new PetMutable instances.

PetArray

Represents a simple independent array with no back links. It can contain anything and doesn’t affect the database in any way. All mutation methods return new PetArray instances.
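
The back-link idea can be pictured with a toy model. Everything below is hypothetical: ToyCollection and ToyMutable are illustrative stand-ins, their method signatures are assumptions, and pick() is assumed here to extract one field from each document. The sketch is meant only to show the difference between a linked view and an independent result, not to reproduce PetDB's classes.

```python
class ToyCollection:
    """Toy stand-in for an original collection of dict documents."""

    def __init__(self, documents):
        self.documents = list(documents)

    def filter(self, predicate):
        # Filtering leaves the documents themselves unchanged, so the result
        # can keep a back link to this collection.
        return ToyMutable(self, [d for d in self.documents if predicate(d)])


class ToyMutable:
    """Toy view over a subset of documents, linked back to its source."""

    def __init__(self, source, documents):
        self.source = source
        self.documents = documents

    def delete(self):
        # Remove from the source only the documents that survived the
        # earlier mutations (identity-based, since dicts are unhashable).
        doomed = {id(d) for d in self.documents}
        self.source.documents = [
            d for d in self.source.documents if id(d) not in doomed
        ]
        removed, self.documents = self.documents, []
        return removed

    def pick(self, key):
        # The results are no longer the stored documents, so a plain list
        # with no back link is returned (standing in for PetArray).
        return [d[key] for d in self.documents]


toy_posts = ToyCollection(
    [{"author": "Mike"}, {"author": "Mike"}, {"author": "Eliot"}]
)
picked = toy_posts.filter(lambda d: d["author"] == "Mike").pick("author")
toy_posts.filter(lambda d: d["author"] == "Mike").delete()
remaining = [d["author"] for d in toy_posts.documents]
```

In this model, deleting through the filtered view removes only the matching documents from the source, while the list returned by pick() can be changed freely without touching the collection.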