“This sort of thing is my bag, baby!”
—Austin Powers
Combi provides 4 bag classes for you to choose from according to your needs:
The Bag class is an implementation of the mathematical concept of a multiset; meaning something like a set, except that every item could appear multiple times, and crucially, we only save the count of the item in memory instead of saving multiple copies of the same item, which would be a waste of memory.
You may know the collections.Counter class from Python’s standard library; the bag classes provided by Combi are very similar, except that they are more strictly defined as multisets, meaning that counts must be positive integers. (Zeros are allowed in input but they are weeded out.) By contrast, collections.Counter allows any value as an item’s count.
This restriction makes the bag classes more powerful than collections.Counter because it allows more methods to be defined. More arithmetical operations are defined, comparison between bags is supported, and more.
When you create a bag, it will be populated with the iterable argument. If iterable is a plain sequence, its items will be added one-by-one:
>>> Bag('abracadabra')
Bag({'c': 1, 'b': 2, 'd': 1, 'a': 5, 'r': 2})
If iterable is a mapping, its values will be taken as item counts:
>>> Bag({'meow': 2, 'woof': 5,})
Bag({'meow': 2, 'woof': 5})
Bag can be accessed similarly to a dict or Counter:
>>> my_bag = Bag('abracadabra')
>>> my_bag['b']
2
>>> 'x' in my_bag
False
>>> my_bag['x'] = 7
>>> my_bag
Bag({'r': 2, 'x': 7, 'b': 2, 'a': 5, 'd': 1, 'c': 1})
>>> 'x' in my_bag
True
>>> my_bag['y'] # If it's not in the bag, the count is zero:
0
>>> for key, count in my_bag.items():
... print((key, count))
...
('r', 2)
('x', 7)
('b', 2)
('a', 5)
('d', 1)
('c', 1)
An iterator over the elements in the bag. If an item has a count of 7 in the bag, it will repeat 7 times in bag.elements. Example:
>>> bag = Bag({'a': 1, 'b': 2, 'c': 3,})
>>> for element in bag.elements:
... print(element)
a
c
c
c
b
b
If you’re using OrderedBag or FrozenOrderedBag, the items will be yielded by their canonical order (usually insertion order), otherwise they’ll be yielded in arbitrary order, just like dicts and sets.
The number of elements in the bag, i.e. the sum of all the counts. Example:
>>> bag = Bag({'a': 1, 'b': 2, 'c': 3,})
>>> bag.n_elements
6
Clear the bag, making it empty.
Create a shallow copy of the bag.
Get a key’s count with a default fallback, just like dict.get().
Get an iterator over the keys in a bag:
>>> bag = Bag({'a': 1, 'b': 2, 'c': 3,})
>>> tuple(bag.keys())
('b', 'c', 'a')
Get an iterator over the counts in a bag:
>>> bag = Bag({'a': 1, 'b': 2, 'c': 3,})
>>> tuple(bag.values())
(2, 3, 1)
Get an iterator over the items in a bag, i.e. keys with counts:
>>> bag = Bag({'a': 1, 'b': 2, 'c': 3,})
>>> tuple(bag.items())
(('b', 2), ('c', 3), ('a', 1))
Get a tuple of the n most common elements and their counts from the most common to the least. If n is not specified, most_common() returns all elements in the bag. Elements with equal counts are ordered arbitrarily:
>>> Bag('abracadabra').most_common(3)
(('a', 5), ('r', 2), ('b', 2))
Get the count of key and remove it from the bag, optionally with a default fallback.
Remove a key from the bag, and get a tuple (key, count).
Get the count of key, optionally with a default fallback. If key is missing, its count in the bag will be set to the default.
Update the bag with a mapping of key to count.
Get all bags that are subsets of this bag.
This means all bags that have counts identical or smaller for each key.
Example:
>>> bag = Bag({'a': 1, 'b': 2, 'c': 3,})
>>> contained_bags = bag.get_contained_bags()
>>> len(contained_bags)
24
>>> contained_bags[7]
Bag({'c': 2, 'b': 1})
>>> contained_bags[7] < bag # Contained bag is contained.
True
FrozenBag is a multiset just like Bag, except it’s immutable. After it’s created, it can’t be modified. It’s a subclass over Bag.
In which cases would you want to use FrozenBag rather than Bag?
Get a mutable version of this frozen bag. Example:
>>> frozen_bag = FrozenBag('abracadabra')
>>> frozen_bag
FrozenBag({'d': 1, 'c': 1, 'b': 2, 'a': 5, 'r': 2})
>>> frozen_bag.get_mutable()
Bag({'d': 1, 'c': 1, 'b': 2, 'a': 5, 'r': 2})
OrderedBag is a multiset just like Bag, except it’s also ordered. Items have a defined order, which is by default the order in which they were inserted. OrderedBag is a subclass over Bag.
Another way to think of it is that OrderedBag is to Bag what collections.OrderedDict is to dict.
Example:
>>> ordered_bag = OrderedBag('abbcccdddd')
>>> tuple(ordered_bag.elements) # Items ordered by insertion order:
('a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd')
>>> tuple(ordered_bag.items())
(('a', 1), ('b', 2), ('c', 3), ('d', 4))
>>> ordered_bag.index('b')
1
Get a version of this ordered bag with order reversed. Example:
>>> ordered_bag = OrderedBag('abbcccdddd')
>>> ordered_bag
OrderedBag(OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
>>> ordered_bag.reversed
OrderedBag(OrderedDict([('d', 4), ('c', 3), ('b', 2), ('a', 1)]))
This does not change the existing bag, but creates a new one.
Get the index number (i.e. position) of key in the ordered bag.
Move a key to the end of the order. Specify last=False to move it to the start instead.
Sort the keys, changing the order in-place.
The optional key argument, (not to be confused with the bag keys,) is a key function. If it’s not passed in, default Python ordering will be used.
If reverse=True is used, the keys will be sorted in reverse order.
FrozenOrderedBag is a multiset just like Bag, except it’s immutable and ordered. You can think of it as a combination of FrozenBag and OrderedBag.
One of the advantages of Bag over collections.Counter is that Bag provides comparison methods between bags, that act similarly to comparisons between Python’s built-in sets. This makes it easy to see whether one bag is contained by another. Example:
>>> sandwich = Bag({'bread': 2, 'cheese': 1, 'tomato': 2, 'burger': 1,})
>>> vegan_sandwich = Bag({'bread': 2, 'tomato': 2,})
>>> vegan_sandwich < sandwich
True
>>> salad = Bag({'tomato': 2, 'bell pepper': 1,})
>>> salad < sandwich # False because there's no bell pepper in our sandwich
False
A bag is smaller-or-equal to another bag if it’s “contained” in it, meaning that every key in the contained bag also exists in the containing bag, and with a count that’s bigger or equal to its count in the contained bag.
A bag is strictly smaller than another bag if the above holds, and there’s at least one key for which the count in the containing bag is strictly bigger than its count in the contained bag.
Please note that unlike most comparisons in Python, this is a partial order rather than a total one, meaning that for some pairs of bags, neither x >= y nor y >= x holds true. This is similar to set comparison in Python.
Another advantage of Bag over collections.Counter is that Bag provides a wealth of arithmetic operations (addition, subtraction, etc.) between bags, and between bags and integers.
The basic arithmetic operations do what you expect them to, operating on the counts of items:
>>> sandwich = Bag({'bread': 2, 'cheese': 1, 'tomato': 2, 'burger': 1,})
>>> salad = Bag({'tomato': 2, 'bell pepper': 1,})
>>> single_tomato = Bag({'tomato': 1,})
>>> sandwich + single_tomato # Addition
Bag({'cheese': 1, 'bread': 2, 'tomato': 3, 'burger': 1})
>>> sandwich - single_tomato # Subtraction
Bag({'cheese': 1, 'bread': 2, 'tomato': 1, 'burger': 1})
>>> sandwich * 3 # Multiplication
Bag({'cheese': 3, 'bread': 6, 'tomato': 6, 'burger': 3})
As for division: You can divide one bag by another to get an integer saying how many times the second bag would go into the first. You can also divide a bag by an integer, which will divide all the counts by that integer, rounding down. All bag division is done with the floor-division operator //, because it’s always rounded-down. Examples:
>>> sandwich // single_tomato # Floor-division by another bag
2
>>> sandwich // 2 # Floor-division by integer
Bag({'bread': 1, 'tomato': 1})
The more advanced operations are also supported:
>>> sandwich % salad # Modulo by bag
Bag({'bread': 2, 'cheese': 1, 'burger': 1, 'tomato': 2})
>>> divmod(sandwich, salad) # Divmod by bag
(0, Bag({'tomato': 2, 'cheese': 1, 'bread': 2, 'burger': 1}))
>>> salad % 2 # Modulo by integer
Bag({'bell pepper': 1})
>>> divmod(salad, 2) # Divmod by integer
(Bag({'tomato': 1}), Bag({'bell pepper': 1}))
>>> salad ** 3 # Exponentiation
Bag({'tomato': 8, 'bell pepper': 1})
>>> pow(salad, 3, 5) # Exponentiation with modulo
Bag({'tomato': 3, 'bell pepper': 1})
And... Did someone say logical operations? Like in Python sets, these do union and intersection:
>>> sandwich | salad # Logical or, a.k.a. set union
Bag({'bread': 2, 'cheese': 1, 'burger': 1, 'tomato': 2, 'bell pepper': 1})
>>> sandwich & salad # Logical and, a.k.a. set intersection
Bag({'tomato': 2})
As a final note about arithmetic operations, augmented assignment is supported for all operations, so you can elegantly mutate bags in-place like this:
>>> sandwich += Bag({'bacon': 2, 'egg': 1,})
>>> sandwich
Bag({'cheese': 1, 'bacon': 2, 'bread': 2, 'egg': 1, 'burger': 1, 'tomato': 2})
>>> sandwich **= 2
>>> sandwich
Bag({'cheese': 1, 'bacon': 4, 'bread': 4, 'egg': 1, 'burger': 1, 'tomato': 4})