Saving Gzip Files in Python

I have various Python applications that run periodically and deal with data in lists and dictionaries, and they save the processed and/or discovered data as JSON in text files for logging purposes or other later use. Typically the files are quite large and take up a substantial amount of disk space.

Since the applications run on Linux servers, the “standard” way of dealing with periodically created output files is to use logrotate to compress the files and purge old files later if desired.
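For reference, this is roughly what that looks like; a minimal logrotate sketch, assuming the application writes its files under /var/log/myapp/ (a hypothetical path, adjust to taste):

# /etc/logrotate.d/myapp -- hypothetical path; adjust the glob to match
# where the application actually writes its JSON files
/var/log/myapp/*.json {
    daily
    rotate 14
    compress
    missingok
    notifempty
}

This compresses each rotated file and keeps the last 14 of them, purging anything older.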

Then I discovered the gzip module in the Python standard library. For my purposes, using gzip is just a matter of replacing open() with gzip.open(). For example, gziptest.py:

import gzip
import json

# Sample data: one million key-value pairs
data = {x: x*2 for x in range(1000000)}

# gzip.open() in text mode ("wt") compresses the JSON output on the fly
with gzip.open("output.json.gz", "wt") as fp:
    json.dump(data, fp, indent=4)
    fp.write("\n")

The results:

$ python3 gziptest.py
$ ls -l output.json.gz
-rw-r--r-- 1 markku markku 4734425 Mar 27 13:20 output.json.gz
$ file output.json.gz
output.json.gz: gzip compressed data, was "output.json", last modified: Sun Mar 27 10:20:13 2022, max compression, original size 22333338
$ zcat output.json.gz | tail -5
    "999996": 1999992,
    "999997": 1999994,
    "999998": 1999996,
    "999999": 1999998
}
$

That takes care of compressing the application output files right from creation (in this case from about 22 MB down to 4.7 MB). logrotate can still be used for purging the old files if needed.
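Reading the compressed files back works the same way; a minimal sketch, assuming the output.json.gz produced above:

import gzip
import json

# Text mode ("rt") decompresses transparently, so json.load()
# sees the same stream it would get from a plain text file
with gzip.open("output.json.gz", "rt") as fp:
    data = json.load(fp)

print(len(data))

Note that gzip.open() writes with the maximum compression level (compresslevel=9) by default, which is also what the file command reported above; passing a lower compresslevel trades some compression ratio for speed.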
