I have various Python applications that run periodically and deal with data in lists and dictionaries, and they save the processed and/or discovered data as JSON in text files for logging purposes or other later use. Typically the files are quite large and take up a substantial amount of disk space.
Since the applications run on Linux servers, the “standard” way of dealing with periodically created output files is to use logrotate to compress the files and purge old files later if desired.
Then I discovered the gzip module in the Python standard library. For my purposes, using gzip is just a matter of replacing open() with gzip.open(). For example, gziptest.py:
import gzip
import json

data = {x: x*2 for x in range(1000000)}

with gzip.open("output.json.gz", "wt") as fp:
    json.dump(data, fp, indent=4)
    fp.write("\n")
The results:
$ python3 gziptest.py
$ ls -l output.json.gz
-rw-r--r-- 1 markku markku 4734425 Mar 27 13:20 output.json.gz
$ file output.json.gz
output.json.gz: gzip compressed data, was "output.json", last modified: Sun Mar 27 10:20:13 2022, max compression, original size 22333338
$ zcat output.json.gz | tail -5
    "999996": 1999992,
    "999997": 1999994,
    "999998": 1999996,
    "999999": 1999998
}
$
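Reading the data back later works the same way: open the file with gzip.open() in text mode and hand it to json.load(). A minimal sketch, assuming the output.json.gz file written by gziptest.py above:

import gzip
import json

# Open the compressed file in text mode and parse the JSON directly.
with gzip.open("output.json.gz", "rt") as fp:
    data = json.load(fp)

# Note that JSON object keys come back as strings after the round-trip.
print(len(data))
print(data["999999"])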
That takes care of having the application output files compressed right from creation. logrotate can still be used for purging the old files if needed.
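If write speed ever matters more than squeezing out the last bytes, gzip.open() also accepts a compresslevel argument (1 to 9; the default is 9, maximum compression, which matches the “max compression” reported by file above). A small sketch along the same lines, writing to a hypothetical output-fast.json.gz:

import gzip
import json

data = {x: x*2 for x in range(1000000)}

# compresslevel=1 trades some compression ratio for faster writes.
with gzip.open("output-fast.json.gz", "wt", compresslevel=1) as fp:
    json.dump(data, fp, indent=4)
    fp.write("\n")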