Compressed Data Between PowerShell and Python

I have built an application that transfers JSON data between remote PowerShell and Python components using Amazon SQS (Simple Queue Service). The data is usually quite small, fitting nicely within the 256 KiB message size limit of SQS and letting me avoid complex multi-message handling or temporary S3 objects.

In some cases, however, the data size exceeds the SQS message size limit. I still want to keep the protocol as simple as possible. Since the data is JSON, it should compress very well with the widely available zlib library.

Let’s see how the compression is done in PowerShell. The best option I could find is DeflateStream, which should be compatible with zlib:

$data = '{"status": "ok", "message": "lots of information here yessssssssssssssssssssssssss"}'
$bytes = [System.Text.Encoding]::UTF8.GetBytes($data)
$memoryStream = New-Object System.IO.MemoryStream
$deflateStream = New-Object System.IO.Compression.DeflateStream($memoryStream, [System.IO.Compression.CompressionMode]::Compress)
$deflateStream.Write($bytes, 0, $bytes.Length)
$deflateStream.Close()  # Close() flushes the remaining compressed bytes into the stream
$compressedBytes = $memoryStream.ToArray()
$compressedData = [Convert]::ToBase64String($compressedBytes)
Write-Host $compressedData

The last line before the output converts the compressed binary data into a Base64-encoded string, suitable for use in the SQS message body (SQS messages are text, so raw binary is not an option). Keep in mind that Base64 adds roughly 33% to the compressed size, and the result still has to fit under the 256 KiB limit.

Example result for the given input in the $data variable:

q1YqLkksKS1WslJQys9W0lFQyk0tLk5MTwUJ5OSXFCvkpylk5qXlF+UmlmTm5ylkpBalKlQC1eACSrUA

(Let me say right away that compressing a short string and then adding Base64 encoding overhead is not exactly the best showcase for compression algorithms. But believe me, in the actual application compressing over 256 KiB of real JSON data works very well, yielding roughly a 90% reduction in data size.)
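If you want to sanity-check that kind of ratio, a quick Python sketch like the following will do. Note that the payload here is a made-up, repetitive JSON document, not the real application data:

import json
import zlib

# Hypothetical payload: repetitive JSON, similar in spirit to the real data.
payload = json.dumps(
    [{"status": "ok", "message": "lots of information here"}] * 5000
).encode("utf-8")

compressed = zlib.compress(payload)
ratio = 100 * (1 - len(compressed) / len(payload))
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.1f}% reduction)")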

On the receiving end there is a Python application that needs to decompress the data. Let’s try it:

import base64
import zlib

data = "q1YqLkksKS1WslJQys9W0lFQyk0tLk5MTwUJ5OSXFCvkpylk5qXlF+UmlmTm5ylkpBalKlQC1eACSrUA"
decoded_data = base64.b64decode(data)
uncompressed_data = zlib.decompress(decoded_data).decode("utf-8")
print(uncompressed_data)

But the result is:

zlib.error: Error -3 while decompressing data: incorrect header check

Clearly there is something wrong with how the data is interpreted.

The zlib.decompress() documentation says: “The wbits parameter controls the size of the history buffer (or “window size”), and what header and trailer format is expected.” So apparently the PowerShell-generated DEFLATE stream does not have the header that zlib expects by default. This makes sense: .NET’s DeflateStream produces a raw DEFLATE stream (RFC 1951), while zlib by default wraps the same DEFLATE data in a two-byte header and an Adler-32 checksum trailer (RFC 1950).
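This is easy to confirm by peeking at the first bytes of the decoded data; a zlib-wrapped stream starts with a 0x78 byte (typically 78 9c), which is missing here:

import base64

data = "q1YqLkksKS1WslJQys9W0lFQyk0tLk5MTwUJ5OSXFCvkpylk5qXlF+UmlmTm5ylkpBalKlQC1eACSrUA"
# Prints 'ab56' for this data; a zlib header would start with 0x78.
print(base64.b64decode(data)[:2].hex())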

The default for wbits is MAX_WBITS (15), which expects a valid zlib header and trailer in the data. Let’s try with -15 instead (“raw stream with no header or trailer”):

import base64
import zlib

data = "q1YqLkksKS1WslJQys9W0lFQyk0tLk5MTwUJ5OSXFCvkpylk5qXlF+UmlmTm5ylkpBalKlQC1eACSrUA"
decoded_data = base64.b64decode(data)
uncompressed_data = zlib.decompress(decoded_data, wbits=-15).decode("utf-8")
print(uncompressed_data)

Now the output is:

{"status": "ok", "message": "lots of information here yessssssssssssssssssssssssss"}

Great success!
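The same trick works in the other direction, too. If the Python side needs to produce data that PowerShell’s DeflateStream can decompress, zlib.compressobj() with wbits=-15 emits a raw DEFLATE stream without the zlib header and trailer. A minimal sketch with a made-up payload:

import base64
import zlib

data = '{"status": "ok", "message": "reply from Python"}'
# wbits=-15 produces a raw DEFLATE stream, matching what DeflateStream reads.
compressor = zlib.compressobj(wbits=-15)
compressed = compressor.compress(data.encode("utf-8")) + compressor.flush()
print(base64.b64encode(compressed).decode("ascii"))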

This allows me to scale the application to all the practical use cases without introducing a new level of complexity in the protocol.
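For completeness, here is roughly what the receiving side could look like when wired to SQS. The queue URL and the boto3 plumbing are my assumptions for illustration, not details from the actual application:

import base64
import zlib

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue"  # hypothetical

response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for message in response.get("Messages", []):
    body = zlib.decompress(base64.b64decode(message["Body"]), wbits=-15).decode("utf-8")
    print(body)
    # Delete the message only after successful processing.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])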
