Working with Process Pipe and Its 64KiB Limit
Helge Heß pointed out that naive usage of Pipe in child Processes can break your program if you pipe too much data.
I wasn’t aware of this, followed his references, and here are my findings.
Pipe Buffer Size
Older Mac OS X versions had a pipe buffer size of 16KiB by default, offering 64KiB on demand. In my N=1 test on an M1 with macOS 14, I always get 64KiB buffers, even if I only send 1 Byte. Run a pipe buffer size discovery test yourself to check your own system.
So the upper limit is 64KiB (65536 Bytes) for all intents and purposes.
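To check the number on your own machine, here is a minimal sketch of such a discovery test in Swift. It assumes a POSIX system (macOS or Linux), and it assumes that filling a pipe nobody reads from with 1-byte non-blocking writes until write() stops succeeding is a fair measure of its capacity. measurePipeCapacity() is a hypothetical name. The result depends on OS and version, so compare the printed number with the 64KiB mentioned above instead of expecting a fixed value.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical helper: fills a fresh pipe one byte at a time without ever
/// reading from it and counts how many bytes fit before the kernel reports
/// the pipe as full.
func measurePipeCapacity() -> Int {
    let pipe = Pipe()
    // Keep `pipe` (and with it both file handles) alive while we use the raw descriptor.
    return withExtendedLifetime(pipe) {
        let fd = pipe.fileHandleForWriting.fileDescriptor
        // Non-blocking mode: a full buffer makes write() fail with EAGAIN
        // instead of blocking forever.
        let flags = fcntl(fd, F_GETFL)
        _ = fcntl(fd, F_SETFL, flags | O_NONBLOCK)

        var byte: UInt8 = 0x41
        var count = 0
        while write(fd, &byte, 1) == 1 {
            count += 1
        }
        return count
    }
}

let capacity = measurePipeCapacity()
print("Pipe buffer capacity: \(capacity) bytes")
```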
Want to send even 1 Byte extra?
- If the data is larger than the pipe buffer, you need to drain the corresponding FileHandle with repeated read calls. (Or, respectively, provide data larger than 64KiB with repeated write calls.)
- If you try to send or receive all the data in one go, your program will, from a user’s perspective, freeze, and the blocking call will never return. A CLI app will never terminate.
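Draining with repeated read calls could look like the following minimal sketch. The helper name drain(_:chunkSize:) and the 4,096-byte chunk size are assumptions for illustration. What matters is that the reader keeps reading while the writer (here a background queue standing in for another process) is still writing, so the 64KiB buffer never stays full.

```swift
import Foundation

/// Hypothetical helper: reads from `handle` in fixed-size chunks until end of file.
/// The default chunk size is an arbitrary choice for this sketch.
func drain(_ handle: FileHandle, chunkSize: Int = 4_096) throws -> Data {
    var result = Data()
    // read(upToCount:) returns nil (or empty data) once the write end is closed.
    while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
        result.append(chunk)
    }
    return result
}

// Demo: a writer on another queue pushes 200,000 bytes, more than a 64KiB
// pipe buffer can hold, while this thread keeps draining the read end.
let pipe = Pipe()
let payload = Data(repeating: 0x61, count: 200_000)
DispatchQueue.global().async {
    try? pipe.fileHandleForWriting.write(contentsOf: payload)
    try? pipe.fileHandleForWriting.close()  // signals end of file to the reader
}
let received = try drain(pipe.fileHandleForReading)
print(received.count)  // 200000
```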
Naive reading from the FileHandle means using the deprecated FileHandle.readDataToEndOfFile(), which you will see in many examples online, or the somewhat newer readToEnd() API.
Instead, you’re supposed to use readabilityHandler with FileHandle.availableData for reading, and writeabilityHandler with FileHandle.write(_:) for writing, to stream data from/to a pipe.
(That implies that Paul Hudson’s “How to run an external program using Process” needs to be changed to be safe.)
Sending Data via STDIN to Another Process
To be clear, this is not about reading your process’s standard input.
From the perspective of the child Process you’re about to spawn, the data you send is its standard input pipe. The following helper will allow you to write something like:
childProcess.standardInput = try Pipe.stdin(string: ...)!
Here’s an example helper to create a Pipe that you can use as standard input to another process to send an arbitrarily long string:
extension Pipe {
    static func stdin(string: String) throws -> Pipe? {
        guard let data = string.data(using: .utf8) else { return nil }
        let stdin = Pipe()
        stdin.fileHandleForWriting.writeabilityHandler = { handle in
            handle.write(data)
            try! handle.close() // Without closing, it'll never finish, but
                                // what to do with the error except crash
                                // is not clear to me :)
            handle.writeabilityHandler = nil
        }
        return stdin
    }
}
As you can see, I don’t need to compute 64KiB chunks to make this work; I just need to use the writeabilityHandler.
The similar-looking naive example of writing the data directly won’t work for strings larger than the 64KiB limit:
extension Pipe {
    static func broken_stdin(string: String) throws -> Pipe? {
        guard let data = string.data(using: .utf8) else { return nil }
        let stdin = Pipe()
        // Blocks forever once data exceeds the pipe buffer: nothing reads the other end yet.
        try stdin.fileHandleForWriting.write(contentsOf: data)
        try stdin.fileHandleForWriting.close()
        return stdin
    }
}
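For comparison, here is a sketch that feeds more than 64KiB through a writeabilityHandler-based helper into /usr/bin/wc -c, assuming wc lives at that path (as on macOS and most Linux systems). The helper is adapted from the working one above: it clears the handler first so it runs only once, and it uses try? instead of try!. Both tweaks are choices for this sketch, not part of the original helper.

```swift
import Foundation

extension Pipe {
    // Adapted from the stdin(string:) helper above: the handler is cleared
    // first so it runs only once, and `try?` replaces `try!` for this sketch.
    static func stdin(string: String) throws -> Pipe? {
        guard let data = string.data(using: .utf8) else { return nil }
        let stdin = Pipe()
        stdin.fileHandleForWriting.writeabilityHandler = { handle in
            handle.writeabilityHandler = nil
            handle.write(data)   // may block until the child reads; runs on a background queue
            try? handle.close()  // closing signals end of file to the child
        }
        return stdin
    }
}

// Assumption: /usr/bin/wc exists.
let process = Process()
process.executableURL = URL(fileURLWithPath: "/usr/bin/wc")
process.arguments = ["-c"]
// 100,000 bytes: more than a 64KiB pipe buffer can hold at once.
process.standardInput = try Pipe.stdin(string: String(repeating: "a", count: 100_000))!
let stdoutPipe = Pipe()
process.standardOutput = stdoutPipe

try process.run()
// wc prints only a handful of bytes, far below 64KiB, so one readToEnd() is fine here.
let output = try stdoutPipe.fileHandleForReading.readToEnd() ?? Data()
process.waitUntilExit()
let byteCount = String(decoding: output, as: UTF8.self)
    .trimmingCharacters(in: .whitespacesAndNewlines)
print(byteCount)  // 100000
```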
Reading Data from a Child Process’s STDOUT (or Your STDIN)
Helge shares a ProcessHelper implementation that shows how to use a readabilityHandler and collect the data.
A simplified example is:
import Foundation

let stdoutPipe = Pipe()
var outputData = Data()
let outputDataQueue = DispatchQueue(label: "outputDataQueue")
stdoutPipe.fileHandleForReading.readabilityHandler = { handle in
    let data = handle.availableData
    // The handler runs on a background queue; funnel appends through our own serial queue
    outputDataQueue.async { outputData.append(data) }
}
// run the child process, wait for it to finish, then use outputData
stdoutPipe.fileHandleForReading.readabilityHandler = nil
stdoutPipe.fileHandleForReading.closeFile() // Helge runs this on outputDataQueue, but I'm not certain it's necessary
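The simplified example leaves out the actual process run. The following sketch fills that in under a few assumptions: /usr/bin/head exists, runCollectingOutput and OutputBuffer are hypothetical names, and end of file is detected by availableData returning empty data. Waiting for that EOF signal in addition to waitUntilExit() is one way to avoid dropping the last chunks if the handler lags behind the child’s exit; it is not necessarily how Helge’s ProcessHelper does it.

```swift
import Foundation

/// Hypothetical reference type so the handler and the queue share one buffer.
final class OutputBuffer: @unchecked Sendable {
    var data = Data()  // only touched on `outputDataQueue`
}

/// Runs an executable and collects its complete stdout through a readabilityHandler.
/// Sketch only: stderr, exit status and most error handling are ignored.
func runCollectingOutput(_ executablePath: String, _ arguments: [String]) throws -> Data {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executablePath)
    process.arguments = arguments
    let stdoutPipe = Pipe()
    process.standardOutput = stdoutPipe

    let buffer = OutputBuffer()
    let outputDataQueue = DispatchQueue(label: "outputDataQueue")
    let reachedEOF = DispatchSemaphore(value: 0)

    stdoutPipe.fileHandleForReading.readabilityHandler = { handle in
        let data = handle.availableData
        if data.isEmpty {
            // Empty data means end of file: the child closed its stdout.
            handle.readabilityHandler = nil
            reachedEOF.signal()
        } else {
            outputDataQueue.async { buffer.data.append(data) }
        }
    }

    try process.run()
    process.waitUntilExit()
    // The child can exit before the handler has seen the last chunks,
    // so also wait until end of file was reached.
    reachedEOF.wait()
    // On a serial queue, this sync block runs after all pending appends.
    return outputDataQueue.sync { buffer.data }
}

// 200,000 bytes is well above the 64KiB pipe buffer.
let output = try runCollectingOutput("/usr/bin/head", ["-c", "200000", "/dev/zero"])
print(output.count)  // 200000
```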
Appending data on another queue is hinted at in the docs:
Assigning a valid Block object to this property creates a dispatch source for reading the contents of the file or socket. Your block is submitted to the file handle’s dispatch queue when there is data to read.
So the block may not run on your calling thread or on the main queue at all. Assume it doesn’t, and put your operations on a queue under your control.
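One way to keep the appends on a queue you control is to wrap the buffer in a small type. DataCollector is a hypothetical name, and a private serial queue is just one design option; a lock would work as well.

```swift
import Foundation

/// Hypothetical helper: every access to `data` goes through one private
/// serial queue, no matter which thread calls `append(_:)`.
final class DataCollector: @unchecked Sendable {
    private var data = Data()
    private let queue = DispatchQueue(label: "DataCollector")

    /// Safe to call from any thread, e.g. from a readabilityHandler.
    func append(_ chunk: Data) {
        queue.async { self.data.append(chunk) }
    }

    /// Runs after every append that was already enqueued, then returns a copy.
    var collected: Data {
        queue.sync { data }
    }
}

// Simulate handler calls arriving on many threads at once.
let collector = DataCollector()
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    collector.append(Data([0x2A]))
}
print(collector.collected.count)  // 1000
```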
Run the Experiments
To see for yourself, I uploaded a .sh receiver and a .swift sender as a Gist.
- The sender will pipe 64KiB of string data, in 1KiB chunks, to the receiver.
- The receiver echoes whatever it gets.
- With the naive approach, once you exceed 65536 Bytes by even 1 Byte, the receiver won’t echo anything and your program won’t terminate at all.
Takeaway
Unless you know what you’ll be sending, and that it won’t exceed 64KiB, avoid APIs like write(contentsOf:) and use the block-based write/read handlers instead. If you don’t want to use the block-based handlers for some reason, add an assert or precondition to codify your expectation of the maximum data size.
Crashing there is better than the process never finishing.
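Here is a minimal sketch of that guard, assuming the 64KiB figure from above holds on your system. smallStdin(string:) and assumedPipeBufferSize are hypothetical names. precondition fits better than assert here, because assert is compiled out of optimized builds while precondition still fires in -O builds.

```swift
import Foundation

// Assumption: 64KiB, the buffer size measured in this post. Older systems may
// default to less, so treat this as an upper bound, not a guarantee.
let assumedPipeBufferSize = 65_536

extension Pipe {
    /// Hypothetical helper: writes `string` in one go, which only works while
    /// the data fits into the pipe buffer. The precondition turns a silent
    /// hang into a crash with a message.
    static func smallStdin(string: String) throws -> Pipe {
        let data = Data(string.utf8)
        precondition(data.count <= assumedPipeBufferSize,
                     "\(data.count) bytes exceed the assumed pipe buffer size")
        let stdin = Pipe()
        try stdin.fileHandleForWriting.write(contentsOf: data)
        try stdin.fileHandleForWriting.close()
        return stdin
    }
}

let pipe = try Pipe.smallStdin(string: "hello")
let echoed = try pipe.fileHandleForReading.readToEnd() ?? Data()
let text = String(decoding: echoed, as: UTF8.self)
print(text)  // hello
```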
My use case was reading a user-provided text file from disk, performing some transformations, and then piping the result to another program. The mere presence of user-provided data from the file is reason enough to use the safer methods: you can’t make meaningful assumptions about its size.
Closing With a Warning
- Sven Schmidt from the Swift Package Index mentioned that they ran into hard-to-debug problems with Foundation’s standard Process implementation, so they migrated the SPI code to use the Process type from Swift Tools Support Core (TSC). It sounds like they could be migrating to swift-testing’s take on process spawning in the future, though, especially since TSC is being deprecated; it sounds like adopting TSC’s code nowadays will put the burden of maintenance on you in the long run.
- Matt Massicotte also recalls running into weird bugs that were hard to reproduce when using a readabilityHandler and handle.availableData.