Generator File Streaming

Memory

Read large files one row at a time using PHP generators for constant memory usage.

Use this when:

You need to process a large CSV or data file without running out of memory.

Kunwar "AKA" AJ Sharing what I have learned
Jan 4, 2026 2 min Data Engineering

The Pattern

streamCsvFile.php
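function streamCsvFile(string $filepath): Generator
{
    // Note: none of this body runs until the caller starts iterating
    $handle = fopen($filepath, 'r');

    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $filepath");
    }

    try {
        // Read header row (delimiter, enclosure, escape passed explicitly: PHP's defaults)
        $headers = fgetcsv($handle, null, ',', '"', '\\');
        if ($headers === false) {
            return; // empty file: nothing to yield
        }

        // Yield each data row as an associative array keyed by header
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            // fgetcsv() returns [null] for a blank line; skip it
            if ($row === [null]) {
                continue;
            }
            // array_combine() throws a ValueError (PHP 8) if column counts differ
            yield array_combine($headers, $row);
        }
    } finally {
        // Runs when the loop completes, or when the generator is destroyed early
        fclose($handle);
    }
}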
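The try/finally in the pattern also covers early exits. If the consumer breaks out of the loop, the generator stays suspended inside try. When that generator object is destroyed, PHP runs the pending finally block, so the file handle still gets closed. Below is a minimal sketch to check this. It uses a hypothetical countingGenerator() that writes to a log where streamCsvFile() would call fclose():

early-exit.php
```php
<?php

// Hypothetical stand-in for streamCsvFile(): yields numbers and records
// when its finally block runs, instead of opening a real file.
function countingGenerator(array &$log): Generator
{
    try {
        for ($i = 1; $i <= 1000; $i++) {
            yield $i;
        }
    } finally {
        // In streamCsvFile() this is where fclose($handle) happens
        $log[] = 'finally ran';
    }
}

$log = [];
$gen = countingGenerator($log);

foreach ($gen as $n) {
    $log[] = "got $n";
    if ($n === 2) {
        break; // stop early; the generator is now suspended inside try
    }
}

unset($gen); // destroying the suspended generator runs its finally block

print_r($log);
```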

What Happens Under the Hood

Calling this function does not read the file at all: a generator's body only starts running when you begin iterating over it. Let us trace what actually happens:

execution-flow.php
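$stream = streamCsvFile('million-rows.csv');
// At this point: NOTHING has run yet. The file is not opened and no
// headers are read; PHP has only created a Generator object.
// (So a missing file throws on the first iteration, not on this line.)

foreach ($stream as $row) {
    // First iteration:
    //   The function body starts: fopen() runs, the header row is read
    //   fgetcsv() parses ONE data row (the stream reads from disk in small buffered chunks)
    //   array_combine() creates ONE associative array
    //   yield hands it to your loop and pauses the generator
    //   Memory: about one row, not the whole file

    processRow($row);

    // Next iteration:
    //   The generator resumes right after yield and reads the next row
    //   $row is overwritten, so the previous array is freed
    //   (unless processRow() stored it somewhere)
}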
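You can confirm the lazy start with a small experiment. The sketch below uses a hypothetical tracedRows() generator that logs each step. Nothing inside its body runs until the foreach asks for the first value, and each yield pauses the body until the loop asks again:

lazy-start.php
```php
<?php

// Hypothetical generator that records each step it takes in $log.
function tracedRows(array &$log): Generator
{
    $log[] = 'body started';     // would be fopen() in streamCsvFile()
    foreach (['row 1', 'row 2'] as $row) {
        $log[] = "yielding $row";
        yield $row;              // pause here until the caller asks again
        $log[] = "resumed after $row";
    }
    $log[] = 'body finished';
}

$log = [];
$stream = tracedRows($log);
$log[] = 'generator created';    // the body has not run yet

foreach ($stream as $row) {
    $log[] = "loop got $row";
}

echo implode(PHP_EOL, $log), PHP_EOL;
```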

Why This Matters

Compare memory usage for a CSV file with 1 million rows:

memory-comparison.txt
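Traditional approach (load all):
  $data = file('million-rows.csv');
  Memory: more than the file's own size (every line becomes a separate string in one array, and nothing is parsed into fields yet)

Generator approach:
  foreach (streamCsvFile('million-rows.csv') as $row)
  Memory: about one row plus a small read buffer, independent of file size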

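The generator approach keeps memory roughly constant regardless of file size. A 10GB file needs about the same memory as a 10KB file, provided the rows are of similar length and your loop does not collect them (for example, by appending each $row to an array).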
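To check this on your own machine, the sketch below writes a throwaway CSV to a temp file. The 20,000 rows are an arbitrary demo size. It then uses memory_get_usage() to compare how much memory each approach holds. Exact numbers vary by PHP version and build, so compare them relative to each other rather than reading them as absolute values. streamRows() is a trimmed, hypothetically named copy of the pattern above:

memory-check.php
```php
<?php

// Trimmed copy of the streaming pattern (hypothetical name streamRows)
function streamRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $path");
    }
    try {
        // delimiter, enclosure, escape passed explicitly (PHP's defaults)
        $headers = fgetcsv($handle, null, ',', '"', '\\');
        if ($headers === false) {
            return;
        }
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            if ($row !== [null]) { // [null] means a blank line
                yield array_combine($headers, $row);
            }
        }
    } finally {
        fclose($handle);
    }
}

// Build a throwaway CSV file (20,000 rows is an arbitrary demo size)
$path = tempnam(sys_get_temp_dir(), 'csv');
$out = fopen($path, 'w');
fwrite($out, "id,name,email\n");
for ($i = 1; $i <= 20000; $i++) {
    fwrite($out, "$i,user$i,user$i@example.com\n");
}
fclose($out);

// Streaming: track the highest memory usage seen while iterating
$before = memory_get_usage();
$maxDuringStream = $before;
$streamedCount = 0;
foreach (streamRows($path) as $row) {
    $streamedCount++;
    $maxDuringStream = max($maxDuringStream, memory_get_usage());
}
$streamBytes = $maxDuringStream - $before;

// Load-all: keep every line in one array, as file() does
$before = memory_get_usage();
$lines = file($path);
$loadAllBytes = memory_get_usage() - $before;
$loadedCount = count($lines) - 1; // minus the header line

unset($lines);
unlink($path);

printf("streaming: ~%d KB, load-all: ~%d KB\n", intdiv($streamBytes, 1024), intdiv($loadAllBytes, 1024));
```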

Usage Example

usage.php
// Process a large CSV file without running out of memory
$stream = streamCsvFile('/data/export.csv');

$processed = 0;
foreach ($stream as $row) {
    // Each $row is an associative array:
    // ['name' => 'John', 'email' => 'john@example.com', ...]

    // validateRow() and saveToDatabase() stand in for your own application logic
    if (validateRow($row)) {
        saveToDatabase($row);
        $processed++;
    }
}

echo "Processed $processed records";
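For trying the pattern end to end, here is a self-contained version of the usage example above. validateRow and saveToDatabase are hypothetical stubs: an email check, and an in-memory array standing in for a database. The sample CSV is made up for the demo and includes one invalid row and one blank line:

usage-runnable.php
```php
<?php

// Copy of the streamCsvFile() pattern, skipping blank lines and empty files
function streamCsvFile(string $filepath): Generator
{
    $handle = fopen($filepath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open file: $filepath");
    }
    try {
        // delimiter, enclosure, escape passed explicitly (PHP's defaults)
        $headers = fgetcsv($handle, null, ',', '"', '\\');
        if ($headers === false) {
            return;
        }
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            if ($row !== [null]) { // [null] means a blank line
                yield array_combine($headers, $row);
            }
        }
    } finally {
        fclose($handle);
    }
}

// Hypothetical stubs standing in for real validation and persistence
$saved = [];
$validateRow = fn(array $row): bool =>
    filter_var($row['email'] ?? '', FILTER_VALIDATE_EMAIL) !== false;
$saveToDatabase = function (array $row) use (&$saved): void {
    $saved[] = $row;
};

// Made-up sample data: one invalid email and one blank line
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,email\nJohn,john@example.com\nJane,not-an-email\n\nAli,ali@example.com\n");

$processed = 0;
foreach (streamCsvFile($path) as $row) {
    if ($validateRow($row)) {
        $saveToDatabase($row);
        $processed++;
    }
}
unlink($path);

echo "Processed $processed records\n";
```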

The yield keyword is the key. It pauses the function, hands one value to the caller, and resumes exactly where it left off when the caller asks for the next value. This is lazy evaluation: work happens only when a value is requested.
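You can also watch the pause and resume without a foreach. Drive the generator by hand with the Generator methods current(), next() and valid(), which is what foreach does for you. Here is a minimal sketch with a hypothetical countdown() generator:

manual-iteration.php
```php
<?php

// Hypothetical generator: each value is produced only when requested
function countdown(int $from, array &$log): Generator
{
    for ($i = $from; $i > 0; $i--) {
        $log[] = "computing $i";
        yield $i;
    }
}

$log = [];
$gen = countdown(3, $log);

$first = $gen->current();  // runs the body up to the first yield
$gen->next();              // resumes after that yield, runs to the next one
$second = $gen->current(); // already produced by next(), no extra work

$received = [$first, $second];
// Only two values were requested, so "computing 1" never ran
echo implode(', ', $log), PHP_EOL;
```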