diff: memory exhausted

Yesterday, I tried to diff two files of about 800 MB each on a machine with 1 GB of RAM. Unfortunately, it didn’t work:

# diff -u full/2007-09-25-full.sql incr/incr.sql
diff: memory exhausted

After some googling and reading through various blogs and articles, it appeared that GNU diff simply works that way: it wants to load both files into memory. So, perhaps adding some swap space would help?

# dd if=/dev/zero of=/swapfile bs=1024 count=5242800
# mkswap /swapfile
# swapon /swapfile

Indeed, it helped: I was able to diff these two ~800 MB files. However, as soon as I tried to diff ~1 GB files, diff again exited with a “memory exhausted” message.

I tried a number of other tools, but none of them gave the output I wanted (similar to that of diff -u).

As I needed the diff only for backups, in the end I used the xdelta tool:

# xdelta delta -9 full/2007-09-25-full.sql incr/incr.sql 2007-09-25-sql.delta

It produces a binary delta file, but does so using much less memory than diff.

Ooh, and if your files are so big that xdelta is failing for you…

# xdelta delta -9 full/2008-06-01-full.sql incr/incr.sql 2008-06-02-incr.delta.temp
xdelta: mmap failed: Cannot allocate memory

…you may consider using xdelta3. The deltas it produces are a bit bigger, but it works much better (read: does not fail) with large files. Syntax:

# xdelta3 -9 -f -e -s full/2008-06-01-full.sql incr/incr.sql 2008-06-02-incr.delta.temp
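Since the delta is kept as a backup, it is worth knowing how to get the incremental file back out of it. A minimal sketch, assuming xdelta3 is installed: the function name restore_from_delta is made up for illustration, and it only wraps xdelta3's decode mode (-d) with the same full file passed as the source (-s).

```shell
# restore_from_delta FULL DELTA OUTPUT
# Hypothetical helper: rebuilds the newer file from the full dump plus
# a delta created with "xdelta3 -e -s FULL NEWER DELTA".
restore_from_delta() {
    command -v xdelta3 >/dev/null 2>&1 || {
        echo "xdelta3 not found" >&2
        return 127
    }
    # -d: decode, -f: overwrite OUTPUT if it exists, -s: source file
    xdelta3 -d -f -s "$1" "$2" "$3"
}
```

For the example above, that would be: restore_from_delta full/2008-06-01-full.sql 2008-06-02-incr.delta.temp incr-restored.sql (the output file name is arbitrary).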

Still, it would be great to have a tool which could produce diff -u -style output for large files.

8 Comments

  1. SebDE:

    --speed-large-files?

  2. admin:

    No, --speed-large-files doesn’t help for big files (still “memory exhausted”).

  3. Jonas:

    Try bdiff: http://www.computerhope.com/unix/ubdiff.htm

  4. Ian Cottam:

    Do you still need help with this?
    If not, what did you decide on?

    I too was surprised that --speed-large-files runs out of memory.
    In the old days, it just used to run a program called diffh.
    That seems to have gone, so I have written my own, which you are welcome to try.
    -Ian

  5. admin:

    Yeah, it would be better to have a human-readable diff, and not a binary delta.

    Do you have a link?

  6. Ian Cottam:

    I could just email it to you.
    -Ian

  7. Ian Cottam:

    …but I can’t see your email address anywhere.
    If you can see mine, drop me a note and I’ll reply with the source code (C).
    -Ian

  8. ja:

    You can also use ‘split’ to break the two big files into smaller pieces, then diff each of the smaller pieces…
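
The split suggestion could look roughly like the sketch below. It is only an illustration: the function name chunked_diff and the default of 1000000 lines per piece are assumptions, not something from the post. Both files are cut into pieces with the same number of lines, and diff -u runs on each pair of pieces, so diff never has to hold more than two pieces in memory. One known limitation: pieces are matched by position, so a line inserted or deleted early in the file shifts everything after it, and all later pieces will show up as different.

```shell
# chunked_diff OLD NEW [LINES_PER_PIECE]
# Hypothetical helper: prints diff -u output for each pair of pieces.
# Returns 0 if all pieces match, 1 otherwise.
chunked_diff() {
    old=$1
    new=$2
    lines=${3:-1000000}
    tmp=$(mktemp -d) || return 2
    # split names the pieces old.aa, old.ab, ... and new.aa, new.ab, ...
    split -l "$lines" "$old" "$tmp/old."
    split -l "$lines" "$new" "$tmp/new."
    status=0
    for piece in "$tmp"/old.*; do
        [ -e "$piece" ] || continue      # OLD was empty, no pieces
        suffix=${piece##*.}
        other="$tmp/new.$suffix"
        [ -e "$other" ] || other=/dev/null
        diff -u "$piece" "$other" || status=1
    done
    # Pieces that exist only on the NEW side (NEW is longer than OLD).
    for piece in "$tmp"/new.*; do
        [ -e "$piece" ] || continue
        suffix=${piece##*.}
        if [ ! -e "$tmp/old.$suffix" ]; then
            diff -u /dev/null "$piece"
            status=1
        fi
    done
    rm -rf "$tmp"
    return "$status"
}
```

Usage would be something like: chunked_diff full/2007-09-25-full.sql incr/incr.sql 500000 > pieces.diff. Note that the file names and line numbers in the output refer to the temporary pieces, not to the original files, so the result is meant for reading rather than for feeding to patch.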
