diff: memory exhausted
Yesterday, I tried to diff two files of ~800 MB each on a machine with 1 GB of RAM. Unfortunately, it didn’t work:
# diff -u full/2007-09-25-full.sql incr/incr.sql
diff: memory exhausted
After some googling and looking through various blogs and articles, it appeared that GNU diff simply works this way: it wants to load both files into memory. So, perhaps adding some more swap space would help?
# dd if=/dev/zero of=/swapfile bs=1024 count=5242800
# mkswap /swapfile
# swapon /swapfile
Indeed, it helped: I was able to diff these two ~800 MB files. However, as soon as I tried to diff ~1 GB files, diff again exited with a “memory exhausted” message.
I tried a number of other tools, but none of them gave the output I wanted (similar to that of diff -u).
As I needed the diff for backups only, in the end I used the xdelta tool:
# xdelta delta -9 full/2007-09-25-full.sql incr/incr.sql 2007-09-25-sql.delta
It produces a binary delta file rather than a readable diff, but it does so using much less memory than diff.
Ooh, and if your files are so big that xdelta is failing for you…
# xdelta delta -9 full/2008-06-01-full.sql incr/incr.sql 2008-06-02-incr.delta.temp
xdelta: mmap failed: Cannot allocate memory
…you may consider using xdelta3. The deltas it produces are a bit bigger, but it works much better (read: does not fail) with large files. Syntax:
# xdelta3 -9 -f -e -s full/2008-06-01-full.sql incr/incr.sql 2008-06-02-incr.delta.temp
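Since the delta is only useful for backups if it can be turned back into the newer file, here is a minimal sketch of the decode step. The helper name restore_from_delta and the restored file path are assumptions made for illustration; only the xdelta3 flags (-d to decode, -s for the source file, -f to overwrite) come from xdelta3 itself.

```shell
#!/bin/sh
# Sketch: rebuild the newer dump from the full dump plus an xdelta3 delta.
# restore_from_delta is a hypothetical helper name, not part of xdelta3.
restore_from_delta() {
    source_file=$1   # the file the delta was made against (the full dump)
    delta_file=$2    # the delta written by "xdelta3 -e -s ..."
    target_file=$3   # where to write the reconstructed file

    # -d decodes, -s names the source file, -f overwrites an existing target
    xdelta3 -d -f -s "$source_file" "$delta_file" "$target_file"
}

# Example with the file names used above (the restored path is an assumption):
# restore_from_delta full/2008-06-01-full.sql 2008-06-02-incr.delta.temp /tmp/incr-restored.sql
# cmp incr/incr.sql /tmp/incr-restored.sql && echo "delta verified"
```

Running cmp against the original right after creating the delta is a cheap way to confirm the backup can actually be restored before the newer file is deleted.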
Still, it would be great to have a tool which could produce diff -u -style output for large files.
SebDE:
--speed-large-files?
28 October 2007, 10:11 pm
admin:
No, --speed-large-files doesn’t help for big files (still “memory exhausted”).
30 October 2007, 9:23 pm
Jonas:
Try bdiff: http://www.computerhope.com/unix/ubdiff.htm
23 February 2008, 11:31 pm
Ian Cottam:
Do you still need help with this?
If not, what did you decide on?
I too was surprised that --speed-large-files runs out of memory.
In the old days, it just used to run a program called diffh.
That seems to have gone, so I have written my own, which you are welcome to try.
-Ian
15 May 2008, 3:30 pm
admin:
Yeah, it would be better to have a human-readable diff, and not a binary delta.
Do you have a link?
15 May 2008, 3:36 pm
Ian Cottam:
I could just email it to you.
-Ian
15 May 2008, 6:02 pm
Ian Cottam:
…but I can’t see your email address anywhere.
If you can see mine, drop me a note and I’ll reply with the source code (C).
-Ian
15 May 2008, 7:09 pm
ja:
You can also use ‘split’ to break the two big files into smaller pieces, then diff each pair of pieces…
22 July 2008, 8:27 pm
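The split suggestion above could be sketched roughly as follows. This is an illustration rather than a tested recipe for SQL dumps: the function name chunked_diff and the default piece size are assumptions. Pieces are cut by line count, so a line inserted or deleted early in one file shifts every later piece and makes the remaining pieces look almost entirely different. The approach fits best when changes are edits in place or appended lines. Hunk line numbers in the output are relative to each piece, not to the whole file.

```shell
#!/bin/sh
# Sketch: diff two large files piece by piece, so diff only ever has to
# hold one pair of pieces in memory. chunked_diff is a hypothetical helper.
chunked_diff() {
    old=$1
    new=$2
    lines=${3:-1000000}          # lines per piece (assumed default)
    tmp=$(mktemp -d) || return 2

    # Cut both files into pieces named old.aa, old.ab, ... and new.aa, ...
    split -l "$lines" "$old" "$tmp/old."
    split -l "$lines" "$new" "$tmp/new."

    # Compare each piece of the old file with its counterpart.
    for piece in "$tmp"/old.*; do
        [ -e "$piece" ] || continue          # old file was empty
        suffix=${piece##*.}
        other="$tmp/new.$suffix"
        [ -e "$other" ] || other=/dev/null   # new file has fewer pieces
        # diff exits 1 when the pieces differ; that is expected here
        diff -u "$piece" "$other" || true
    done

    # Pieces that exist only in the new file are pure additions.
    for piece in "$tmp"/new.*; do
        [ -e "$piece" ] || continue
        suffix=${piece##*.}
        [ -e "$tmp/old.$suffix" ] || diff -u /dev/null "$piece" || true
    done

    rm -rf "$tmp"
}

# Example with the file names from the post:
# chunked_diff full/2007-09-25-full.sql incr/incr.sql 500000 > sql.chunked.diff
```

The file headers in the output name the temporary pieces rather than the original files, so the result is meant for reading, not for feeding to patch.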