
I have a "big" file containing two Python classes, and I want to split it in two, with one file per class.

One solution would be to copy the classes into two new files and then delete the original, but this would produce a huge delta in the history and would not keep track of the previous states of the original file (since it has been deleted).

I want to split the file in two such that the delta between the two states is only a couple of lines (see the linked picture). I hope that is clear enough.

Is this possible?

NB: an intermediate approach would be to cut and paste only the second part of the file and then use `git mv` on the rest, so that we keep track of half of the file. That would still leave a "huge" delta in the history, which is what I'm trying to avoid.

Thomas
  • Git doesn't contain "deltas". What actual problem are you having or attempting to solve? – matt Jun 20 '22 at 16:08
  • What you're asking for is in fact not a difference in how git stores stuff but in how the diff is calculated. A diff is always just that: given two commits (or technically two trees), it tries to find a representation of the changes that is *meaningful to humans*. That diff does not influence how git actually stores stuff (which is as complete snapshots, sometimes compressed using delta-encoding). – Joachim Sauer Jun 20 '22 at 16:11
  • You could copy the file in one commit and then delete one class each in both files in another commit, is that what you want? I'm assuming your goal is to make it apparent in the history that neither class was changed in the process? – mkrieger1 Jun 20 '22 at 16:12
  • Maybe I have not understood how git works then =/ From what I understood: when we make a commit, git only stores the difference between the file in its current state and the file in a previous state (that's what I, possibly wrongly, called a _delta_). – Thomas Jun 20 '22 at 16:12
  • @mkrieger yes, I guess: we can see that this file is taken from the "big" one. So in the future, if I go back through the history of a file, I will eventually get back to the initial file (`file0.py` in my example). – Thomas Jun 20 '22 at 16:17
  • Does this answer your question? [Keep git history when splitting a file](https://stackoverflow.com/questions/3887736/keep-git-history-when-splitting-a-file) – TTT Jun 20 '22 at 16:17
  • Conceptually (and initially physically) git stores a full snapshot of each commit (while sharing files/directories that stay unchanged to reduce storage requirements). As an additional step those snapshots can *sometimes* be stored using delta-compression, but that's an internal implementation detail that's effectively invisible to the user. For most intents and purposes you should treat git as if it was simply storing a full snapshot of each commit. Any "history views", "commit logs", "blame views" or even "diffs" are *calculated on the fly when requested* from those stored snapshots. – Joachim Sauer Jun 20 '22 at 16:18

2 Answers


The thing to understand here is that Git does not store deltas or history. There is no "delta" that can be big or small, and there is no "history" to "keep". Git calculates an account / presentation of the history when you do, say, a `git log`, and it calculates the deltas when you do, say, a `git diff` (or a `git log` that shows diffs as patches).

You cannot really manipulate or second-guess how this works. When you do a `git log`, you can tweak how similar two files need to be in order to be considered "the same file" when one has vanished and the other has appeared (because you renamed the file). But if you are hoping that somehow both files in the split will magically "lead back" to the one original file in the previous commit, give up; that's not how Git thinks.

And you should not worry about the "size" of "deltas" because there are no deltas. Every commit is a snapshot of all your files at that moment. There's no point trying to second-guess that. Just let Git do its thing.
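To make that concrete, here is a minimal sketch, assuming a throwaway repository and hypothetical file names (`big.py`, `a.py`, `b.py`, none of which come from the question). It commits one file, splits it in the next commit, and then asks Git to present the same two snapshots twice with different detection settings. Nothing stored in the repository differs between the two calls; only the computed presentation does.

```shell
#!/bin/sh
# Sketch only: throwaway repo, hypothetical file names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Pass an identity per command so the sketch does not rely on global config.
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Commit 1: one "big" file of 500 distinct lines.
awk 'BEGIN { for (i = 1; i <= 500; i++) print "line number " i }' > big.py
git add big.py
g commit -q -m "add big.py"

# Commit 2: split it 200/300 and delete the original.
head -n 200 big.py > a.py
tail -n 300 big.py > b.py
git rm -q big.py
git add a.py b.py
g commit -q -m "split big.py into a.py and b.py"

# Same two snapshots, two different presentations.
plain=$(git diff --no-renames --name-status HEAD~1 HEAD)
traced=$(git diff -C30% --name-status HEAD~1 HEAD)
printf 'without detection:\n%s\n\nwith -C30%%:\n%s\n' "$plain" "$traced"
```

With detection off, the split shows up as one deletion and two additions. With the copy threshold lowered to 30%, both new files should be reported as derived from `big.py`, even though the commits themselves are unchanged.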

matt

I want to split the file in two such that the deltas between the two states have only two lines, see the linked picture,

Deltas for presentation depend on the audience and purpose and are basically guaranteed suboptimal for storage compression. Git's internal deltas are done for storage compression. They're not done against just some previous version, they're done against as much history as Git's been configured to inspect; at the factory defaults, for most projects, that's "all of it". Nobody but the devs ever wants or needs to see those.

If you see a presentation delta you don't like, make it go away.

For instance, if you know your code was extracted from a larger source, turn Git's copy/rename sensitivity way up and have it just trace the current hunk:

git log -p -C30 -L1,`wc -l <myfile.py`:myfile.py

The simplest way to see what produced your current source, in the history you're describing, is `git blame -C30`. It shows where each line of your current source was added, and with a decent programmer's editor you can step back through the versions in about two keystrokes.

I don't think anyone's yet implemented a summarizer that will reduce a whole-file add to just say "all of it", but when I do

git log -p -C30 -L1,`wc -l <split1`:split1

on a test blob (about 500 paragraphs of lorem ipsum in `testing`, split into 200 paragraphs in `split1` and the rest in `split2`), it shows me just that hunk being added in `testing`.
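The experiment above can be approximated with the following sketch. It assumes a temporary repository and uses 500 generated placeholder lines in place of the lorem ipsum paragraphs; the helper `g` and the variable names are made up for illustration. It counts how many lines of `split1` blame attributes to the old `testing` file, with and without `-C30`, and then runs the line-range log from the answer.

```shell
#!/bin/sh
# Sketch only: approximates the testing -> split1/split2 experiment.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# 500 distinct placeholder lines stand in for the lorem ipsum paragraphs.
awk 'BEGIN { for (i = 1; i <= 500; i++) print "placeholder paragraph number " i }' > testing
git add testing
g commit -q -m "add testing"

# Split 200/300 and delete the original in a single commit.
head -n 200 testing > split1
tail -n 300 testing > split2
git rm -q testing
git add split1 split2
g commit -q -m "split testing into split1 and split2"

# Count lines of split1 that blame traces back to the old file `testing`.
# (`|| true` because grep -c exits non-zero when the count is 0.)
plain=$(git blame --line-porcelain split1 | grep -c '^filename testing$' || true)
traced=$(git blame -C30 --line-porcelain split1 | grep -c '^filename testing$' || true)
echo "lines traced to testing: plain=$plain, with -C30=$traced"

# Line-range log over all of split1 with copy detection turned up.
lines=$(wc -l < split1 | tr -d ' ')
git log -p -C30 -L"1,$lines:split1" | head -n 20
```

Note that `-C30` means different things to the two commands: for `git blame` it is a minimum number of alphanumeric characters for a copied chunk, while for `git log` it is a similarity threshold of 30%.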

jthill