I'm building something off of a third-party application (in Typescript) that wasn't designed to be modular. I'm looking to extract a small set of functions from this project; in a subsequent pass I will apply patches so that the extracted functions work in the (reduced) extracted environment.

The issue I'm facing is how to do this extraction automatically and reliably. If it were one-off, I would just edit the files in question, but I want to keep tracking upstream. Standard diff/VCS tools don't fit the bill, because the difference would show up as a massive "removal," and then there would be conflicts whenever anything upstream changed.

Are there tools or a set of commands designed for this purpose?

D0SBoots

1 Answer

Are there tools or a set of commands designed for this purpose?

I am quite sure the answer here is no, because such a tool would have to interpret the meaning of the code.

  • What if one of the functions you are tracking is renamed?
  • What if one of the functions you are tracking is renamed and also changed a little in the same commit?
  • What if one of the functions you are tracking is renamed and also changed a lot in the same commit?
  • At what point do enough changes to the content of a function turn it into something different?
  • What if one of the functions you are tracking is merged with another function?
  • What if one of the functions you are tracking is split up into multiple functions?
  • Etc

It is the same problem as reporting that a line was changed, which no (sane) tool does; they only report lines added and lines deleted separately.


Having said that, a version control system (e.g. git) is the only viable tool to assist with this. It might fail in some scenarios, but with a proper structure it will probably handle most of the work automatically (how much depends heavily on the amount of change in the upstream project).

What I would do is the following:

Step 1

I am assuming that you are working on a clone of the upstream repository and the branch you want to pick from is main. This clone repo could then be a submodule in your other repository (not covered further in this answer).

git clone https://git.example.com/some-repo.git
cd some-repo
git checkout main

Step 2

Then run git checkout -b main.extract main to create and switch to your own main.extract branch, where you remove the stuff you do not want, but not all in one operation.

Assuming the upstream file contains the following (and you are only interested in function1 and function4):

import { whatever } from './whatever';
import { somethingelse } from './something/else';

export function function1(a: string) {
...
}

export function function2() {
...
}

export function function3(a: SomeClass, b: number) {
...
}

export function function4(a: string[]) {
...
}

The first step is to add some "marker lines" that separate the parts you want from the parts you do not want, and to do this alone as a separate commit. One line with // Begin save this function before and one line with // End save this function after would accomplish this, but you are much better off using several lines, which creates a stronger barrier against changes just above or below the markers. I am showing only three lines below; in real life you should use 5-6 lines.

import { whatever } from './whatever';
import { somethingelse } from './something/else';

// Begin save function1
// Begin save function1
// Begin save function1
export function function1(a: string) {
...
}
// End save function1
// End save function1
// End save function1

export function function2() {
...
}

export function function3(a: SomeClass, b: number) {
...
}

// Begin save function4
// Begin save function4
// Begin save function4
export function function4(a: string[]) {
...
}
// End save function4
// End save function4
// End save function4

When incorporating upstream changes later on, it is not unlikely that you will get conflicts, but resolving them for this commit will be trivial. Even if, say, one of the functions is moved somewhere else, moving the separation comments correspondingly is easy.

As always, small commits that each do one thing only are the key to a pleasant version control experience and fewer conflicts.

git add upstreamfile.ts
git commit -m "Added separation comment lines"
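
If you want to script the marker step rather than edit by hand, here is a minimal TypeScript sketch. addMarkers is a hypothetical helper, not an existing tool, and it makes strong assumptions: each tracked function starts at column 0 with "export function <name>(" and ends at the first later line that is exactly "}". Real upstream code may need a proper parser instead.

```typescript
// Hypothetical helper: wrap each tracked top-level function in repeated marker lines.
// Assumptions (not from the answer): a declaration starts at column 0 as
// `export function <name>(` and its body ends at the first line that is exactly `}`.
function addMarkers(source: string, names: string[], repeat = 3): string {
  const out: string[] = [];
  let current: string | null = null; // name of the function currently being wrapped

  for (const line of source.split("\n")) {
    if (current === null) {
      const name = /^export function (\w+)\(/.exec(line)?.[1];
      if (name !== undefined && names.includes(name)) {
        current = name;
        for (let i = 0; i < repeat; i++) out.push(`// Begin save ${name}`);
      }
      out.push(line);
    } else {
      out.push(line);
      if (line === "}") {
        for (let i = 0; i < repeat; i++) out.push(`// End save ${current}`);
        current = null;
      }
    }
  }
  return out.join("\n");
}
```

Calling addMarkers(fileText, ["function1", "function4"], 5) would produce the marked-up file from this step with five marker lines on each side.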

Step 3

Now with this in place the next step is to (only) remove the upstream parts that you are not interested in:

import { whatever } from './whatever';

// Begin save function1
// Begin save function1
// Begin save function1
export function function1(a: string) {
...
}
// End save function1
// End save function1
// End save function1

// Begin save function4
// Begin save function4
// Begin save function4
export function function4(a: string[]) {
...
}
// End save function4
// End save function4
// End save function4

Check this in as a separate commit. When updating to later upstream versions you might get conflicts (say, a new function5 is added), but again these are trivial to resolve. You could perhaps use the -s ort -Xours merge strategy (note: not -s ours!) for this commit only, but I have no experience with such merge strategies; I have always used KDiff3 and would strongly recommend doing the same.

git add upstreamfile.ts
git commit -m "Removed unwanted parts"
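
The removal itself is mechanical once the markers exist, so it could be regenerated instead of merged. A minimal sketch, assuming the marker format from Step 2: keepMarkedRegions is a hypothetical name, and it drops everything outside the markers, including imports, so any import you still need would have to be wrapped in its own marker pair or re-added by hand.

```typescript
// Hypothetical helper: keep only the marker lines and the lines between them.
// Assumption: markers look like `// Begin save <name>` / `// End save <name>`,
// possibly repeated several times, as in Step 2.
function keepMarkedRegions(source: string): string {
  const isBegin = (line: string) => line.startsWith("// Begin save ");
  const isEnd = (line: string) => line.startsWith("// End save ");

  const out: string[] = [];
  let inside = false;

  for (const line of source.split("\n")) {
    if (isBegin(line)) {
      // Separate consecutive kept regions with one blank line.
      if (!inside && out.length > 0) out.push("");
      inside = true;
      out.push(line);
    } else if (isEnd(line)) {
      out.push(line);
      inside = false;
    } else if (inside) {
      out.push(line);
    }
    // Anything else is outside a marked region and is dropped.
  }
  return out.join("\n");
}
```

If a rebase stops with a conflict in this commit, one option is to take the upstream side of the file and rerun the helper rather than resolving the hunks manually.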

Step 4

Now with the unwanted parts gone, you can remove the separation comments (trivial edit, I am not showing an example of this).

git add upstreamfile.ts
git commit -m "Removed separation comments"
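
This edit can also be scripted. A minimal sketch, assuming the marker format from Step 2 (stripMarkers is a hypothetical name):

```typescript
// Hypothetical helper: delete every separation comment line.
// Assumption: markers are whole lines starting with `// Begin save ` or `// End save `.
function stripMarkers(source: string): string {
  return source
    .split("\n")
    .filter((line) => !/^\/\/ (Begin|End) save /.test(line))
    .join("\n");
}
```

Ordinary comments and code are left untouched, because only lines that start with one of the two marker prefixes are removed.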

Step 5

If you want any additional modifications to the functions, apply them here.

${EDITOR:-nano} upstreamfile.ts
git add upstreamfile.ts
git commit -m "My own customization of function1"
${EDITOR:-nano} upstreamfile.ts
git add upstreamfile.ts
git commit -m "My own customization of function4"

Or, of course, split them into multiple commits if you make multiple modifications.


Future updates

So far, everything above has been initial setup. But the main part of the question is about later changes, so let's consider those. Assume you started with upstream release v1.0.0 and want to update to release v2.0.0. With the steps above in place, there are only one or two things to do.

  1. Create a branch/tag to keep a reference to the old version (optional).
git tag main.extract-v1.0 main.extract
  2. Then fetch the new upstream and rebase.
git checkout main
git pull --prune
git rebase main main.extract

That's it. You might get conflicts on the rebase, but except for the (optional) last commit with your extra modifications to the original source, all the other commits should be literally trivial to resolve.

And conflicts on extra modifications to the original source are inherently unavoidable, so this is as good as it possibly can be.
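
After a rebase, one cheap sanity check is to confirm that every tracked function is still declared in the extracted file; an upstream rename (the first case in the list near the top) would show up as a missing name. This is only a sketch: missingFunctions is a hypothetical helper, it assumes the names are plain identifiers made of letters, digits and underscores, and it only looks for "function <name>(" style declarations.

```typescript
// Hypothetical helper: report tracked names with no `function <name>(` declaration.
// Assumption: names contain only letters, digits and underscores, so they are safe
// to embed in a regular expression without escaping.
function missingFunctions(source: string, names: string[]): string[] {
  return names.filter((name) => {
    const declaration = new RegExp(`\\bfunction\\s+${name}\\s*[(<]`);
    return !declaration.test(source);
  });
}
```

An empty result does not prove the functions still behave the same; it only tells you that nothing you track has vanished by name.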

hlovdal
  • I agree that going through VCS is the best plan here. I was hoping for something that would let me mark the removals in Step 3 as "uninteresting," so I wouldn't get conflicts when upstream (inevitably) changes things in there. But this seems like the way; it may even be automatable, with some work. – D0SBoots Apr 16 '23 at 21:07