
At work we have a large Perforce repository (approx 40k changelists, total storage size ~145GB). We're generally happy with Perforce with only some mild gripes, but we're planning to go to a more distributed development model and as a result, would like to move to a more distributed version control system as well.

So far, I've looked at the usual suspects (git, mercurial and potentially bazaar as I have good experience with it) but our main hurdle currently is to get the version history out of Perforce and imported into the various DVCSs so we don't lose the history. We'd also prefer not to have the Perforce server hang around if we don't absolutely have to keep it - my experience with this sort of migration is that nobody looks at the old repo after a while so you'd be losing the history that way.

As there are multiple projects in the repository the idea is to split it into multiple DVCS projects when we're exporting the history as not everybody needs to be able to see every part of the history. However our biggest project still contains about 2/3rds of the committed revisions and also takes up approx 2/3rds of the storage. It also has the largest number of branches - probably around 30.

So far, I've tried the following - everything is on Windows as we're a Windows-only shop:

  • Import into Mercurial using the hg convert extension. This appears to work very well for the main branch of the project I'm converting, but attempting to convert the Perforce branches into named Mercurial branches using a branchmap still appears to produce a flat import with every checkin on the default branch. Maybe that's because I set the branch map up wrong, but hg help convert suggests that you can only turn a Perforce repo into a "flat" structure with no branches using this importer, which isn't really good enough for our use.
  • Import into Git using git-p4.py. Perforce documents using git as a distributed front end to Perforce, and basing the clone on the latest revision(s) of the repo does produce a usable git repo. Attempting to import the whole sub-project with branches breaks the importer because it runs out of memory, so I can't even tell whether it would import our repo correctly.
  • I then had this brilliant brain fart of importing the Perforce repo into SVN with all the branches mapped to appropriate SVN branches as every version control system under the sun can import from SVN. This would be only using SVN as an intermediate step in the conversion, not as the target VCS - we wouldn't really gain anything from this conversion otherwise. Using p42svn.pl, that broke fairly early on in the process as our Perforce server didn't seem to like being hammered by the script that seems to make a new connection for every file/revision.
  • I haven't looked into exporting the history into Bazaar yet as it's a bit of an also-ran.

So, my questions are:

  • Is there a good tool besides p42svn.pl to export a Perforce repo into SVN? I don't mind using SVN as an intermediate repo as it seems to make exporting into all the DVCSs we're looking at reasonably easy.
  • Has anybody successfully exported branches from Perforce into Mercurial named branches and if so, how did you do it? The docs on the convert extension seem to be a bit sparse and I don't seem to be able to find a good/working way to do this.
Timo Geusch
  • Can you expand on why you thought of using SVN, as this isn't a DVCS and in some ways would be a step back from Perforce ? Or were you just going to use SVN as an intermediate step ? – gareth_bowles Feb 28 '12 at 17:02
  • @gareth_bowles, SVN would only be used as an intermediate step, not as the final target for the conversion. The main reason for considering SVN is that the SVN importers I've encountered so far for the various DVCSs seem to be more sophisticated than the ones that interact directly with Perforce. – Timo Geusch Feb 28 '12 at 17:05
  • I'm sure you have already made the decision to switch from Perforce; however, I would like to point out a recent addition to Perforce called `P4SandBox` that greatly improves distributed development. Read more: http://blog.perforce.com/blog/?p=6000 – Dennis Feb 28 '12 at 21:22
  • Another reason to point out new features is that many companies do not upgrade their P4 server regularly (understandable as there is always a risk and you cannot roll-back a p4database upgrade). I find many users completely unaware of what they can achieve with Perforce. – Dennis Feb 28 '12 at 21:26
  • Actually we haven't made that decision yet, we're currently trying to determine what our options are and see how they work for us before we make that sort of decision. Moving VCSs is hugely disruptive to everybody so we want to make sure it's worth the effort. Also, the p4sandbox might just do what we need. – Timo Geusch Feb 28 '12 at 21:38
  • Note that a 100GB git repository is probably too big (even considering that the resulting repository should be smaller). If there are large binary blobs in it, they are better externalized somewhere. – Vi. Feb 28 '12 at 21:49
  • We are also working on making the git-p4 connector more robust. I'd love to get more details on what was stopping you. – James Creasy Feb 29 '12 at 18:39
  • @jamescreasy, what's the best way to contact you? I'm happy to share details of the problem but I don't think they're that suitable to include in this question. – Timo Geusch Feb 29 '12 at 18:49
  • @gareth_bowles A distributed version control system is not inherently _better_ than a centralized version control system. It's just different. In most companies, if you're building from a central repo, you're only releasing from the central repo, and you want to control who can access that repo, you're not gaining anything with a distributed VCS. I know several shops that switched from P4 to SVN and didn't consider it a step backwards. There are more tools that integrate with SVN than with P4, and you don't need to track licenses. Plus, SVN is simpler since you don't need to create views. – David W. Feb 29 '12 at 20:55
  • @DavidW, I didn't intend to suggest that DVCS is better than centralized SCM. We use both Perforce and Git here at Netflix as they each do some things better than the other. – gareth_bowles Feb 29 '12 at 21:05
  • @gareth_bowles Sorry, just a bit sensitive. I do a lot of consulting for CM, and I'll come in with a proposal to use Subversion or Perforce, and some developer will stand up and tell me that those two VCS are bad because they're _centralized_ and Git is way better, and I'm stupid, blah, blah blah... I'm comfortable with both Git and SVN, but in some circumstances one is better than the other. – David W. Feb 29 '12 at 21:13
  • @Timo Geusch, a temp email is on my profile here. – James Creasy Feb 29 '12 at 22:38
  • @jamescreasy, thanks, email with the problem details will be forthcoming later today. – Timo Geusch Feb 29 '12 at 23:09

3 Answers


As you know, switching source control systems is a huge task and not one to be taken lightly. There is considerable risk and downtime, first as you make the actual transition and then again as everyone re-tools and gets up to speed with the new system.

As you are still investigating your options, I would seriously take a breath and look into P4 Sandbox to see if that will meet your requirements.

More information about P4 Sandbox is below.

Overview
- P4Sandbox Feature Demo (Video)

Blog Posts
- P4Sandbox Private local branching, distributed development, and more
- P4Sandbox’s First Submit
- Distributed Development and P4Sandbox
- Private Branching with P4Sandbox
- Task-focused Work in P4Sandbox

Forum Discussion
- New Features Discussion on the official forums

David
Dennis

My word, your repository is really almost 200 Gigabytes in size? I feel sorry for the first fool who does a git pull to get a copy of the repository and discovers they're now downloading 150 gigabytes worth of data.

My suggestion: Don't bother with the entire history. All you really need are the active versions and branches. Think of this as an opportunity to toss out deadwood, and to restructure your repository.

I used to be an advocate of always getting as much history as possible, but one day I had to convert a StarTeam repository to ClearCase, and it just couldn't be done. The command line tools in StarTeam were poor, and the API just couldn't do what I needed.

We simply downloaded the versions that customers had, the branches we were working on, and a few versions of the source. We kept our old StarTeam server up and running just in case someone might need to look at the source, but no one did.

However, if you do want to go through this, it really shouldn't be that bad. You could probably write a Python or Perl script to do the conversion for you.

Perforce tracks history via numbered changesets. Yes, each file has its own version number, but you really aren't too interested in that, you are more interested in the change sets.

If your last P4 changeset is 1,000, you could loop through changesets 1 to 1,000. Perforce sometimes skips a changeset number, but that's pretty easy to detect. Each changeset has a date, the name of the person who made that commit, and their comment. With this information, you commit each change to the Git repository and set the date, author, and comment of that commit to match.
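A rough Python sketch of that loop, in case it helps to see the shape of it. Treat it as an illustration under assumptions, not a tested migration tool: it assumes the `p4` and `git` command line clients are on the PATH, that the current directory is both the root of a Perforce workspace and a git work tree, and that `p4 -G describe -s` returns `change`, `user`, `time`, `desc` and `status` fields. The function names, the `example.com` email domain and the `[p4 change N]` trailer are made up for the example. It also replays everything onto a single line of history, so it does nothing about branches or integrations.

```python
import marshal
import os
import subprocess


def p4_describe(change):
    """Return one changelist's metadata as a dict of str, or None if the number is unused."""
    result = subprocess.run(
        ["p4", "-G", "describe", "-s", str(change)],
        capture_output=True,
        check=False,
    )
    if not result.stdout:
        return None
    # `p4 -G` emits marshalled Python dicts; keys and values arrive as bytes.
    raw = marshal.loads(result.stdout)
    record = {
        key.decode(): (value.decode(errors="replace") if isinstance(value, bytes) else value)
        for key, value in raw.items()
    }
    if record.get("code") == "error":
        return None  # skipped or deleted changelist number
    return record


def git_commit_command(record, email_domain="example.com"):
    """Build the git commit argv and extra environment for one changelist record."""
    # Hypothetical mapping from Perforce user name to a git author identity.
    author = f"{record['user']} <{record['user']}@{email_domain}>"
    # Git's internal date format: "<unix timestamp> <time zone offset>".
    stamp = f"{record['time']} +0000"
    message = f"{record['desc'].rstrip()}\n\n[p4 change {record['change']}]"
    argv = [
        "git", "commit", "--allow-empty",
        "-m", message,
        "--author", author,
        "--date", stamp,
    ]
    # --date only sets the author date; the committer date comes from the environment.
    return argv, {"GIT_COMMITTER_DATE": stamp}


def replay(depot_path, first, last):
    """Sync each submitted changelist in turn and commit the resulting tree to git."""
    for change in range(first, last + 1):
        record = p4_describe(change)
        if record is None or record.get("status", "submitted") != "submitted":
            continue  # unused number or a pending changelist
        subprocess.run(["p4", "sync", f"{depot_path}@{change}"], check=True)
        subprocess.run(["git", "add", "-A"], check=True)  # also stages deletions
        argv, env = git_commit_command(record)
        subprocess.run(argv, env={**os.environ, **env}, check=True)
```

You would call it as something like `replay("//depot/project/...", 1, 1000)`, where the depot path is a placeholder for your own. Syncing every changelist of a large depot one at a time will be slow, so I would try it on a small sub-project first.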

By the way, since you're moving to Git, I hope you'll break up your repository into separate repos. And, if you committed built objects, remove them from the Perforce repository before you move them into Git. You should never store a built object in the repository -- especially if they're binary. They take up a lot of room, and become obsolete very quickly.

Community
David W.
  • Well, it's not predetermined that we move the repo to anything at the moment - we're at the exploratory stage. *If* we move to a different VCS then transferring all the history is a must - I have done migrations in the past for a client where we couldn't, but in this case it's either all history or no move. There might be the option of using git-p4 as a front end, or using p4sandbox if we don't. – Timo Geusch Feb 29 '12 at 21:52
  • @TimoGeusch - You could do a loop checking out each P4 change set in order, copying those files over to a new repository working directory of the other VCS, and then committing them. At least with P4 change sets, you know the repository history from each point in time. The problem is what happens when a file is removed or renamed. I don't quite remember how this shows up in P4 history. P4 does not track directory changes. That's why you have the Integration records, but I'm not sure how you'd detect that. – David W. Mar 05 '12 at 04:53

We (I work at Perforce) built a product to provide a git interface to the Perforce depot.

http://www.perforce.com/product/components/git-fusion

I used this internally for over a year, and it's great, since you can try out the new DVCS approach (including how many repos you want) with a "live" Perforce backend. I was the only team member using git while everyone else used p4 or p4v. So people can work using git while you gradually decide upon your migration configuration.

There is support for mapping branches between the two systems: http://www.perforce.com/perforce/doc.current/manuals/git-fusion/index.html#chapter_dyn_ngj_3l.html#section_kkz_gqv_rl

I'm not sure if this solves all of the systems above, since I'm sure you can only go from git to X.

Tristan Juricek