
I'm working on a university project with ML, and the project has gotten quite big. I usually don't use GitHub, but I need to format my PC and I don't trust the Google Drive backup I have, so I want a second backup to make sure I don't lose the code.

I'm using Git with GitHub Desktop. I'm not very knowledgeable in Git, and I'm having a hard time uploading this project: the connection drops every time I try to upload it. I'm pretty sure this is because of the project's size. Any help with that?

The IDE I'm using is PyCharm and the Python version is 3.7. I have already created a requirements.txt.

I tried searching for pre-made .gitignore files, but that didn't work.

Krlus
  • Why do you think a gitignore file will help you with your disconnection issues? – Sören Dec 01 '22 at 19:16
  • I searched for the error and it seems GitHub won't accept a 15.5 GB project, since it is huge. So I imagine that with a .gitignore file I can upload the project without the dependencies, only the code per se. I could be wrong though. Edit: what I'm trying to say is that the disconnection is caused by the huge project I'm trying to upload; since GitHub won't accept it, it closes the connection. – Krlus Dec 01 '22 at 19:19
  • A gitignore file will not help you there - you need to remove the dependencies from your project's history. See e.g. https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#removing-files-from-a-repositorys-history – Sören Dec 01 '22 at 19:29
  • Will see if it helps, thank you very much, could you paste this in an answer, so if it works I can give you proper credit? – Krlus Dec 01 '22 at 19:30

2 Answers


A .gitignore file will not help you there - you need to remove the dependencies from your project's history. There are two ways to do that:

The traditional way involves git-filter-branch. I've done that once in the past. It works, but it's easy to get wrong.

The alternative is to use BFG. I have no personal experience, but it seems to be easier to use, and claims to be faster. So if I were you, I'd give BFG a try.

Whichever way you try, make a local backup first!

When you're done rewriting history, you can use a .gitignore to prevent yourself from re-adding the unwanted files.
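Before rewriting anything, it helps to know which files in the history are actually the big ones. The following is only a minimal sketch, assuming Git is on your PATH and you run it from inside the repository; the function names (parse_batch_check, largest_blobs) are made up for illustration. It relies on the git rev-list --objects --all and git cat-file --batch-check commands to list every blob in the history with its size.

```python
import subprocess


def parse_batch_check(output):
    """Parse 'type sha size path' lines; return blobs as (size, path), largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo=".", top=20):
    """Return the `top` largest blobs reachable from any ref in `repo`."""
    # Every object reachable from any branch or tag, as "sha path" lines.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    # Ask Git for the type and size of each object; %(rest) echoes the path.
    details = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo, check=True, capture_output=True, text=True, input=objects,
    ).stdout
    return parse_batch_check(details)[:top]


if __name__ == "__main__":
    for size, path in largest_blobs():
        print(f"{size / 1_000_000:10.2f} MB  {path}")
```

The paths it prints are the candidates to remove from the history, and later to list in your .gitignore.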

Sören

Welcome to Stack Overflow!

As you already sensed, Git is not really made to work with volumes of data as large as yours (15.5 GB). The most important thing to do right now is to identify which files you want to keep track of, and which files (large binaries, generated output, data) don't have to be versioned. You don't need any tool other than your brain for this (though looking around with any file explorer will teach you a lot).

Deciding what to keep

It is important to be quite strict here. As a general approach (there can be exceptions), try to keep the following files out:

  • Any file that is >1MB. There will surely be exceptions, but in general this is a good rule of thumb.
  • Anything that is binary/non-text-based. Git is made to work with diffs on files, and diffs are not useful for non-text files. Examples: images, videos, PowerPoint files, ...
  • Anything that is generated by code (for example results of compilations, or data processing, ...)
  • Anything that is generated by a tool you use (for example folders created by your IDE)
  • Any data file. Git is not really made for version control of data. It's really your code you want to version control.
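If the project is too big to browse by hand, a small script can give you a first list of candidates. This is only a rough sketch under a few assumptions: the 1 MB limit is the rule of thumb from the list above, "binary" is guessed by looking for a NUL byte in the first few KB of a file (a common heuristic, not a guarantee), and the names looks_binary and find_candidates are made up for this example.

```python
import os
from pathlib import Path

SIZE_LIMIT = 1_000_000  # bytes; the ">1MB" rule of thumb


def looks_binary(path, sample_size=8192):
    """Heuristic: a file is treated as binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def find_candidates(root, size_limit=SIZE_LIMIT):
    """Return (relative_path, size_in_bytes, is_binary) for large or binary files."""
    root = Path(root)
    candidates = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata folder.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue
            size = path.stat().st_size
            binary = looks_binary(path)
            if size > size_limit or binary:
                rel = path.relative_to(root).as_posix()
                candidates.append((rel, size, binary))
    # Largest files first.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rel_path, size, binary in find_candidates("."):
        kind = "binary" if binary else "text"
        print(f"{size / 1_000_000:10.2f} MB  {kind:6}  {rel_path}")
```

Run it from the root of your project. The output is a starting point for your own judgment, not a verdict: a small icon your code needs may still belong in the repo.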

Creating a git repository

It seems you have already made a Git repository, but unless it holds important history that you want to keep, I suggest starting anew from where you are now. For a university project I can imagine it being fine to lose the history so far. If it is not fine, you will have to rewrite your history and delete the large files from your repo. That is a risky operation which I would not recommend to a new Git user; more info can be found in this SO post.

I'm suggesting to start a fresh repository because I feel you will learn more in this way, but if you prefer to change your history go ahead!

To start off a fresh repository, go to the root directory of your project and copy the .git folder to some place as a backup. This is often a hidden folder, and it contains all of your history!

Then, delete this .git folder (making sure that you have kept your backup .git folder somewhere).

After that, execute the git init command. You now have a fresh Git repository to work with! Typing git status will show a bunch of untracked files.
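If you would rather script the backup step than copy the folder by hand, something like the sketch below could do it. It only copies the .git folder; deleting the original and running git init are deliberately left as manual steps, so you can check the backup first. The function name backup_git_dir and the git-backup-<timestamp> folder name are my own choices for this example.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_git_dir(project_root, backup_root):
    """Copy <project_root>/.git into a new timestamped folder under backup_root."""
    source = Path(project_root) / ".git"
    if not source.is_dir():
        raise FileNotFoundError(f"no .git folder found in {project_root}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"git-backup-{stamp}"
    # copytree creates the target folder and fails if it already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(str(source), str(target))
    return target


if __name__ == "__main__":
    # Hypothetical location: a "backups" folder next to the project folder.
    print("Backup written to", backup_git_dir(".", "../backups"))
```

With a repository as large as yours the copy can take a while. Once it has finished and you have confirmed the backup folder is there, continue with deleting .git and running git init as described above.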

Populating your gitignore

The first thing we will do now is make our .gitignore file, before committing anything else. Let's say that you decided in your first step to ignore the following:

  • all *.xlsx files
  • everything inside of the build/ directory
  • all *.log files

In that case, create a text file called .gitignore in the root directory of your project (with any text editor: your IDE, Notepad, or anything else). Open it with your text editor of choice and add the following text:

*.xlsx
build/*
*.log

Now save the file. You have made your .gitignore file! Add and commit the file (using a good commit message) and type git status. None of the unwanted files should appear anymore. Now you can commit the rest of your files, and you have a clean, lightweight repo. Before committing them, check git status carefully to make sure no unwanted files are about to be tracked by Git.
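With many files, the git status output can be hard to read. As a rough sketch (assuming Git is on your PATH; the function names are made up for this example), you can ask Git which untracked files it would still pick up after applying your .gitignore, using git ls-files --others --exclude-standard, and add up their sizes before you commit:

```python
import subprocess
from pathlib import Path


def untracked_not_ignored(repo="."):
    """List untracked files that are NOT excluded by the ignore rules."""
    out = subprocess.run(
        ["git", "ls-files", "--others", "--exclude-standard", "-z"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    # -z separates the paths with NUL bytes, which is safe for unusual names.
    return [path for path in out.split("\0") if path]


def summarize(paths, repo="."):
    """Return (total_bytes, [(size, path), ...]) with the largest files first."""
    sized = []
    for rel in paths:
        full = Path(repo) / rel
        if full.is_file():
            sized.append((full.stat().st_size, rel))
    sized.sort(reverse=True)
    return sum(size for size, _ in sized), sized


if __name__ == "__main__":
    total, sized = summarize(untracked_not_ignored())
    print(f"{len(sized)} files, {total / 1_000_000:.2f} MB in total")
    for size, rel in sized[:10]:
        print(f"{size / 1_000_000:10.2f} MB  {rel}")
```

If the total is still in the gigabytes, something that should be ignored is slipping through, and the files at the top of the list tell you which lines to add to your .gitignore.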

Maintaining your gitignore

It's normal for the .gitignore file to evolve during the project. Don't hesitate to add new lines to it whenever a new file type or folder enters the project that you don't want in the repository.

Hope this helps you a bit!

Koedlt