I am working with a file that exceeds GitHub's file size limit of 100 MB. I created a new, smaller file and removed the old one. I then ran git add . and committed all changes. When I tried to push the changes to the remote repository, I got the following error:

Counting objects: 20, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (20/20), done.
Writing objects: 100% (20/20), 67.36 MiB | 421.00 KiB/s, done.
Total 20 (delta 9), reused 0 (delta 0)
remote: Resolving deltas: 100% (9/9), completed with 2 local objects.
remote: warning: File datasets/credit_card_fraud_2.csv is 99.68 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 191376373d7e9d8b1db86eb399ea04f8
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File creditcard.csv is 143.84 MB; this exceeds GitHub's file size limit of 100.00 MB
To https://github.com/Teett/fraud_detection_sura.git
 ! [remote rejected] master -> master (pre-receive hook declined)
error: failed to push some refs to 'https://github.com/Teett/fraud_detection_sura.git'

As you can see, the offending file is "creditcard.csv". I searched Google for help and ran the following commands:

  • git rm creditcard.csv: I get the error fatal: pathspec 'creditcard.csv' did not match any files.
  • git add -u and git add -A: the push still fails with the same file size error.
  • git status: it shows that the working tree is clean.

Notice that if I run git ls-files, it shows:

.Rhistory
LICENSE
datasets/credit_card_fraud_1.csv
datasets/credit_card_fraud_2.csv
model_fraud_detection_logistic_1/coefs.csv
model_fraud_detection_logistic_1/credit_card.xls
model_fraud_detection_logistic_1/credit_card_script.Rmd
model_fraud_detection_logistic_1/default_payment_prob.png
model_fraud_detection_logistic_1/default_payment_probability.pdf
model_fraud_detection_logistic_1/default_payment_probability.png
model_fraud_detection_logistic_1/intervalos_confianza.pdf
model_fraud_detection_logistic_1/readme.docx
model_fraud_detection_logistic_2/fraud_detection_credit_card.Rmd

But the file "creditcard.csv" is not listed.

I am looking for a way to update git's records so that it tracks only the files shown by git ls-files above. If necessary, I am willing to give up the older versions of the files. Alternatively, I am looking for a way to exclude this problematic file from the push.

Lemark
    If the file is part of the history, it does not matter if you have since deleted it, git still has a copy for that old commit given that you can check out old commits at your will. If you didn't intend for this file to be added, you can remove it by using various tools. – Lasse V. Karlsen Jun 21 '19 at 14:38
    Note that short of never committing a file, you cannot ask git to only keep the most recent file. This will require frequent rewrites of the git history and is going to bite you at some point. If you need to just have handy the most recent version of a file for your project, consider git-ignoring this file altogether and finding a different way of storing and obtaining this file. – Lasse V. Karlsen Jun 21 '19 at 14:39
    Now, if you do want to store versions of such large files, there is a system called Git LFS (Large File Storage), but I do not know the limits GitHub operates with using this system. You still need to rewrite the history to move the file out of the normal commit history. – Lasse V. Karlsen Jun 21 '19 at 14:40
    You might want to try this to remove the file completely from the history https://help.github.com/en/articles/removing-sensitive-data-from-a-repository – ikkentim Jun 21 '19 at 14:47
  • https://stackoverflow.com/search?q=%5Bgit%5D+remove+large+files – phd Jun 21 '19 at 14:49
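
To illustrate the point made in the comments, here is a minimal sketch in a throwaway repository. It shows that a deleted file disappears from git ls-files but stays in the commit history, and that collapsing the unpushed commits with git reset --soft is one possible way to drop it. The file contents, commit messages, and the variable names (repo, base, before, after, tracked) are made up for the demonstration. The approach assumes the commits containing the large file were never pushed, which the rejected push suggests; it is not the only way to rewrite history.

```shell
#!/bin/sh
# Sketch only: runs in a temporary repository, not in the real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Base commit: stands in for the last state the remote accepted.
echo "license" > LICENSE
git add LICENSE
git commit -q -m "base"
base=$(git rev-parse HEAD)

# Commit the oversized file (tiny stand-in content here).
echo "big data" > creditcard.csv
git add creditcard.csv
git commit -q -m "add creditcard.csv"

# Replace it with a smaller file, as described in the question.
git rm -q creditcard.csv
mkdir datasets
echo "small data" > datasets/credit_card_fraud_2.csv
git add datasets
git commit -q -m "replace with smaller file"

# The file is no longer tracked, yet commits in the history still carry it.
before=$(git log --oneline -- creditcard.csv | wc -l | tr -d ' ')

# Collapse the unpushed commits into one. --soft moves the branch back to
# the base commit but leaves the index and working tree untouched.
git reset -q --soft "$base"
git commit -q -m "add smaller dataset"

after=$(git log --oneline -- creditcard.csv | wc -l | tr -d ' ')
tracked=$(git ls-files | tr '\n' ' ')
echo "commits touching creditcard.csv before: $before, after: $after"
echo "tracked files: $tracked"
```

In the real repository, the analogous step would be git reset --soft origin/master followed by a new commit and a push, assuming origin/master still points at the last state that was pushed successfully. This discards the intermediate commits, so it only fits if the older versions really are not needed.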
