Is there a program that crawls a specified website and reports whether it references another website? I have images, video files, PDFs, etc. that I need to give to another developer to finish the port over to their new server.

I just transferred an old site to another person and they are still using my files. I don't know 100% where all the files are, and I want to be sure which files I need to give to them. It would be nice to have something like linkchecker that can crawl the site and, if there is a reference to a website root (e.g. sub.domain.com), spit out information about it (which page, what the URL is).

I don't want to block the site from using the files at this point, so that option is out.

I'm on a Mac so any terminal program would be just fine.

Rich

2 Answers


You could try SiteSucker, which can be used to download all the files used on a site (and any it links to, depending on the settings). It's OS X (and iPhone) donation-ware, so it might be just what you're looking for. I believe it creates a log file of the files it downloads, so you could send that to your colleague if you just want to send the URLs instead of the actual files.

James

You could check out wget. It can recursively (-r option) download a website and save its content to your hard disk. Unless told otherwise, it downloads everything into directories named after the host.

But be careful not to recursively download the whole internet ;) so be sure to specify the --domains or --exclude-domains options correctly.
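
As a rough sketch (using sub.domain.com from the question as a stand-in for the real host), a run like this mirrors the old site while logging every URL it touches, and the log can then be filtered to build the hand-over list:

    # mirror the site recursively, staying on the old host, logging every URL fetched
    wget -r --domains=sub.domain.com -o wget.log http://sub.domain.com/

    # or just walk the links without saving the files (spider mode)
    wget -r --spider --domains=sub.domain.com -o spider.log http://sub.domain.com/

    # pull the referenced URLs out of the log for your colleague
    grep -oE 'https?://sub\.domain\.com[^ ]*' spider.log | sort -u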

bmk