
My application creates some PDF files using some private assemblies and then sends them to another app by copying them into the second app's folders.

After a while I need to come back, get my files and update them. The problem is that in the meantime other PDF files, which are not mine, can be added there, and I need to find and use only my PDF files.

My question is, how can I easily identify my files? Should I use a specific name for them? Should I create a file in which to store my file names? Or is there something that can sign a file as "my file" (without using a certificate or a third party software)?

Please note that I am using the latest versions of C# and .NET.

Matt
Clock
  • You could use extended file properties: http://stackoverflow.com/questions/5337683/how-to-set-extended-file-properties – Ron Beyer Jun 09 '15 at 19:46
  • I'd prefer to store the file name and hash in the application's DB (or a file), then just iterate through the saved files and match hashes. I think this is more reliable. – Oleg Jun 09 '15 at 19:52
  • Hi Ron, do you know whether this solution works on any device that can run .NET? – Clock Jun 09 '15 at 19:57
  • Hi Oleg, yes, this might be a solution, but at least for now it also seems to be the most time-consuming one, and it adds more code to maintain. I would prefer to sign those files somehow. – Clock Jun 09 '15 at 19:58
  • This is only a problem when your files don't always have the same names. For example, if the end app only expects a "splash.pdf", there is no point in sending an updated file with another name. If not, send a list of updated files so the end app can update its own list. – Jongware Jun 09 '15 at 20:13
  • I would use the PDF Producer (or Creator) in the Document Information Dictionary inside the PDF to mark your application as the producer if your code is creating the PDF. Then you can parse the PDFs for that value. PDF Specification 1.7, section 10.2.1 Document Information Dictionary. – Kevin Brown Jun 09 '15 at 20:15
  • Hi Jongware, sending a new list with the new names is mostly fine, but for some update operations I need to find the PDF file being updated, get its name and replace it with the new one. In that case I must be sure that I am getting a file that was added by my app. – Clock Jun 09 '15 at 20:47
  • Hi Kevin, is there an example of how to do that? – Clock Jun 09 '15 at 20:48
  • Your post says "creates PDF files using some private assemblies". What are they? Are they yours? That is where you would implement your own producer, unless they already set a specific one. Open the PDF and look at the Properties inside Adobe Reader. What is the producer? Maybe it is already unique to you. – Kevin Brown Jun 09 '15 at 21:31
  • (@Clock: tip: to ensure people get a red popup in their in-box – such as you should see now for this post – and they know you said something back, don't "hi" people but prefix their name with a `@` character.) – Jongware Jun 09 '15 at 22:32
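Oleg's suggestion (keep each file name plus a hash in the app's own store, then match hashes when scanning the folder) could be sketched as below. This is a minimal sketch assuming .NET 5 or later; the JSON manifest, the `PdfManifest` class and its method names are hypothetical, and SHA-256 is used as the hash.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text.Json;

// Hypothetical manifest kept by the producing app: file name -> SHA-256 hex of the content.
static class PdfManifest
{
    public static string HashFile(string path)
    {
        using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        return Convert.ToHexString(sha.ComputeHash(stream));
    }

    public static Dictionary<string, string> Load(string manifestPath)
    {
        if (!File.Exists(manifestPath)) return new Dictionary<string, string>();
        return JsonSerializer.Deserialize<Dictionary<string, string>>(File.ReadAllText(manifestPath))
               ?? new Dictionary<string, string>();
    }

    // Record (or re-record after an update) a PDF this app copied into the target folder.
    public static void Register(string manifestPath, string pdfPath)
    {
        var entries = Load(manifestPath);
        entries[Path.GetFileName(pdfPath)] = HashFile(pdfPath);
        File.WriteAllText(manifestPath, JsonSerializer.Serialize(entries));
    }

    // Return only the PDFs whose name is in the manifest and whose current hash still matches.
    public static List<string> FindMine(string folder, string manifestPath)
    {
        var entries = Load(manifestPath);
        return Directory.GetFiles(folder, "*.pdf")
            .Where(p => entries.TryGetValue(Path.GetFileName(p), out var h) && h == HashFile(p))
            .ToList();
    }
}
```

After updating one of your files, call `Register` again so the stored hash follows the new content; a file someone else dropped into the folder is never in the manifest, so it is ignored.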

3 Answers


There are actually 3 ways:

  • Write a signature and properties into the PDF file and then read them back. This requires a specialized third-party PDF library to write, read, and add a signature to the PDF.
  • Maintain a separate XML file with detailed information such as the last-updated date, version, and a CRC or MD5 value to make sure the file was not changed. So you have, e.g., MyDocument.pdf and MyDocument.pdf.xml; you can search for all .pdf.xml files inside the folder and then just change .pdf.xml to .pdf to get the original PDF file names. This approach doesn't require any third-party components. You may also sign the XML file to make sure it was not modified.
  • The simplest way is to use the file names to store the additional information plus a CRC or MD5 hash value (to maintain integrity). For example, you may name files like "MyApp_15-06-2015-e4d909c290d0fb1ca068ffaddf22cbd0.pdf", scan for files matching "MyApp_*.pdf", and then parse the file name to get the date and the hash or CRC value. This can also be done with plain .NET, with no third-party libraries required.
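The third option might look roughly like this. It is a minimal sketch assuming .NET 5 or later (for `MD5.HashData` and `Convert.ToHexString`); the `MyApp_` prefix and `dd-MM-yyyy` date format follow the example name above, while the `TaggedPdfNames` class and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Security.Cryptography;

// Sketch of encoding ownership in the file name: MyApp_dd-MM-yyyy-<md5 of content>.pdf
static class TaggedPdfNames
{
    const string Prefix = "MyApp_";          // hypothetical app tag
    const string DateFormat = "dd-MM-yyyy";  // matches the example MyApp_15-06-2015-...

    static string Md5Hex(byte[] data) =>
        Convert.ToHexString(MD5.HashData(data)).ToLowerInvariant();

    // Build the tagged file name for the given PDF bytes.
    public static string MakeName(byte[] pdfBytes, DateTime date) =>
        $"{Prefix}{date.ToString(DateFormat, CultureInfo.InvariantCulture)}-{Md5Hex(pdfBytes)}.pdf";

    // Write the PDF into the target folder under its tagged name and return the full path.
    public static string Save(string folder, byte[] pdfBytes, DateTime date)
    {
        var path = Path.Combine(folder, MakeName(pdfBytes, date));
        File.WriteAllBytes(path, pdfBytes);
        return path;
    }

    // Scan for MyApp_*.pdf, parse date and hash, keep only files whose content matches the hash.
    public static List<(string Path, DateTime Date)> FindMine(string folder)
    {
        var result = new List<(string Path, DateTime Date)>();
        foreach (var path in Directory.GetFiles(folder, Prefix + "*.pdf"))
        {
            var stem = Path.GetFileNameWithoutExtension(path).Substring(Prefix.Length);
            // Expected stem: dd-MM-yyyy-<32 hex chars>
            if (stem.Length != DateFormat.Length + 1 + 32 || stem[DateFormat.Length] != '-') continue;
            var datePart = stem.Substring(0, DateFormat.Length);
            var hashPart = stem.Substring(DateFormat.Length + 1);
            if (!DateTime.TryParseExact(datePart, DateFormat, CultureInfo.InvariantCulture,
                                        DateTimeStyles.None, out var date)) continue;
            if (Md5Hex(File.ReadAllBytes(path)) != hashPart) continue; // content changed or name copied
            result.Add((path, date));
        }
        return result;
    }
}
```

Because the hash covers the file's content, a foreign file that merely imitates the naming pattern is skipped unless its bytes match. MD5 is fine for detecting accidental changes but is not a tamper-proof signature.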
Community
Eugene

Try something like this:

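using System;
using System.IO;

var myfile = "somefile.pdf";
// Create a zero-byte marker file named after the PDF: somefile.pdf.myfile
File.WriteAllBytes(myfile + ".myfile", Array.Empty<byte>());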

Then you just look for your "myfiles"

Directory.GetFiles("path","*.myfile")

Basically, you are creating a zero-byte file whose name contains your original file's name. Once you have all the "*.myfile" files back, you can get to your original by stripping ".myfile" from the name, and delete the ".myfile" version when you are done processing it.
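The full round trip described in this answer (mark, find, strip the suffix, delete the marker) might look like the sketch below. The `MarkerFiles` class and its method names are hypothetical, and skipping markers whose PDF no longer exists is an added assumption rather than part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch of the zero-byte marker approach: somefile.pdf is "mine" if somefile.pdf.myfile exists.
static class MarkerFiles
{
    const string MarkerExt = ".myfile";

    // Create the empty marker next to a PDF this app produced.
    public static void Mark(string pdfPath) =>
        File.WriteAllBytes(pdfPath + MarkerExt, Array.Empty<byte>());

    // Find markers, strip the suffix to recover the PDF path, skip markers whose PDF is gone.
    public static List<string> FindMine(string folder)
    {
        var mine = new List<string>();
        foreach (var marker in Directory.GetFiles(folder, "*" + MarkerExt))
        {
            var pdf = marker.Substring(0, marker.Length - MarkerExt.Length);
            if (File.Exists(pdf)) mine.Add(pdf);
        }
        return mine;
    }

    // Delete the marker once the PDF has been processed.
    public static void Unmark(string pdfPath) => File.Delete(pdfPath + MarkerExt);
}
```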

Charles
  • Hi Charles, this solution sounds like a time-consuming one. I would rather maintain an XML file with the names of my files than use this. – Clock Jun 09 '15 at 20:51
  • @Clock yes, you can use XML to store the files, but that is a more complicated process than just creating a zero-byte file and letting the file system store your files (as it's already doing). You asked "My question is, how can I easily identify my files?" and I think my solution is the easiest, as it's one line of code to designate a file as "myfile" and one line of code to retrieve a list of "myfile"(s). XML is a heavier lift to achieve the same results once you factor in the schema, parsing, etc. – Charles Jun 10 '15 at 04:59

Set the "Producer" document information property.

gn1
  • Any information on how to do that from C#? – Caramiriel Jun 15 '15 at 10:53
  • That depends on the PDF library you are using to create/modify the PDFs. When you press Ctrl+D in Adobe Reader, it shows metadata: title, author, producer, creator, etc. In the PDF specification these are called Document Information Properties. Here is how to do it using my company's product. You might find a similar API in your PDF library. http://www.gnostice.com/nl_article.asp?id=98&t=Reading_and_Writing_PDF_Document_Information_Properties_In_NET – gn1 Jun 16 '15 at 06:10