I have a folder with around 150 Word and PDF documents (each Word file and its PDF contain the same text). An example data sheet is here: http://www.sicgen.pt/antigen_folder/data_sheet/AB0003_ERP57_AB_data_sheet2003.pdf
After loading a file with pdftools, the text always looks like this:
library(pdftools)
u <- pdf_text("AB0003_ERP57_AB_data_sheet200.pdf")
[1] " Product Data Sheet\r\n 001 Rev1 Jan 2012 by JR\r\nCatalogue No. AB0003-200\r\nQty: 400 µg (2 mg/ml)\r\n ERp57 Polyclonal Antibody\r\nSource: Goat phospholipase C alpha, PI PLC, protein disulfide\r\n isomerase A3 antibody.\r\nGeneral description: Goat polyclonal to ERp57 -\r\nendoplasmic reticulum lumen marker. This Form: Polyclonal antibody supplied as a 200 µl\r\nendoplasmic reticulum protein interacts with lectin (2 mg/ml) aliquot in PBS, 20% glycerol and 0.05%\r\nchaperones calreticulin and calnexin to modulate sodium azide. This antibody is epitope-affinity\r\nfolding of newly synthesized glycoproteins. It has purified from goat antiserum.\r\ndisulfide isomerase activity and complexes of\r\nlectins and this protein mediate protein folding by Immunogen: Recombinant peptide derived from\r\npromoting formation of disulfide bonds in their within residues 300 aa to the C-terminus of human\r\nglycoprotein substrates. ERp57 produced in E. coli.\r\nAlternative names: 58 kDa glucose regulated Specificity: Detects a band of 60 kDa by Western\r\nprotein, 58 kDa microsomal protein, disulfide blot in the following canine, human, monkey,\r\nisomerase ER 60, endoplasmic reticulum resident mouse, rat whole cell lysates.\r\nprotein 57, endoplasmic reticulum resident protein\r\n60, ER protein 57, ER protein 60, ER protein 61,\r\nERP57, ERp60, ERp61, glucose regulated protein\r\n58 Kd, GRP57, GRP58, HsT17083, P58, PDIA3,\r\nReactivity: Reacts against human, rat, mouse, canine and monkey proteins.\r\nSample Western blot Immuno- Histochemistry (paraffin) Histochemistry (frozen)\r\n fluorescence\r\nhuman +++ +++ +++ +++\r\nrat +++ +++ +++ +++\r\nmouse +++ +++ +++ +++\r\ncanine +++ +++ +++ +++\r\nmonkey +++ +++ +++ +++\r\n+++ excellent, ++ good, + poor, ND not determined\r\nUsage: Western blot 1:500-1:2,000 Storage: Store at -20 C for long-term storage. 
Store\r\nImmunofluorescence 1:50-1:500 at 2-8 C for up to one month.\r\nImmunohistochemistry (paraffin) 1:200-1:1,000\r\nImmunohistochemistry (frozen) 1:200-1:1,000 Special instructions: Avoid freeze/thaw cycles.\r\nSICGEN - Research and Development in Biotechnology Ltd\r\nEstrada do Pombalinho, Rabaçal, 3230-544 PENELA – PORTUGAL\r\nwww.sicgen.pt information@sicgen.pt\r\n"
[2] " Product Data Sheet\r\n 001 Rev1 Jan 2012 by JR\r\nReferences:\r\n For research use only, not for diagnostic use\r\nSICGEN's Proprietary Immunogen Policy\r\nIn order to produce high specific antibodies SICGEN has invested a lot of time and effort into selecting immunogen\r\nsequences. SICGEN has decided to protect this information by not publishing it on the website. However, these sequences\r\nare available on request.\r\nSICGEN - Research and Development in Biotechnology Ltd\r\nEstrada do Pombalinho, Rabaçal, 3230-544 PENELA – PORTUGAL\r\nwww.sicgen.pt information@sicgen.pt\r\n"
I want to transform this text into a data frame or table, in either R or Excel, with one row per document and one column per field:
Catalogue.No. Name Source.
1 AB0003-200 ERp57 Goat
2 AB0004-500 (...) (...)
General.Description
1 Goat polyclonal to ERp57 - endoplasmic reticulum lumen marker. This endoplasmic reticulum protein interacts (...)
2 (...)
Alternative.names.
1 58 kDa glucose regulated protein, (...)
2 (...)
Form.
1 Polyclonal antibody supplied as a 200 µl (2 mg/ml) aliquot in PBS
2 (...)
Immunogen
1 Recombinant peptide derived from within residues 300 aa (...)
2 (...)
Specificity. Reactivity.
1 Detects a band of 60 kDa by(...) Reacts against human, rat, ...
2 (...) (...)
Usage.
1 Western blot 1:500-1:2,000 Immunofluorescence
2 (...)
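A minimal sketch of a possible starting point, using only base R regular expressions on the `pdf_text()` output. It covers only the fields that sit on a single line in the extracted text (catalogue number, name, source, reactivity). The fields printed in two columns in the PDF (General description, Form, Immunogen, Specificity) come out interleaved line by line, so this sketch does not attempt them. `extract_field()` and `parse_sheet()` are hypothetical helper names, the folder name `data_sheets` is a placeholder, and the patterns assume every sheet follows the layout shown above (for example, that the product name is the first word on the line after `Qty:` and that the source is a single word).

```r
# Hypothetical helper: return the first capture group of `pattern` in `page`,
# or NA if the pattern does not match.
extract_field <- function(page, pattern) {
  m <- regmatches(page, regexec(pattern, page, perl = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else trimws(m[2])
}

# Hypothetical helper: build a one-row data frame from page 1 of a data sheet.
# Assumes the layout of the example sheet; other sheets may need other patterns.
parse_sheet <- function(page) {
  data.frame(
    Catalogue.No = extract_field(page, "Catalogue No\\.\\s*([^\r\n]+)"),
    Name         = extract_field(page, "Qty:[^\r\n]*\r?\n\\s*(\\S+)"),
    Source       = extract_field(page, "Source:\\s*(\\S+)"),
    Reactivity   = extract_field(page, "Reactivity:\\s*([^\r\n]+)"),
    stringsAsFactors = FALSE
  )
}

# Shortened excerpt of the page-1 text shown above, used as sample input.
page <- paste0(
  " Product Data Sheet\r\n 001 Rev1 Jan 2012 by JR\r\n",
  "Catalogue No. AB0003-200\r\nQty: 400 \u00b5g (2 mg/ml)\r\n",
  " ERp57 Polyclonal Antibody\r\n",
  "Source: Goat phospholipase C alpha, PI PLC, protein disulfide\r\n",
  "Reactivity: Reacts against human, rat, mouse, canine and monkey proteins.\r\n"
)

print(parse_sheet(page))

# For a whole folder (not run here; needs pdftools and the PDF files):
# files  <- list.files("data_sheets", pattern = "\\.pdf$", full.names = TRUE)
# sheets <- do.call(rbind, lapply(files, function(f) {
#   parse_sheet(pdftools::pdf_text(f)[1])
# }))
```

Matching against the whole page string rather than splitting it into lines first keeps the multi-line `Name` pattern simple. A field that is missing from a sheet yields `NA` instead of an error, so the rows can still be combined with `rbind()`.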
If you have any suggestions, please let me know.