Identify RGB and CMYK colors in a PDF

Question

I have a PDF that contains text and backgrounds in different colors. How can I identify which colors are used in the PDF, and whether each one is specified in CMYK or RGB?

StringBuilder sb_Sourcepdf = new StringBuilder();
PdfReader reader_FirstPdf = new PdfReader(pdf_of_FirstFile);

Document document = new Document();

PDFParser parser = new PDFParser(new FileInputStream(pdf_of_FirstFile));
parser.parse();
PDDocument docum = parser.getPDDocument();

PDFStreamEngine engine = new PDFStreamEngine();

PDPage page = (PDPage)docum.getDocumentCatalog().getAllPages().get(0);

engine.processStream(page, page.findResources(), page.getContents().getStream());
PDGraphicsState graphicState = engine.getGraphicsState();
string colorname = graphicState.getStrokingColor().getColorSpace().getName();
graphicState.getTextState().getFont();
int r = graphicState.getNonStrokingColor().getJavaColor().getRed();
int g = graphicState.getNonStrokingColor().getJavaColor().getGreen();
int b = graphicState.getNonStrokingColor().getJavaColor().getBlue();
int rgb = graphicState.getNonStrokingColor().getJavaColor().getRGB();
float[] cosp = graphicState.getNonStrokingColor().getColorSpaceValue();
PDColorSpace pd = graphicState.getNonStrokingColor().getColorSpace();

string re = graphicState.getStrokingColor().toString();
int rgbcolor = graphicState.getStrokingColor().getJavaColor().getRGB();

float[] components = { java.awt.Color.black.getRed(), java.awt.Color.black.getGreen(), java.awt.Color.black.getBlue() };

float[] colorSpaceValues = graphicState.getStrokingColor().getColorSpaceValue();


foreach (float c in colorSpaceValues)
{
    Debug.WriteLine(c * 255.00);
}

I used PDFBox, but I am getting 0.0 as the value.
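Whichever library ends up reporting the color, the last step is deciding whether a given color space counts as RGB, CMYK or something else. The following is a minimal sketch, assuming you already have the color space name and the number of color components. For the device spaces, PDFBox should report names such as "DeviceRGB" and "DeviceCMYK" through `getColorSpace().getName()` (as used in the question), and the component count would be, for example, the length of `getColorSpaceValue()`. `ColorFamily` and `ColorSpaceClassifier` are hypothetical helpers, not part of PDFBox or iTextSharp.

```csharp
using System;

// Hypothetical helper types; not part of PDFBox or iTextSharp.
public enum ColorFamily { Gray, Rgb, Cmyk, Other }

public static class ColorSpaceClassifier
{
    // name: PDF color space family name, e.g. "DeviceRGB" or "DeviceCMYK".
    // componentCount: number of components of the current color value.
    public static ColorFamily Classify(string name, int componentCount)
    {
        switch (name)
        {
            case "DeviceGray":
            case "CalGray":
                return ColorFamily.Gray;
            case "DeviceRGB":
            case "CalRGB":
                return ColorFamily.Rgb;
            case "DeviceCMYK":
                return ColorFamily.Cmyk;
            case "ICCBased":
                // An ICC-based space has N = 1, 3 or 4 components; the count
                // is the usual hint for a gray, RGB or CMYK profile.
                switch (componentCount)
                {
                    case 1: return ColorFamily.Gray;
                    case 3: return ColorFamily.Rgb;
                    case 4: return ColorFamily.Cmyk;
                    default: return ColorFamily.Other;
                }
            default:
                // Lab, Indexed, Separation, DeviceN, Pattern: these need further
                // inspection (e.g. of their base or alternate color space).
                return ColorFamily.Other;
        }
    }
}
```

The name is checked before the component count because a count alone is ambiguous: a Lab color also has three components but is not RGB.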

If you're using PDFBox, why are you tagging the question as 'itextsharp'? — Bruno Lowagie, Apr 18 '13 at 09:11
@Bruno Lowagie I wanted to know whether it is possible to do this in iTextSharp, because I am not able to get the values using PDFBox. For text extraction from the PDF I have used iTextSharp. — Pragya, Apr 18 '13 at 09:22
@Pragya Currently the parser package of iText ignores text colors. It is moderately easy to extend it to also provide the coloring information. That being said, your PDFBox code seems to inspect the graphics state only at the start or end of the page description (I don't know which state `engine` is in after `engine.processStream` has been called), while you need the state at the moment the text you want to inspect was rendered. Furthermore, you have to take the text render mode into account to see whether the stroking color, the non-stroking color, both, or neither apply. — mkl, Apr 19 '13 at 06:40
@Pragya Do you mean using PDFBox? I'm sure that after processing the stream PDFBox allows you to iterate over its individual elements (among them the text strings) and query the graphics state that is valid when those elements are printed. Or do you mean iText? Of course you are not bound to use the iText parser package as a base; it already does all the heavy lifting, though, so I don't know why you would not want to use it. — mkl, Apr 22 '13 at 07:24
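mkl's point about the text render mode can be made concrete. The PDF specification defines eight text rendering modes (the operand of the `Tr` operator); filling uses the non-stroking color and stroking uses the stroking color. A minimal sketch of that mapping, assuming you can obtain the render mode as an integer from whichever parser you use; `TextRenderModeColors` and `Applies` are hypothetical names.

```csharp
using System;

// Hypothetical helper: which graphics-state colors actually paint the glyphs
// for a given PDF text rendering mode (Tr operand, 0..7).
public static class TextRenderModeColors
{
    // Fill = non-stroking color is used, Stroke = stroking color is used.
    public static (bool Fill, bool Stroke) Applies(int renderMode)
    {
        switch (renderMode)
        {
            case 0: return (true, false);   // fill
            case 1: return (false, true);   // stroke
            case 2: return (true, true);    // fill, then stroke
            case 3: return (false, false);  // invisible
            case 4: return (true, false);   // fill, add to clipping path
            case 5: return (false, true);   // stroke, add to clipping path
            case 6: return (true, true);    // fill and stroke, add to clipping path
            case 7: return (false, false);  // add to clipping path only
            default: throw new ArgumentOutOfRangeException(nameof(renderMode));
        }
    }
}
```

For the common case of mode 0, only the non-stroking (fill) color matters, which is why reading the stroking color alone can report a color the text never shows.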

score 1 · Accepted Answer · answered May 02 '13 at 06:21

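Extract the text page by page with a custom extraction strategy and read the color of each text chunk in its RenderText callback:

PdfReader reader_FirstPdf = new PdfReader(pdf_of_FirstFile);

for (int i = 1; i <= reader_FirstPdf.NumberOfPages; i++)
{
    // TextWithFont_SourcePdf is the answerer's own ITextExtractionStrategy implementation
    TextWithFont_SourcePdf Sourcepdf = new TextWithFont_SourcePdf();
    text_First_File = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader_FirstPdf, i, Sourcepdf);
}

// Inside TextWithFont_SourcePdf: called for every chunk of text rendered on the page
public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo)
{
    // GetColorNonStroke() was not part of the stock iTextSharp parser at the time
    // (which ignored text colors); this assumes a parser extended to expose the fill color
    var fill = renderInfo.GetColorNonStroke();
    int r = fill.R;
    int g = fill.G;
    int b = fill.B;
}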
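The callback above reads the red, green and blue values but discards them. To answer "which colors are used", the values have to be recorded somewhere. A minimal sketch of such a store, assuming the caller also passes a color family label (e.g. "RGB" or "CMYK") so that colors from different spaces stay distinct even if both are reported as RGB values; `ColorUsage` and `Add` are hypothetical names, not iTextSharp API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts how many text chunks were drawn in each distinct color.
public sealed class ColorUsage
{
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();

    // family: caller-supplied label such as "RGB" or "CMYK".
    // r, g, b: 0..255 channel values, formatted as an uppercase hex triplet.
    public void Add(string family, int r, int g, int b)
    {
        string key = $"{family} #{r:X2}{g:X2}{b:X2}";
        _counts[key] = _counts.TryGetValue(key, out int n) ? n + 1 : 1;
    }

    // Distinct colors seen so far, with the number of chunks for each.
    public IReadOnlyDictionary<string, int> Counts => _counts;
}
```

Inside `RenderText` one would call something like `usage.Add("RGB", r, g, b)` for each chunk; how the family label is determined depends on what the extended parser exposes about the color space.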
