
I found this implementation in Java, but I was wondering whether it is possible to get the number of slides in a ppt file. If so, would the approach be similar for pptx files?

- Look through the directory that the script file is in
- Detect and count the number of slides in a ppt file
- Take that number and append it to a CSV file

I found a bash script that does something similar, but for PDF files:

#!/bin/bash 
saveIFS=$IFS
IFS=$(echo -en "\n\b")

myFiles=($(find . -name "*.pdf"))
totalPages=0

echo "file path, number of pages" > log_3.csv
for eachFile in ${myFiles[*]}; do
  pageCount=$(mdls $eachFile | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
  size=${#pageCount}

  if [ $size -eq 0 ]
  then
    # these files had no entry for kMDItemNumberOfPages
    # comment out the next line to not list these files
    echo $eachFile : \*\* Skipped - no page count \*\*
  else
    # comment out the next line if you don't want to see a count for each file
    echo $eachFile, $pageCount >> log_3.csv
    totalPages=$(($totalPages + $pageCount))

  fi
done

echo "Total number of pages, ${totalPages}" >> log_3.csv
echo Total pages: $totalPages

IFS=$saveIFS

Could we refactor this code to make it work with ppt files?

Thanks!

    Both the example in your [first link](https://stackoverflow.com/questions/22990348/get-word-document-count-and-number-of-slides-count-in-ppt) and the sample code you provide use an external tool to process the PPT or PDF file. The first answer in your first link would be appropriate here, wrapping the _tika-app-1.5.jar_ call with bash instead of PHP. – Beggarman Oct 17 '18 at 19:31

2 Answers


Let me answer half of your question.
For pptx files, you can get the number of slides with:

#!/bin/bash

function pagecount() {
    local pptx=$1
    local pagecount line
    while read -r line || [[ -n "$line" ]]; do
        if [[ "$line" =~ \<Slides\>([0-9]+)\</Slides\> ]]; then
            pagecount="${BASH_REMATCH[1]}"
        fi
    done < <(unzip -j -p "$pptx" "docProps/app.xml")
    echo "$pagecount"
}

for file in *.pptx; do
    count=$(pagecount "$file")
    echo "${file} : ${count} pages"
done

As with other MS Office 2007+ files (docx, xlsx, ...), a pptx file is just a zip archive of XML files. You can find the slide count in docProps/app.xml in the form <Slides>n</Slides>.
The code above extracts docProps/app.xml to stdout and then parses it for the Slides property.
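
Since the question asks for the counts to end up in a CSV file, the same idea could be extended roughly as follows. This is only a sketch under some assumptions: the helper names slides_from_app_xml and pptx_counts_to_csv and the output name slides.csv are made up for illustration, it assumes unzip, sed and find are installed, and it does not handle file names that contain newlines or commas.

```shell
#!/bin/bash

# Hypothetical helper: read app.xml content on stdin and print the slide
# count from the first line containing <Slides>n</Slides> (nothing if absent).
slides_from_app_xml() {
    sed -n 's|.*<Slides>\([0-9][0-9]*\)</Slides>.*|\1|p' | head -n 1
}

# Hypothetical helper: write a "file path,number of slides" CSV for every
# .pptx file found under directory $1, using output file $2.
pptx_counts_to_csv() {
    local dir="$1"
    local csv="$2"
    local file count

    echo "file path,number of slides" > "$csv"
    find "$dir" -name '*.pptx' | while IFS= read -r file; do
        # -p sends the archive member to stdout; unzip errors are discarded.
        count=$(unzip -p "$file" "docProps/app.xml" 2>/dev/null </dev/null | slides_from_app_xml)
        if [ -n "$count" ]; then
            echo "${file},${count}" >> "$csv"
        else
            echo "${file} : skipped, no slide count found" >&2
        fi
    done
}

# Example call (commented out so that sourcing this file has no side effects):
# pptx_counts_to_csv . slides.csv
```

Splitting the XML parsing from the directory walk keeps the parsing step easy to check on its own, for example by piping a hand-written app.xml snippet into slides_from_app_xml.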

Regarding ppt files, the format is completely different from pptx: it is a binary format rather than zipped XML, so you will probably need an external tool (wvWare or something similar) to process it.

tshiono

Here is my script that counts pages in all .pptx files in a directory, based on tshiono's answer.

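#!/bin/bash

# Print the slide count of every .pptx file in the current directory.
# Requires a grep with -P support (Perl-compatible lookarounds), e.g. GNU grep.
function pagecount() {
  for file in *.pptx; do
    count=$(unzip -j -p "$file" "docProps/app.xml" | grep -o -P '(?<=<Slides>)[0-9]+(?=</Slides>)')
    echo "${file} : ${count} pages"
  done
}

# The function has to be called for the script to produce any output.
pagecount
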
Zaydme