3

I have a script which connects to a database and gets all records which satisfy the query. The records in the result are the names of files present on a server, so now I have a text file (output.txt) which has all the file names in it.

I want a script which would know:

  1. What is the size of each file in the output.txt file?
  2. What is the total size of all the files present in that text file?

Update: I would like to know how can I achieve my task using Perl programming language, any inputs would be highly appreciated.

Note: I do not have any specific language constraint, it could be either Perl or Python scripting language which I can run from the Unix prompt. Currently I am using the bash shell and have sh and py script. How can this be done?

My scripts:

#!/usr/bin/ksh
export ORACLE_HOME=database specific details
export PATH=$ORACLE_HOME/bin:path information
sqlplus database server information<<EOF
SET HEADING OFF
SET ECHO OFF
SET PAGESIZE 0
SET LINESIZE 1000
SPOOL output.txt
select * from my table_name;
SPOOL OFF
EOF

I know du -h would be the command which I should be using, but I am not sure how my script should look. I have tried something in Python. I am totally new to Python and this is my first attempt.

Here it is:

import os

folderpath='folder_path'
file=open('output file which has all listing of query result','r')

for line in file:
 filename=line.strip()
 filename=filename.replace(' ', '\ ')
 fullpath=folderpath+filename
# print (fullpath)
 os.system('du -h '+fullpath)

File names in the output text file for example are like: 007_009_Bond Is Here_009_Yippie.doc

Any guidance would be highly appreciated.

Update:

  1. How can I move all the files which are present in output.txt file to some other folder location using Perl ?
  2. After doing step1, how can I delete all the files which are present in output.txt file ?

Any suggestions would be highly appreciated.

Rachel
  • If you have spaces in filename you must quote filename `os.system('du -h "%s"' % fullpath)` – jcubic Sep 19 '10 at 17:34
  • 3
    Down vote normally has an explanation, please provide one so that I can improve question. – Rachel Sep 19 '10 at 18:06
  • @RickF: I have tried using the du command which you suggested, but it gives me some number. How can I interpret it: is it KB, MB, GB or bytes? Also, my OS version is very old and so I do not have the du -h option. Is there a way I can get the storage value in MB from du using the command my ($size) = split(' ', `du "$folderpath/$_"`); ? – Rachel Sep 22 '10 at 21:00

4 Answers

1

Eyeballing, you can make YOUR script work this way:

1) Delete the line filename=filename.replace(' ', '\ '). Escaping is more complicated than that; you should just quote the full path or use a Python library to escape it based on the specific OS;

2) You are probably missing a delimiter between the path and the file name;

3) You need single quotes around the full path in the call to os.system.

This works for me:

#!/usr/bin/python
import os

folderpath='/Users/andrew/bin'
file=open('ft.txt','r')

for line in file:
    filename=line.strip()
    fullpath=folderpath+"/"+filename
    os.system('du -h '+"'"+fullpath+"'")

The file "ft.txt" has file names with no path and the path part is '/Users/andrew/bin'. Some of the files have names that would need to be escaped, but that is taken care of with the single quotes around the file name.
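One caveat with the single-quote approach: a file name that itself contains a single quote will still break the shell command. A minimal sketch of one way around this, assuming Python 3.7 or later and a du that supports the POSIX -k option, passes the path as a separate argument so that no shell parsing happens at all. The helper name du_kb is hypothetical.

```python
import subprocess

def du_kb(path):
    """Return the disk usage of one file in KB, as reported by `du -k`."""
    # Passing a list (and no shell=True) hands the path to du unchanged,
    # so spaces and quotes in the name need no escaping.
    result = subprocess.run(["du", "-k", path],
                            capture_output=True, text=True, check=True)
    # du prints "<size><TAB><path>"; the first field is the size.
    return int(result.stdout.split()[0])
```

Inside the loop above, du_kb(fullpath) could then replace the os.system call, and the returned numbers can be summed for a total.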

That will run du -h on each file in the .txt file, but does not give you the total. This is fairly easy in Perl or Python.

Here is a Python script (based on yours) to do that:

#!/usr/bin/python
import os

folderpath='/Users/andrew/bin/testdir'
file=open('/Users/andrew/bin/testdir/ft.txt','r')

blocks=0
i=0
template='%d total files in %d blocks using %d KB\n'

for line in file:
    i+=1
    filename=line.strip()
    fullpath=folderpath+"/"+filename
    if(os.path.exists(fullpath)):
        info=os.stat(fullpath)
        # st_blocks is the number of blocks allocated to the file
        blocks+=info.st_blocks
        print(str(info.st_blocks)+"\t"+fullpath)
    else:
        print('"'+fullpath+'" not found')

file.close()
print(str(blocks)+"\tTotal")
# integer division keeps the KB figure a whole number
print(" "+template % (i,blocks,blocks*512//1024))

Notice that you do not have to quote or escape the file name this time: os.stat receives the path directly and no shell is involved. This calculates file sizes using allocation blocks, the same way that du does. If I run du -ahc against the same files that I have listed in ft.txt I get the same number (well, kinda: du reports it as 25M and I get the report as 24324 KB), and it reports the same number of blocks. (Side note: st_blocks is counted in 512-byte units on most Unix systems, even though the actual block size of the filesystem is usually larger.)
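To see the difference between a file's byte length and the space allocated for it, both values can be read from a single os.stat call. This is only a sketch, assuming a Unix-like system where st_blocks is available and counted in 512-byte units (true on Linux and macOS, but not guaranteed everywhere); the function name apparent_and_allocated is made up for illustration.

```python
import os

def apparent_and_allocated(path):
    """Return (bytes of content, bytes allocated on disk) for one file."""
    info = os.stat(path)
    apparent = info.st_size            # the byte length, as shown by ls -l
    allocated = info.st_blocks * 512   # allocated space (assumes 512-byte units)
    return apparent, allocated
```

For a small file on a filesystem with 4 KB blocks, the second value is typically 4096 even when the first is only a few hundred bytes.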

Finally, you may want to consider making your script so that it can read a command line group of files rather than hard coding the file and the path in the script. Consider:

#!/usr/bin/python
import os, sys

total_blocks=0
total_files=0
template='%d total files in %d blocks using %d KB\n'

print("")
for arg in sys.argv[1:]:
    print("processing: "+arg)
    blocks=0
    i=0
    # the listed files are assumed to live in the same directory as the list file
    folderpath=os.path.dirname(os.path.abspath(arg))
    file=open(arg,'r')
    for line in file:
        i+=1
        filename=line.strip()
        fullpath=folderpath+"/"+filename
        if(os.path.exists(fullpath)):
            info=os.stat(fullpath)
            blocks+=info.st_blocks
            print(str(info.st_blocks)+"\t"+fullpath)
        else:
            print('"'+fullpath+'" not found')
    file.close()

    # per-list summary, then add to the grand totals
    print("\t"+template % (i,blocks,blocks*512//1024))
    total_blocks+=blocks
    total_files+=i

print(template % (total_files,total_blocks,total_blocks*512//1024))

You can then execute the script (after chmod +x [script_name].py) with ./[script_name].py ft.txt, and it will use the directory containing ft.txt as the assumed location of the files listed in it. You can process multiple list files as well.

dawg
  • I tried your approach, when I try to add files then I get values like `315904L`, not sure what `L` stands for ? Also, if I run first script then it gives me size as `86K` and `259K`, total of which if I do on Calc gives me `345K` and so not sure but we are getting different numbers on summation of two numbers in different ways, any thoughts on this ? – Rachel Sep 20 '10 at 14:55
  • Because your files are really big, huh? Let me change the script to use blocks instead of byte the way du does.... – dawg Sep 20 '10 at 15:46
  • I do not see any changes in the script, what do we mean by blocks here ? – Rachel Sep 20 '10 at 15:58
  • Suggestion: I am confused whose answer should I accept, RickF's or drewk's as both have solved my problem, any suggestions ? – Rachel Sep 20 '10 at 21:33
1

In Perl, the -s filetest operator is probably what you want.

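use strict;
use warnings;
use File::Copy;

my $folderpath = 'the_path';
my $destination = 'path/to/destination/directory';
open my $IN, '<', 'path/to/infile' or die "Cannot open infile: $!";
my $total = 0;
while (<$IN>) {
    chomp;
    next unless length;                  # skip blank lines
    my $path = "$folderpath/$_";
    unless (-e $path) {                  # report missing files and carry on
        warn "$path not found\n";
        next;
    }
    my $size = (-s _) || 0;              # size in bytes; -s is false for empty files
    print "$_ => $size\n";
    $total += $size;
    # move() takes the file out of $folderpath, so no separate delete is needed
    move($path, "$destination/$_") or die "Error when moving: $!";
}
close $IN;
print "Total => $total\n";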

Note that -s gives size in bytes not blocks like du.

On further investigation, perl's -s is equivalent to du -b. You should probably read the man pages on your specific du to make sure that you are actually measuring what you intend to measure.

If you really want the du values, change the assignment to $size above to:

my ($size) = split(' ', `du "$folderpath/$_"`);
RickF
  • @RickF: Is this giving size of each and every file in the folder plus total size of the file in the folder ? – Rachel Sep 20 '10 at 19:18
  • @RickF: Is there a way I can get blocks size instead of byte size which I get using `du` ? – Rachel Sep 20 '10 at 19:36
  • On my Linux box, the block size is 1024 bytes (1kb), so you would just divide `$size` by that. – RickF Sep 20 '10 at 20:21
  • Actually, on further testing, `du -h` returns a minimum of '4.0k' for any file on my system, even where `ls` or perl `-s` shows a size smaller than that. – RickF Sep 20 '10 at 20:23
  • @RickF: So would it be wise to say that `perl -s` gives more accurate statistics as compared to `du -h` – Rachel Sep 20 '10 at 21:08
  • 1
    Actually, `du` always uses an assumption of 512 bytes for a block even if your block size is different. Part of the Single UNIX Specification http://en.wikipedia.org/wiki/Du_(Unix) – dawg Sep 20 '10 at 21:24
  • Suggestion: I am confused whose answer should I accept, RickF's or drewk's as both have solved my problem, any suggestions ? – Rachel Sep 20 '10 at 21:33
  • `du` will result in `KB` or `bytes` ? – Rachel Sep 21 '10 at 01:32
  • @RickF: Doing modification you suggested, Total gives me number which is way less than what I use to get without `du` which leads me to thinking that `du` is somehow giving me in `kb`, i know this can be wrong and please do correct me if am having wrong understanding of it. – Rachel Sep 21 '10 at 01:49
  • @Rachel: if you use the `-h` argument to `du`, the magnitude of the output is automatically scaled from bytes to KB, MB, TB, etc. to maintain about 2 digits. So a file that is 300 bytes will report that; a 3,890,000 byte file will report as 3.7M because each order of magnitude is 1024. Even still, `du` is off on small files because an assumption of fixed 512 byte blocks is made. – dawg Sep 21 '10 at 03:14
  • @drewk: The Single Unix Spec is not how my Ubuntu box is configured by default. My default block size is 4kb. You can check yours with `tune2fs -l /dev/sda1 | grep Block` – RickF Sep 21 '10 at 12:37
  • @Rachel: `du` response will depend on your system specifics. On my Ubuntu system, plain `du` returns kb rounded up to the nearest 4k. `du -h` returns the same value converted to KB/MB/TB as appropriate. `du -b` returns the file size in bytes ignoring the 4k block size. – RickF Sep 21 '10 at 12:44
  • @RickF: Now how can I move all the files present in outlog.txt to another location(some folder) and then delete the files which are in current location ? – Rachel Sep 21 '10 at 14:04
  • @RickF: How would script look like in case we need to move all files in output.txt file to some other location and then delete files present in output.txt, I have updated question for reference. – Rachel Sep 21 '10 at 14:08
  • @RickF: I have filename with special characters in file and I want to move that file but am not able to move files having special characters, is there a way in perl to handle with special characters, also I can not even write the file name here as encoded value appears for the file name, is there a way out for the situation ? – Rachel Sep 21 '10 at 15:37
  • Do the special characters show up in the printed output file? Are they Unicode? If so, http://perldoc.perl.org/perlunitut.html is a good place to start. Also, is there an actual error message? – RickF Sep 21 '10 at 15:56
  • @RickF: It is not that your block size is actually 512 B, it is that some Unix utilities reports block use as if it were assumed to be 512 B unless the env variable BLOCKSIZE has been set. type `echo $BLOCKSIZE` to check Most are not set. Try `du -ac` or `ls -s` on a file that size > 512 bytes but much smaller than 4kb then `du -ach` on the same file. A 621 B file is reported as 4.0K (correct given block size) and 8 blocks (incorrect with 4kb blocks but it assumes 512 B blocks). type `man ls` read about `-s`. If $BLOCKSIZE eq "", $blocks=$blocks_reported*512/$block_size – dawg Sep 21 '10 at 16:32
  • @drewk: On Ubuntu 9.10, with a 648b file, `du -ac` & `ls -s` return "4". `du -ach` returns "4.0K". `ls -l` and `du -b` return "648". BLOCKSIZE env is not set. If you check the man page for GNU `du`, linked from the Wikipedia page you linked above, it says "Normally the disk space is printed in units of 1024 bytes, but this can be overridden (see Block size)." My point is that you can't safely assume `du` output is in 512b blocks. – RickF Sep 21 '10 at 17:24
  • @RickF: I was just quoting my man page on 512 vs 1024. If you follow the BSD link, it states 512. My point is that file size based on blocks need to be adjusted. You stated that your block size from `tune2fs -l /dev/sda1 | grep Block` is 4k. If the file actually was using 4 blocks, it would be using 4 blocks * 4096 block size=16,384 bytes. Your 648B file fits in 1 block, so uses 4,096 bytes. You are correct that it is not safe to assume 512B assumption, but neither is it accurate to add up all the byte in a series of file for total disk use. It needs to be adjusted for actual block use. – dawg Sep 21 '10 at 17:57
  • @drewk: my `tune2fs` output led me to the conclusion that my `du` is not giving a count of blocks used, but the count of disk used in kb. Block size is 4k, `du` output for a tiny file is '4' or '4.0k', not '1'. GNU `du` defaults to 1024b block size, but seems aware of the system block size. – RickF Sep 21 '10 at 19:04
  • @RickF: Well that is definitely not what the documents say. I believe your conclusion is incorrect. The '4' you are getting from `du` just happens to be interchangeable in this case because the assumed block size is the same as kilobytes, but that will not always be true... – dawg Sep 22 '10 at 05:35
  • @drewk: What OS are you using? BSD systems may not be using the GNU version of `du`. I'm quite sure that my conclusions are correct for *my* situation: Ubuntu 9.10 w/ GNU `du`. Block size is a function of the file system, and Ext3 defaults to 4kb blocks. – RickF Sep 22 '10 at 13:38
  • @RickF: I have both OS X and Ubuntu 10.04. On Ubuntu, there is a feature of `du` that shows what I am talking about. On a file where the size > block size and size << 2 blocks, run these commands: `du [file]; du --block-size=1024 [file]; du --block-size=512 [file]; du --block-size=1 [file]` What do you conclude? – dawg Sep 22 '10 at 18:45
  • @RickF: I have tried using the `du` command which you suggested, but it gives me some number. How can I interpret it: is it KB, MB, GB or bytes? Also, my OS version is very old and so I do not have the `du -h` option. Is there a way I can get the storage value in MB from du using the command my ($size) = split(' ', `du "$folderpath/$_"`); ? – Rachel Sep 22 '10 at 21:00
  • @drewk: On a file of 5655b (according to ls -l), `du` and `du --blocksize=1024` give "8", as expected since the docs say `du` defaults to blocksize=1024. 512 => 8, 1 => 8192. This all seems consistent with measuring disk usage (4k blocks). Do your results differ? http://www.gnu.org/software/coreutils/manual/html_node/du-invocation.html – RickF Sep 23 '10 at 14:29
  • @Rachel: It depends on your OS version - what are you running? As I've been discussing here with drewk, it depends. You'll need to figure out what your specific `du` is returning. Or specify it - `du --blocksize=1024 "$folderpath/$_"` and you'll know that it's in kbytes. blocksize is in bytes. – RickF Sep 23 '10 at 14:35
  • @RickF: Am getting `/usr/bin/du: illegal option -- - usage: du [-a][-d][-k][-r][-o|-s][-L] [file ...] ` error message, also am using `OS Information uname :SunOS uname -v :Generic_117350-39` – Rachel Sep 23 '10 at 15:13
  • @RickF: yes, on Ubuntu, I get the same. I was understanding though that you said "du is not giving a count of blocks used, but a count of disk used in kb" This is what I was saying is incorrect. du gives blocks unless you use a switch to get a human readable form. – dawg Sep 23 '10 at 16:53
  • @Rachel: According to http://docs.sun.com/app/docs/doc/816-0210/6m6nb7m7t?a=view your version of `du` reports the count in 512b blocks by default. It doesn't support `--blocksize=X` but it does support `-k` which will give output in kb rather than blocks. Or you can take the default output and divide it by 2 (in the script) to get kb. 2 blocks = 1 kb. – RickF Sep 23 '10 at 16:56
  • @Rachael: just use Perl to get block size and blocks used in a call to stat. You will not have any shell issues. brian d foy gave you the code. – dawg Sep 23 '10 at 16:56
0

You can do it in your shell script itself.

You have all the files names in your spooled file output.txt, all you have to add at the end of existing script is:

< output.txt  du -h

It will give size of each file and also a total at the end.

codaddict
  • @ codaddict: I did not understood the working on `<` ahead of the output.txt du -h command, can you explain more on this ? – Rachel Sep 19 '10 at 16:55
  • Its same as `du -h < output.txt` – codaddict Sep 19 '10 at 16:56
  • where should I add this, beside spool command in first script ? – Rachel Sep 19 '10 at 16:56
  • But from the sql am just getting the file name but not the actual folder location where the file is, that I am passing in the python script and so not sure how it would work ? – Rachel Sep 19 '10 at 16:59
  • This doesn't work - `du` does not take piped arguments. There may be a way to make it work, but as written, it's non-functional. – RickF Sep 20 '10 at 20:59
0

You can use the Python skeleton that you've sketched out and add os.path.getsize(fullpath) to get the size of individual file.

For example, if you wanted a dictionary with the file name and size you could:

dict((f.strip(), os.path.getsize(os.path.join(folderpath, f.strip()))) for f in file)

Keep in mind that the result from os.path.getsize(...) is in bytes so you'll have to convert it to get other units if you want.

In general os.path is a key module for manipulating files and paths.
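Putting this together, here is a minimal sketch that reads the names from the list file, strips the trailing newline from each, joins it to the folder, and totals the byte sizes. The function names sizes_from_list, human and total_bytes, and the choice of 1024-based units, are assumptions for illustration and not part of the original scripts.

```python
import os

def sizes_from_list(list_path, folder):
    """Map each file name in list_path to its size in bytes (None if missing)."""
    sizes = {}
    with open(list_path) as listing:
        for line in listing:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            full = os.path.join(folder, name)
            sizes[name] = os.path.getsize(full) if os.path.isfile(full) else None
    return sizes

def human(nbytes):
    """Format a byte count using 1024-based units, capped at GB."""
    size = float(nbytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return "%.1f %s" % (size, unit)
        size /= 1024

def total_bytes(sizes):
    """Sum the sizes of the files that were found."""
    return sum(s for s in sizes.values() if s is not None)
```

A call such as sizes_from_list('output.txt', 'folder_path') returns the per-file sizes, and human(total_bytes(sizes)) formats the overall total. Note that these are byte lengths, not the allocated blocks that du reports.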

dtlussier
  • I have added the script in update above question but still am getting standard errors. – Rachel Sep 19 '10 at 17:15
  • I have updated questions with the latest script according to your suggestions and the issues what am facing with it. – Rachel Sep 19 '10 at 17:18
  • I have updated answer for your response and the error which am getting using that approach. – Rachel Sep 19 '10 at 17:23
  • The list comprehension is superfluous in this case; a simple generator expression does fine and is more efficient. Just leave out the square brackets. –  Sep 19 '10 at 17:37
  • Can you elaborate on this, also I have update my questions with specific response to this answer, can you share your comments on it. – Rachel Sep 19 '10 at 17:39
  • @lunaryon - thanks. I've updated the response w/o the square brackets. – dtlussier Sep 19 '10 at 21:31
  • @Rachel - I don't see any updates in the question, are you still getting errors using `os.path.getsize(...)`? – dtlussier Sep 19 '10 at 21:33