
I would like to find the latest files in an HDFS directory, keep them as they are, and delete the older files.

I have 4 files in the HDFS directory /user/hive/warehouse/test:

-rwxrwx--x+  3 hive hive          9 2018-11-13 04:13 /user/hive/warehouse/test/bc4151c16c98d191-72314e2e00000000_640731000_data.0.
-rwxrwx--x+  3 hive hive          9 2018-11-13 04:35 /user/hive/warehouse/test/bc4151c16c98d191-72314e2e00000000_640731001_data.0.
-rwxrwx--x+  3 hive hive         12 2018-11-13 08:31 /user/hive/warehouse/test/944adb43a3a5f955-659ed0e100000000_916442110_data.0.
-rwxrwx--x+  3 hive hive         12 2018-11-13 08:31 /user/hive/warehouse/test/944adb43a3a5f955-659ed0e100000000_916442111_data.0.

I want to delete all files that are not the latest.

That means my directory should contain only the files with the timestamp 2018-11-13 08:31.

I can sort those files using `hdfs dfs -ls /user/hive/warehouse/test | sort -k6,7`.

How do I delete the older files? The hdfs commands do not include anything like find that would extract only the latest files.
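
A minimal sketch of one possible approach, not a tested production script: filter the `hdfs dfs -ls` output with awk, find the newest timestamp, and print every file path whose timestamp is older. The function name `list_older_files` is hypothetical. The sketch assumes the listing layout shown above (date in field 6, time in field 7, path in field 8), paths without whitespace, and the minute resolution that `hdfs dfs -ls` prints, so all files sharing the newest minute are kept.

```shell
# Read `hdfs dfs -ls` output on stdin and print the paths of all files
# whose timestamp is older than the newest timestamp in the listing.
# Assumptions: date = field 6, time = field 7, path = field 8, no spaces in paths.
list_older_files() {
  awk '
    NF >= 8 && $1 !~ /^d/ {            # skip the "Found N items" header and directories
      ts[$8] = $6 " " $7               # remember "YYYY-MM-DD HH:MM" for each path
      if (ts[$8] > max) max = ts[$8]   # this timestamp format sorts correctly as a string
    }
    END {
      for (p in ts) if (ts[p] < max) print p   # output order is unspecified
    }
  '
}

# Usage (dry run first, to review what would be deleted):
#   hdfs dfs -ls /user/hive/warehouse/test | list_older_files
# Then delete; `-r` (GNU xargs) skips the command when nothing is listed:
#   hdfs dfs -ls /user/hive/warehouse/test | list_older_files | xargs -r hdfs dfs -rm
```

Because the comparison is against the maximum timestamp rather than a fixed count, this does not depend on `head` and works when several files share the latest timestamp.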

James Z
Cast_A_Way
  • What have you tried? What problem are you having? What does not work? If you are looking for someone to do the job for you, try freelancing sites. You may want to read [how to ask a good question](https://stackoverflow.com/help/how-to-ask). `there can be multiple files with latest timestamps` At what resolution? Minute resolution? You may also be interested in [this thread](https://stackoverflow.com/questions/34688792/get-the-last-updated-file-in-hdfs). – KamilCuk Nov 13 '18 at 14:30
  • Let's say I have 5 files with the timestamp "2018-11-13 08:31". I want to keep only those and delete the rest. The link you referred to uses the head command. In my case, I cannot use the head command. – Cast_A_Way Nov 13 '18 at 14:35
  • Possible duplicate of [How to recursively find and list the latest modified files in a directory with subdirectories and times?](https://stackoverflow.com/questions/5566310/how-to-recursively-find-and-list-the-latest-modified-files-in-a-directory-with-s) – Vin Nov 13 '18 at 14:35
  • I am using hdfs commands. The find command is not available in hdfs commands. – Cast_A_Way Nov 13 '18 at 14:39

0 Answers