
I want to know how to convert an .xlsx file residing in HDFS to a .csv file using an R script.

I tried the XLConnect and xlsx packages, but I get a 'file not found' error. I am passing the HDFS location as the input file to these packages in the R script. I am able to read .csv files from HDFS in an R script using read.csv().

Do I need to install any new packages to read an .xlsx file stored in HDFS?

Here is the code I used:

library(XLConnect)

d1=readWorksheetFromFile(file='hadoop fs -cat hdfs://............../filename.xlsx', sheet=1)

"Error: FileNotFoundException (Java): File 'filename.xlsx' could not be found - you may specify to automatically create the file if not existing."

I am sure the file is present in the specified location.

Hope my question is clear. Please suggest a method to resolve it.

Thanks in Advance!

Hong Ooi
Malu
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Jaap Nov 24 '16 at 09:15
  • The error is clear. You are not referring to the file in the correct manner. – Roman Luštrik Nov 24 '16 at 09:59

1 Answer

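hadoop fs -cat isn't a file path but a shell command: it prints the contents of an HDFS file to standard output. XLConnect expects the path of a file on your local filesystem, so it treats the whole string as a filename and fails to find it. Copy the file out of HDFS first (for example with hadoop fs -get), either from outside R or from inside it using system(), and then open the local copy of the spreadsheet.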
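The copy-then-open approach can be sketched roughly as follows. This is only an illustration under a few assumptions: the hadoop command-line client is on the PATH of the machine running R, XLConnect is installed, and hdfs_xlsx_to_csv is a made-up helper name, not a function from any package.

```r
# Sketch: copy an .xlsx out of HDFS, read the local copy, write a CSV.
# Assumes the `hadoop` CLI is on the PATH and XLConnect is installed.
# hdfs_xlsx_to_csv() is a hypothetical helper, not part of any package.
hdfs_xlsx_to_csv <- function(hdfs_path, csv_path, sheet = 1) {
  # Name of a temporary local file that will hold the workbook
  local_xlsx <- tempfile(fileext = ".xlsx")
  on.exit(unlink(local_xlsx), add = TRUE)

  # `hadoop fs -get <src> <dst>` copies a file from HDFS to local disk
  status <- system2("hadoop",
                    c("fs", "-get", shQuote(hdfs_path), shQuote(local_xlsx)))
  if (status != 0) {
    stop("hadoop fs -get failed with exit status ", status)
  }

  # XLConnect now receives a real path on the local filesystem
  d1 <- XLConnect::readWorksheetFromFile(local_xlsx, sheet = sheet)

  # Write the sheet out as CSV and return the data frame invisibly
  write.csv(d1, csv_path, row.names = FALSE)
  invisible(d1)
}
```

You would call it with the full HDFS URI of the workbook as the first argument and a local output path such as "filename.csv" as the second. If the CSV has to end up back in HDFS, it can be uploaded afterwards with hadoop fs -put.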

Hong Ooi