
I have a file on my server at location

/user/data/abc.csv

I need to create a Hive table on top of the data in this file, so I need to move the file to the HDFS location

/user/hive/warehouse/xyz.db

How can we do that using Python?

  • https://stackoverflow.com/a/51548097/2308683 – OneCricketeer Aug 09 '18 at 22:12
  • You could use PySpark to read your local file, and write to a Hive table. – OneCricketeer Aug 09 '18 at 22:12
  • @cricket_007 I want to implement it using PySpark. I plan to create a Hive table on top of my file; that's the reason I want to move it from my server location to an HDFS location. I can write a shell command -copyFromLocal, but I want to do this using Python in PySpark. How do I do it? – Aakib Aug 11 '18 at 00:53
  • Again, see the first link... Big long list of Python libraries to interact with HDFS. However, `saveAsTable` works fine in PySpark, so I ask - what have you tried? What errors are you getting? – OneCricketeer Aug 11 '18 at 03:40
  • Things that I tried: `subprocess.call(['hdfs', 'dfs', '-copyFromLocal', '/u/data/abc.csv', 'hdfs://user/hive/warehouse/class.db/abc.csv'], shell=True)` – Error: No alias specified and no default alias found. 2nd try: `shutil.copy('/user/adam/data//abc.csv', 'hdfs://user/hive/warehouse/class.db/class/abc.csv')` – Aakib Aug 11 '18 at 18:39
  • `shutil` can't access HDFS paths. The first is correct, assuming the `hdfs` command is on your OS `PATH`, but again, you've not tried Spark? – OneCricketeer Aug 11 '18 at 22:44
  • @cricket_007 I am writing this code after initiating my pyspark engine. – Aakib Aug 12 '18 at 01:52
  • Neither `subprocess` nor `shutil` uses a Spark context... Like I mentioned, you want to use the `saveAsTable` function from Spark – OneCricketeer Aug 12 '18 at 15:10
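
For reference, here is a minimal PySpark sketch of the `saveAsTable` approach suggested in the comments. The CSV path and the xyz database come from the question; the table name and session settings are illustrative assumptions:

from pyspark.sql import SparkSession

# Hive support lets saveAsTable create a managed table under /user/hive/warehouse.
spark = (SparkSession.builder
         .appName("csv-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Read the local CSV; the file:// prefix keeps Spark from resolving the path against HDFS.
df = spark.read.csv("file:///user/data/abc.csv", header=True, inferSchema=True)

# Write it as a Hive table in the xyz database; Spark copies the data into the
# warehouse directory (/user/hive/warehouse/xyz.db) itself, so no manual copy is needed.
df.write.mode("overwrite").saveAsTable("xyz.my_table")

Note that a `file://` path only works when the file is visible to the Spark driver and executors, so this is simplest when Spark runs in local mode on the same server that holds the file.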

2 Answers


First, you need to retrieve the file from the server. Use this Python code to retrieve it to your local machine:

import ftplib

path = '/user/data/'
filename = 'abc.csv'

ftp = ftplib.FTP("Server IP")        # server hostname or IP address
ftp.login("UserName", "Password")
ftp.cwd(path)
# Download the file from the server to the local machine, closing the file when done.
with open(filename, 'wb') as f:
    ftp.retrbinary("RETR " + filename, f.write)
ftp.quit()

Once the file is downloaded locally, run the usual Hive queries: either load the data directly from local, or put the data into HDFS first and then load it into Hive.

Load data directly from local into Hive:

LOAD DATA LOCAL INPATH '/user/data/abc.csv' INTO TABLE <table name>;

Load data to HDFS:

hadoop fs -copyFromLocal ~/user/data/abc.csv /your/hdfs/path

Then load it into Hive using a Hive query:

LOAD DATA INPATH '/your/hdfs/path' INTO TABLE <table name>;
ArunTnp
  • I don't want to write shell commands; I am within PySpark boundaries, so I need Python code to copy my file. – Aakib Aug 11 '18 at 00:54
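
Since the asker wants to stay in Python rather than write shell commands, a rough `subprocess` sketch of the same two steps could look like this (the table name is a placeholder; it assumes the `hdfs` and `hive` CLIs are on the PATH):

import subprocess

local_file = '/user/data/abc.csv'
hdfs_dir = '/user/hive/warehouse/xyz.db'   # target HDFS directory from the question

# Copy the file from the local filesystem into HDFS.
subprocess.check_call(['hdfs', 'dfs', '-copyFromLocal', local_file, hdfs_dir])

# Load it into the Hive table (table name is a placeholder).
load_stmt = "LOAD DATA INPATH '{}/abc.csv' INTO TABLE xyz.my_table".format(hdfs_dir)
subprocess.check_call(['hive', '-e', load_stmt])

Passing the command as a list without `shell=True` matters here: with `shell=True` and a list, only the first element is actually run as the command, so the `-copyFromLocal` arguments in the comment's earlier attempt never reached `hdfs`.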

The `hadoop fs -put` command can be used to put a file from the local file system into HDFS.
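
If you prefer to stay in Python rather than shelling out to `hadoop fs -put`, the same upload can also be done over WebHDFS with the `hdfs` package; the NameNode URL, user, and target path below are illustrative assumptions:

from hdfs import InsecureClient

# Connect to the NameNode's WebHDFS endpoint (host and port are placeholders).
client = InsecureClient('http://namenode-host:50070', user='hive')

# Upload the local CSV into the warehouse directory for the xyz database.
client.upload('/user/hive/warehouse/xyz.db/abc.csv', '/user/data/abc.csv')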

Prashant