
I imported a dataset (JSON) from GitHub. I have now downloaded the dataset to my local drive, where it is a folder containing many sub-folders, each holding a number of document files, and I don't know how to import that folder from my local drive. I know how to import a CSV file with pandas, but my dataset is a folder, as described above. Could somebody please tell me how to import it from my local drive without breaking the following code? I am working in Python. The code below shows the dataset being imported from GitHub; '20_newsgroup' is the name of the folder on my local drive.

import pandas as pd

# Import Dataset
df = pd.read_json('https://raw.githubusercontent.com/selva86/datasets/master/newsgroups.json')
# Keep only the four newsgroups of interest
df = df.loc[df.target_names.isin(['soc.religion.christian', 'rec.sport.hockey', 'talk.politics.mideast', 'rec.motorcycles']), :]
print(df.shape)  #> (2361, 3)
df.head()

# Convert to list
data = df.content.values.tolist()
data_words = list(sent_to_words(data))  # sent_to_words is a helper not shown here
print(data_words[:1])

2 Answers

df = pd.read_json('newsgroups.json')

should suffice. (Or pd.read_json('some/directory/newsgroups.json') if it's not in the current directory.)
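A common stumbling block with a local path is that relative paths are resolved against the current working directory. As a minimal sketch (the helper name `load_local_json` is hypothetical, not part of pandas), the path can be checked before it is handed to `pd.read_json`:

```python
from pathlib import Path

import pandas as pd


def load_local_json(path):
    """Read a JSON dataset from local disk into a DataFrame.

    Hypothetical helper for illustration: it only wraps pd.read_json
    with a clearer error when the file is not where we expect it.
    """
    json_path = Path(path).expanduser()
    if not json_path.is_file():
        # Relative paths are resolved against the current working directory,
        # so report both to make the mismatch easy to spot.
        raise FileNotFoundError(
            f"No file at {json_path.resolve()} (cwd: {Path.cwd()})"
        )
    return pd.read_json(json_path)
```

With that in place, `df = load_local_json('newsgroups.json')` would stand in for the `pd.read_json(...)` line in the question, and the rest of the code would stay unchanged.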

J_H
  • Thanks a lot. In my case I actually downloaded the 'newsgroups' dataset from the UCI dataset site, where it is an ordinary folder, and I wanted to import that folder. Your method works just fine, though: I downloaded the JSON file from GitHub and it works. Thanks! – Kenneth Flank Mar 27 '19 at 04:53

To load multiple files from a directory, I would check whether this answer covers your case: https://stackoverflow.com/a/30540662/9524722
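For the folder case specifically, here is a rough sketch using only the standard library. It assumes a layout of `20_newsgroup/<group name>/<document file>`, which is a guess based on the description in the question, and the function name `read_folder_dataset` and the `latin-1` default are likewise assumptions chosen for illustration:

```python
from pathlib import Path


def read_folder_dataset(root, encoding="latin-1"):
    """Collect one record per document file under root/<label>/...

    Assumes each immediate sub-folder of `root` is a class label and
    every file below it is one document.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f"Not a folder: {root_path.resolve()}")

    records = []
    # Sort so the output order is reproducible across runs and platforms.
    for label_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        for doc in sorted(p for p in label_dir.rglob("*") if p.is_file()):
            records.append(
                {
                    # errors="replace" avoids a crash on undecodable bytes
                    # if a stricter encoding is passed in.
                    "content": doc.read_text(encoding=encoding, errors="replace"),
                    "target_names": label_dir.name,
                }
            )
    return records
```

Passing the result to `pd.DataFrame(read_folder_dataset('20_newsgroup'))` should give a frame with `content` and `target_names` columns, so the `isin` filter and `df.content.values.tolist()` from the question can be reused. Note that the integer `target` column of the JSON version is not recreated here, so the shape would be two columns rather than three.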

Keenan