
For my university project I have to collect some data from GitHub using the API. I save the result of my API call to a JSON file, and after that I have to convert the JSON file to a CSV file.

I use the following code to convert the JSON file to CSV:

import csv
import json

with open("data.json", "r") as f:
  data = json.load(f)

# newline='' lets the csv module control line endings itself
with open('data.csv', 'w', newline='') as f:
  fieldnames = data[0].keys()
  writer = csv.DictWriter(f, fieldnames=fieldnames)
  writer.writeheader()
  for row in data:
    writer.writerow(row)

My problem is that the JSON file contains some key/value pairs like the following:

"title" : "Hello \n World"

I think the "\n" is treated as a newline, because it splits the row in my CSV file. How can I solve this problem? Is there any way to make my code ignore the "\n"?

(Two screenshots followed here: the bad output, and the output that I want.)

zummino
  • Here are some tips on escaping special characters with a backslash: https://stackoverflow.com/questions/18935754/how-to-escape-special-characters-of-a-string-with-single-backslashes – RufusVS Mar 23 '21 at 19:43
  • The `csv` module should handle this correctly. What is the *output you are looking for*? What is the problem? – juanpa.arrivillaga Mar 23 '21 at 19:44
  • Did you actually try it and examine your output to see the exact behavior? – RufusVS Mar 23 '21 at 19:45
  • My code to convert JSON to CSV works fine if there is no "\n" anywhere in the JSON file. For example, if I have a JSON file with 2 objects, the CSV file must have 3 lines in total (1 for the header and 2 for the objects). If the JSON file contains a "\n", the line in the CSV file is split at the "\n", so the number of lines in the CSV no longer matches the number of objects: the object that contains the "\n" is spread over two lines – zummino Mar 23 '21 at 20:24
  • I added two example images to show you the output I get and the output I would like to get – zummino Mar 23 '21 at 20:34
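To illustrate the comment above that the `csv` module handles this: a minimal sketch, using made-up sample rows in place of data.json, that writes a field containing a newline and reads it back. The field is written inside quotes, so the file has more physical lines than records, but a CSV-aware reader still sees one record per object.

```python
import csv
import io

# Made-up sample rows standing in for the contents of data.json
rows = [
    {"id": "1", "title": "Hello \n World"},
    {"id": "2", "title": "Plain"},
]

# Write to an in-memory buffer instead of data.csv
buf = io.StringIO(newline="")
writer = csv.DictWriter(buf, fieldnames=["id", "title"])
writer.writeheader()
writer.writerows(rows)
text = buf.getvalue()

# Counting raw text lines: the quoted field spans two of them
physical_lines = text.splitlines()

# Parsing with the csv module: the quoted newline stays inside one record
records = list(csv.DictReader(io.StringIO(text, newline="")))

print(len(physical_lines))  # header + 3 text lines
print(len(records))         # 2 records
```

So the extra line is only a problem for tools that count or split raw text lines; if the consumer of the file needs exactly one text line per object, the newline has to be removed before writing.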

2 Answers


Did you check the str.replace() method, for example mystring.replace('\n', ' ')?
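A rough sketch of how that could look, assuming the records are flat dicts as in the question. The helper name `flatten` and the sample JSON are made up for illustration. The cleanup happens in the same loop that already writes the rows, so no additional pass over the data is needed.

```python
import csv
import io
import json

# Made-up sample standing in for data.json; \\n becomes a JSON \n escape
raw = '[{"id": 1, "title": "Hello \\n World"}, {"id": 2, "title": "Plain"}]'
data = json.loads(raw)


def flatten(value):
    """Join the lines of a string with single spaces; pass other values through."""
    if isinstance(value, str):
        # splitlines() also splits on \r\n and a few other line boundaries
        return " ".join(part.strip() for part in value.splitlines())
    return value


# In-memory buffer instead of data.csv, so the sketch has no side effects
out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=data[0].keys())
writer.writeheader()
for row in data:
    # Clean each value while writing, inside the loop that exists anyway
    writer.writerow({key: flatten(value) for key, value in row.items()})

lines = out.getvalue().splitlines()
print(lines)
```

With the newline replaced, the output has one text line per object plus the header.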

Jörg
  • It could be a solution, but I'm looking for something that doesn't force me to loop over all my JSON objects just to do the replace. In my final project I will have to work with millions of objects, and an extra for loop could be expensive – zummino Mar 23 '21 at 20:14

pandas can handle this:

import pandas as pd

df = pd.read_json('data.json')
df.to_csv('data.csv')

Or, since you are opening the file in Excel, you could write to xlsx directly:

df.to_excel('data.xlsx')

If you still wish to remove the newlines you can use any of these solutions prior to saving the dataframe.
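One possible way to do that, sketched with a small made-up DataFrame instead of data.json: `DataFrame.replace` with `regex=True` rewrites matching substrings in string cells and leaves numeric columns alone. The pattern used here, which also swallows the spaces around the line break, is just one choice.

```python
import io

import pandas as pd

# Made-up sample standing in for pd.read_json('data.json')
df = pd.DataFrame(
    {"id": [1, 2], "title": ["Hello \n World", "Plain"]}
)

# Replace each line break, plus any whitespace around it, with one space
clean = df.replace({r"\s*\n\s*": " "}, regex=True)

# Write to an in-memory buffer here; pass a file name to write data.csv
buf = io.StringIO()
clean.to_csv(buf, index=False)
csv_lines = buf.getvalue().splitlines()
print(csv_lines)
```

`index=False` is used in this sketch so that the row index is not written as an extra first column.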

RJ Adriaansen