1

I have the data in CSV format, for example:

enter image description here

  • First row is column number, let's ignore that.
  • Second row, starting at Col_4, are number of days
  • Third row on: Col_1 and Col_2 are coordinates (lon, lat), Col_3 is a statistical value, Col_4 onwards are measurements.

As you can see, this format is a confusing mess. I would like to convert this to JSON the following way, for example:

{"points":{
        "dates": ["20190103", "20190205"],
        "0":{
          "lon": "-8.072557",
          "lat": "41.13702",
          "measurements": ["-0.191792","-10.543130"],
          "val": "-1"
        },
        "1":{
          "lon": "-8.075557",
          "lat": "41.15702",
          "measurements": ["-1.191792","-2.543130"],
          "val": "-9"
        }
    }
}

To summarise what I've done till now, I read the CSV to a Pandas DataFrame:

df = pandas.read_csv("sample.csv")

I can extract the dates into a Numpy Array with:

dates = df.iloc[0][3:].to_numpy()

I can extract measurements for all points with:

measurements_all = df.iloc[1:,3:].to_numpy() 

And the lon and lat and val, respectively, with:

lon_all = df.iloc[1:,0:1].to_numpy()
lat_all = df.iloc[1:,1:2].to_numpy()
val_all = df.iloc[1:,2:3].to_numpy()

Can anyone explain how I can format this info a structure identical to the .json example?

Steffan
  • 536
  • 5
  • 15

3 Answers3

1

With this dataframe

df = pd.DataFrame([{"col1": 1, "col2": 2, "col3": 3, "col4":3, "col5":5},
  {"col1": None, "col2": None, "col3": None, "col4":20190103, "col5":20190205},
  {"col1": -8.072557, "col2": 41.13702, "col3": -1, "col4":-0.191792, "col5":-10.543130},
  {"col1": -8.072557, "col2": 41.15702, "col3": -9, "col4":-0.191792, "col5":-2.543130}])

This code does what you want, although it is not converting anything to strings as in you example. But you should be able to easily do it, if necessary.

# generate dict with dates
final_dict = {"dates": [list(df["col4"])[1],list(df["col5"])[1]]}

# iterate over relevant rows and generate dicts
for i in range(2,len(df)):
  final_dict[i-2] = {"lon": df["col1"][i],
                    "lat": df["col2"][i],
                    "measurements": [df[cname][i] for cname in ["col4", "col5"]],
                    "val": df["col3"][i]
                    }

this leads to this output:

{0: {'lat': 41.13702,
  'lon': -8.072557,
  'measurements': [-0.191792, -10.54313],
  'val': -1.0},
 1: {'lat': 41.15702,
  'lon': -8.072557,
  'measurements': [-0.191792, -2.54313],
  'val': -9.0},
'dates': [20190103.0, 20190205.0]}
mailach
  • 71
  • 5
1

Extracting dates from data and then eliminating first row from data frame:

dates =list(data.iloc[0][3:])

data=data.iloc[1:]

Inserting dates into dict:

points={"dates":dates}

Iterating through data frame and adding elements to the dictionary:

 i=0 

for index, row in data.iterrows():
   element= {"lon":row["Col_1"], 
             "lat":row["Col_2"],
             "measurements": [row["Col_3"], row["Col_4"]]}
   points[str(i)]=element
   i+=1

You can convert dict to string object using json.dumps():

points_json = json.dumps(points)

It will be string object, not json(dict) object. More about that in this post Converting dictionary to JSON

ElaSz
  • 107
  • 2
  • 10
0

I converted the pandas dataframe values to a list, and then loop through one of the lists, and add the lists to a nested JSON object containing the values.

import pandas
import json
import argparse
import sys

def parseInput():
    parser = argparse.ArgumentParser(description="Convert CSV measurements to JSON")

    parser.add_argument(
        '-i', "--input",
        help="CSV input",
        required=True,
        type=argparse.FileType('r')
    )

    parser.add_argument(
        '-o', "--output",
        help="JSON output",
        type=argparse.FileType('w'),
        default=sys.stdout
    )

    return parser.parse_args()

def main():
    args = parseInput()

    input_file = args.input
    output = args.output

    dataframe = pandas.read_csv(input_file)

    longitudes = dataframe.iloc[1:,0:1].T.values.tolist()[0]
    latitudes = dataframe.iloc[1:,1:2].T.values.tolist()[0]
    averages = dataframe.iloc[1:,2:3].T.values.tolist()[0]
    measurements = dataframe.iloc[1:,3:].values.tolist()
    dates=dataframe.iloc[0][3:].values.tolist()

    points={"dates":dates}

    for index, val in enumerate(longitudes):
        entry = {
            "lon":longitudes[index],
            "lat":latitudes[index], 
            "measurements":measurements[index], 
            "average":averages[index]
            }
        points[str(index)] = entry
        
    json.dump(points, output)

    
if __name__ == "__main__":
    main()
Steffan
  • 536
  • 5
  • 15