0

I have written a code to retrieve JSON data from an URL. It works fine. I give the start and end date and it loops through the date range and appends everything to a dataframe.

The colums are populated with the JSON data sensor and its corresponding values, hence the column names are like sensor_1. When I request the data from the URL it sometimes happens that there are new sensors and the old ones are switched off and deliver no data anymore and often times the length of the columns change. In that case my code just adds new columns.

What I want is instead of new columns a new header in the ongoing dataframe.

What I currently get with my code:

datetime;sensor_1;sensor_2;sensor_3;new_sensor_8;new_sensor_9;sensor_10;sensor_11;
2023-01-01;23.2;43.5;45.2;NaN;NaN;NaN;NaN;NaN;
2023-01-02;13.2;33.5;55.2;NaN;NaN;NaN;NaN;NaN;
2023-01-03;26.2;23.5;76.2;NaN;NaN;NaN;NaN;NaN;
2023-01-04;NaN;NaN;NaN;75;12;75;93;123;
2023-01-05;NaN;NaN;NaN;23;31;24;15;136;
2023-01-06;NaN;NaN;NaN;79;12;96;65;72;

What I want:

datetime;sensor_1;sensor_2;sensor_3;
2023-01-01;23.2;43.5;45.2;
2023-01-02;13.2;33.5;55.2;
2023-01-03;26.2;23.5;76.2;
datetime;new_sensor_8;new_sensor_9;sensor_10;sensor_11;
2023-01-04;75;12;75;93;123;
2023-01-05;23;31;24;15;136;
2023-01-06;79;12;96;65;72;

My loop to retrieve the data:

start_date = datetime.datetime(2023,1,1,0,0)
end_date = datetime.datetime(2023,1,6,0,0)
    
sensor_data = pd.DataFrame()

while start_zeit < end_zeit:
    q = 'url' 
    r = requests.get(q)
    j = json.loads(r.text)
    sub_data = pd.DataFrame()
    if 'result' in j:    
        datetime = pd.to_datetime(np.array(j['result']['data'])[:,0])
        sensors = np.array(j['result']['sensors'])  
                        
        data = np.array(j['result']['data'])[:,1:]

        df_new = pd.DataFrame(data, index=datetime, columns=sensors)
        sub_data = pd.concat([sub_data, df_new])
    sensor_data = pd.concat([sensor_data, sub_data])
    start_date += timedelta(days=1)
Gobrel
  • 179
  • 9
  • You cannot have an 'extra' set of columns labels with new columns in some middle row in the same DataFrame; that's not how pandas works. Why not just split into two dataframes? – user19077881 Jan 09 '23 at 15:47
  • @user19077881 seperate dataframes would also be a good solution for me. Somewhere I saw a csv which was the output of JAVA code that had the mulitple headers like in the solution what I like and I thought this was also possible in python. – Gobrel Jan 09 '23 at 15:52
  • Try using [https://stackoverflow.com/questions/38152389/coalesce-values-from-2-columns-into-a-single-column-in-a-pandas-dataframe](https://stackoverflow.com/questions/38152389/coalesce-values-from-2-columns-into-a-single-column-in-a-pandas-dataframe) as a reference for a possible solution. – Rownum Highart Jan 09 '23 at 15:55

1 Answers1

0

if 2 DataFrames will do for you the you can simply split using the column names:

df1 = df[['datetime', 'sensor_1', 'sensor_2', 'sensor_3']]
df2 = df[['datetime', 'new_sensor_8', 'new-sensor_9', 'sensor_10', 'sensor_11']]

Note the [[ used.

and use .dropna() to lose the NaN rows

user19077881
  • 3,643
  • 2
  • 3
  • 14
  • I think I need to be more precise. The dataframes in my posting were examples. I don‘t know how many colums are in the url request response and how they change. I need to compare if the current column is the same like the column from the previous request. When it is different then there should be a new line or seperate df. – Gobrel Jan 09 '23 at 17:31
  • I am not clear what the criteria are for splitting. You seem to split the data depending in NaN values. – user19077881 Jan 09 '23 at 22:54
  • Yes in some kind. Since I am not good in programming I can't describe it good. What I want is something where I check if the header is the same or not, if it is not the same then there should be some action to be taken and created some new dataframe/csv so that I can catch a change. – Gobrel Jan 10 '23 at 10:08