1

I have a csv file which looks like below

,date,location,device,provider,cpu,mem,load,drops,id,latency,gw_latency,upload,download,sap_drops,sap_latency,alert_id
0,2018-02-10 11:52:59.342269+00:00,CFE,10.0.100.1,BWE,6.0,23.0,11.75,0.0,,,,,,,,
1,2018-02-10 11:53:04.006971+00:00,CDW,10.0.100.1,GRE,6.0,23.0,4.58,0.0,,,,,,,,
2,2018-02-09 11:52:59.342269+00:00,,,SSD,,,10.45,,,,,,,,,
3,2018-02-08 09:52:59.342269+00:00,,,BWE,,,12.45,,,,,,,,,
4,2018-02-07 04:52:59.342269+00:00,,,RRW,,,9.45,,,,,,,,,
5,2018-02-06 05:52:59.342269+00:00,,,GRE,,,5.45,,,,,,,,,
6,2018-02-05 07:52:59.342269+00:00,,,SSD,,,13.45,,,,,,,,,
7,2018-02-04 10:52:59.342269+00:00,,,SSD,,,8.15,,,,,,,,,
8,2018-02-03 10:52:59.342269+00:00,,,GRE,,,4.15,,,,,,,,,
9,2018-02-02 06:52:59.342269+00:00,,,RRW,,,13.15,,,,,,,,,
10,2018-02-10 22:35:33.438948+00:00,QQW,10.12.11.1,VCD,4.0,23.0,5.0,0.0,,,,,,,,
11,2018-02-10 22:35:37.905242+00:00,CSW,10.12.11.1,VCD,4.0,23.0,6.08,0.0,,,,,,,,
.......................................................................................
.......................................................................................

I load up the csv file like below

df = pd.read_csv("metrics_copy.csv", parse_dates=["date"])   
df['device'] = df['device'].astype(str)
unique_devices = (np.unique(df[['device']].values))
unique_provider = np.unique(df[['provider']].values)

I want to get a csv file which contains only certain columns for a certain combination.

for i in unique_devices:
    for j in ["cpu", "mem"]:
        df2 = df[(df['device'] == i)]
        df2["date"] = pd.to_datetime(df2["date"], format="%Y-%m-%d")
        print(df2[j])

As you can see, for every unique combination of device and metric, I will get a a time series data.I am able to get a bunch of values for df2[j] for a given device.I want to output these values to a csv file for all combinations as long as the loop continues..I am aware of a concept called pd.concat which can be used like below

df_final = pd.concat([df, df2, df3.....])

But for that I need to generate the dataframes for all possible combinations and then finally concat them to become one dataframe.So I want the end result csv file to look something like below for cpu

date cpu
...  ...
...  ...

And another csv file for mem which looks something like below

date mem
...  ...
...  ...

But I am not sure how do I achieve this.Any help?

Souvik Ray
  • 2,899
  • 5
  • 38
  • 70
  • https://stackoverflow.com/questions/17530542/how-to-add-pandas-data-to-an-existing-csv-file shows how to use the pandas.to_csv() function in append 'a' mode. Which would append as your loop goes to your two different files. – linamnt May 22 '18 at 18:22
  • @Lina great!but I need to solve the challenge of getting the data for every loop and storing it until the inner loop breaks. – Souvik Ray May 22 '18 at 18:28
  • Sorry if I misunderstand, but couldn't you use df2['mem'].to_csv('mem.csv', mode='a', header=False) and df2['cpu'].to_csv('cpu.csv', mode='a', header=False) in the inner loop and once it breaks, you are left with the final file with all the dataframes that were generated? – linamnt May 22 '18 at 18:30
  • @Lina great! This is what I wanted!I misunderstood your earlier response.You may post this as an answer and I wil accept. – Souvik Ray May 23 '18 at 06:42

1 Answers1

0

Adapted from the following using df.to_csv() in append mode: How to add pandas data to an existing csv file?

for i in unique_devices:
    for j in ["cpu", "mem"]:
        df2 = df[(df['device'] == i)]
        df2["date"] = pd.to_datetime(df2["date"], format="%Y-%m-%d")
            df2[['date',j]].to_csv('{}.csv'.format(j), mode='a', index=False, header=False)

Alternatively you can have an if statement to check if the file exists so the first time it's made, the header will be used, and thereafter it is ignored:

for i in unique_devices:
    for j in ["cpu", "mem"]:
        df2 = df[(df['device'] == i)]
        df2["date"] = pd.to_datetime(df2["date"], format="%Y-%m-%d")
        import os
        if not os.path.isfile('{}.csv'.format(j)):
            df.to_csv('{}.csv'.format(j), mode='a', index=False)
        else:
            df2[['date',j]].to_csv('{}.csv'.format(j), mode='a', index=False, header=False)
linamnt
  • 1,315
  • 1
  • 12
  • 23