0

I have a scenario where I have a CSV file having below data:

Host, Time Up, Time OK
server1.test.com:1717,100.00% ,100.00% 
server2.test.com:1717,100.00% ,100.00% 

I am trying to compare column values in all rows:

  • if col1 <= col2 then it should print the value of col1 in a new col3
  • if col1 > col2 then print col2 value in col3.

Example:

Time Up(col1), Time OK(col2), Total(col3)
100%              100%         100%
100%              95%          95%
95%               100%         95%

I searched through internet and was unable to find any case. Is there any way to achieve this ?

EDIT2 : code-

import pandas as pd
df = pd.read_csv('3.csv',skipfooter=1)
df2 = pd.read_csv('4.csv',skipfooter=1)
combined = pd.merge(df[['Host',' Time Up']],df2[['Host',' Time OK']], on='Host')
combined[' Time OK'] = combined[' Time OK'].apply(lambda x: x.split('(')[0])
combined[' Time Up'] = combined[' Time Up'].apply(lambda x: x.split('(')[0])
combined.to_csv('combined.csv',index=False)

df =pd.read_csv('combined.csv', skipfooter=1)
col1 = df[' Time Up']
col2 = df[' Time OK']
df['Total'] = col1.where(col1 <= col2, col2)
df.to_csv('combined.csv',index=False)
Alex
  • 189
  • 4
  • 23
  • Possible duplicate of [Using conditional to generate new column in pandas dataframe](https://stackoverflow.com/questions/27041724/using-conditional-to-generate-new-column-in-pandas-dataframe) – ASGM Oct 20 '17 at 14:07
  • Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – IanS Oct 20 '17 at 14:17
  • i don't see a duplicate here – Alex Oct 20 '17 at 15:27

1 Answers1

0

Sure, just use read_csv() to read the data:

import pandas as pd
df = pd.read_csv('t.csv') # this is your original example input file

Now you have:

                    Host   Time Up   Time OK
0  server1.test.com:1717  100.00%   100.00% 
1  server2.test.com:1717  100.00%   100.00% 

The first issue is your CSV has spurious whitespace in the header. Let's clean it up:

df.columns = [col.strip() for col in df.columns] # " Time Up" -> "Time Up"

Next, note that your data are strings like "100.00% ". Clean that:

df['Time Up'] = df['Time Up'].str.strip('% ').astype(float)
df['Time OK'] = df['Time OK'].str.strip('% ').astype(float)

Now we have clean data:

                    Host  Time Up  Time OK
0  server1.test.com:1717    100.0    100.0
1  server2.test.com:1717    100.0    100.0

Finally, we can add the new column:

col1 = df['Time Up']
col2 = df['Time OK']
df['Total'] = col1.where(col1 <= col2, col2)

Giving us:

                    Host  Time Up  Time OK  Total
0  server1.test.com:1717    100.0    100.0  100.0
1  server2.test.com:1717    100.0    100.0  100.0

An alternative way to get the Total column would be:

df['Total'] = df[['Time Up', 'Time OK']].min(axis=1)

That is, take the minimum value in each row.

If you want the percent signs added back:

df['Total'] = df['Total'].astype(str) + '%'
John Zwinck
  • 239,568
  • 38
  • 324
  • 436
  • Do I need to import pandas module? – Alex Oct 20 '17 at 15:17
  • 1
    Yes you will, in order to use the read_csv() function that is in pandas, as well as the df (which is short hand for dataframe) – Dylan Oct 20 '17 at 16:09
  • Thanks for the response. – Alex Oct 20 '17 at 16:58
  • Did not execute succesfully Added my code and error I received.Am I missing something? – Alex Oct 20 '17 at 19:09
  • Do you have pandas installed? Also it doesn't look like Time OK is recognized as a column header... Is it included in your csv file? – kjmerf Oct 20 '17 at 19:23
  • yes it is included u can see in question itself the CSV data is exactly the same and I have pandas installed. – Alex Oct 20 '17 at 19:52
  • I have solved the previous error by replacing few things(see EDIT2) but this doesn't applies the logic and output is different. I have a value in col 2 that is 95 and col1 100 so according to expectation it should provide total as 95 but it is not the case it outputs 100 only in col3(total) – Alex Oct 20 '17 at 20:14
  • also just saw one more test case where col1 98.81 and col2 98.82 it is resulting in 98.82 in col3(ideally it should be 98.81) – Alex Oct 20 '17 at 21:08
  • @kbball just wanted the output like in col1 & col2 whichever value is less return that column and if equal return col1. if col1 = 99.81 & col2= 99.82 then return col1 if col1=100 & col2=100 then return col1 if col1=99.82 & col2=99.81 then return col2 these are the cases – Alex Oct 20 '17 at 21:38
  • @Rebecca21: I have updated my answer with full, step-by-step code which works with your example input. – John Zwinck Oct 21 '17 at 00:52
  • Thanks for explaining so wonderfully :-) Everything works fine but can I add those % signs later once my comparison task is finished back to those coloumns because I really need those. Thanks again for the solution. – Alex Oct 21 '17 at 03:35
  • @Rebecca21: OK, I've added a step at the end to turn the data back into strings and append `%`. – John Zwinck Oct 21 '17 at 04:08