I have a bit of Python code below, just an example to show the problem. I would like to select some rows of a data frame based on certain values. This has to happen in a for loop, and I used .append() to add each selection of rows to a final result. But the result is not what I expected. From reading quite a few posts, I learned that we should not append to a data frame in a loop, so I don't know how to do this now. Could somebody help, please? Thanks a lot!

import pandas as pd


df = pd.DataFrame({'a': [4, 5, 6, 7], 'b': [10, 20, 30, 40], 'c': [100, 50, -30, -50]})
df['diff'] = (df['b'] - df['c']).abs()
print(df)
df1 = df[df['diff'] == 90]
df2 = df[df['diff'] == 60]

list = [df1, df2]


def try_1(list):
    output = []
    for item in list:
        output.append(item)
    return output


print(try_1(list))

Output from the code:

   a   b    c  diff
0  4  10  100    90
1  5  20   50    30
2  6  30  -30    60
3  7  40  -50    90


[   a   b    c  diff
0  4  10  100    90
3  7  40  -50    90,    a   b   c  diff
2  6  30 -30    60]

But the expected output of print(try_1(list)) is:

a   b    c  diff
4  10  100    90
7  40  -50    90
6  30  -30    60

Also, I need to write the final result to a file. I tried .write(), and it complained that the argument is not a string. How can I solve this, please? Thanks!

zzz

1 Answer

Your code just recreates the same list you had before. You can use pd.concat instead. To write the result to a file, you have to convert it to a str first:

import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6, 7], 'b': [10, 20, 30, 40], 'c': [100, 50, -30, -50]})
df['diff'] = (df['b'] - df['c']).abs()
# print(df)
df1 = df[df['diff'] == 90]
df2 = df[df['diff'] == 60]

my_list = [df1, df2]

all_frames = pd.concat(my_list)
with open("file", "w") as f:
    f.write(str(all_frames))
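
The expected output in the question has no row labels, while str() keeps them. As a minimal sketch (these calls are not part of the original answer), DataFrame.to_string(index=False) and DataFrame.to_csv(index=False) both drop the index. The names `text`, `buffer` and `csv_text` are made up for illustration, and an in-memory buffer stands in for the file; passing a path such as "file.csv" to to_csv would write to disk instead.

```python
import io

import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6, 7], 'b': [10, 20, 30, 40], 'c': [100, 50, -30, -50]})
df['diff'] = (df['b'] - df['c']).abs()

# Same selection and concatenation as in the answer above
all_frames = pd.concat([df[df['diff'] == 90], df[df['diff'] == 60]])

# Plain-text table without the row labels (0, 3, 2)
text = all_frames.to_string(index=False)

# CSV without the row labels; an in-memory buffer is used here,
# but a file path (e.g. "file.csv") could be passed instead
buffer = io.StringIO()
all_frames.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

The CSV form is usually easier to read back later with pd.read_csv than the output of str().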

If you need to append in a for loop and occasionally write, you could do it like this:

import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6, 7], 'b': [10, 20, 30, 40], 'c': [100, 50, -30, -50]})
df['diff'] = (df['b'] - df['c']).abs()
# print(df)
df1 = df[df['diff'] == 90]
df2 = df[df['diff'] == 60]

my_list = [df1, df2]
for i in range(20):
    my_list.append(df2)
    if i % 5 == 0: # whenever we want to write
        all_frames = pd.concat(my_list)
        my_list = [all_frames]
        with open("file", "w") as f:
            f.write(str(all_frames))
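
If the file only needs to be written once at the end, a simpler variant might be enough: collect each selection in a plain list inside the loop and call pd.concat a single time afterwards. This is only a sketch under the assumption that the loop runs over the values to match; `thresholds`, `pieces` and `result` are hypothetical names, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6, 7], 'b': [10, 20, 30, 40], 'c': [100, 50, -30, -50]})
df['diff'] = (df['b'] - df['c']).abs()

# Hypothetical list of values to select on, in the order they should appear
thresholds = [90, 60]

pieces = []
for value in thresholds:
    # Appending to a Python list is cheap; no data frame is copied here
    pieces.append(df[df['diff'] == value])

# A single concat after the loop builds the final frame;
# ignore_index=True renumbers the rows 0, 1, 2, ...
result = pd.concat(pieces, ignore_index=True)
```

Calling pd.concat inside the loop on a growing frame copies all rows collected so far on every iteration, which is why the list is kept until the full frame is actually needed.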
cafce25
  • Thanks! I just have a question about pd.concat(). I read here https://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-and-then-filling-it that it's better to grow a list. My actual list is much bigger than the example, with many more columns (> 10,000 lines). Will pd.concat() be a problem? – zzz Nov 18 '22 at 13:20
  • oh sorry, and it needs to be in a for loop. – zzz Nov 18 '22 at 13:24
  • You have to concat before you need the full dataframe (before you write to disk); until then you should keep it in a list, yes. – cafce25 Nov 18 '22 at 13:25
  • how do you concat in a for loop? – zzz Nov 18 '22 at 13:37