I have a CSV file whose columns are frequency counts of words, and whose rows are time periods. I want to sum for each column the total frequencies. Then I want to write to a CSV file for sums greater than or equal to 30, the column and row values, thus dropping columns whose sums are less than 30.
Just learning python and pandas. I know it is a simple question, but my knowledge is at that level. Your help is most appreciated.
I can read in the CSV file and compute the column sums.
df = pd.read_csv('data.csv')
Except of data file containing 3,874 columns and 100 rows
df.sum(axis = 0, skipna = True)
I am stuck on how to create the output file so that it looks like the original file but no longer has columns whose sums were less than 30.
I am stuck on how to write to a CSV file each row for each column whose sums are greater than or equal to 30. The layout of the output file would be the same as for the input file. The sums would not be included in the output.
Thanks very much for your help.
So, here is a link showing an excerpt of a file containing 100 rows and 3,857 columns: