1) Sort the headers (columns) while creating a CSV file

2) Add a new header (column) to a CSV file

For the first problem: I have several CSV files, each with its own data column, and I merge them together on Time. In the final CSV file, the headers are not in the right order.

The correct order should be a, b, c... but the final CSV file has the headers c, b, a. How can I sort the headers?

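    import pandas as pd

    # csv_list holds the paths of the CSV files to merge;
    # the first file is assumed to be the starting point
    oldcsv = pd.read_csv(csv_list[0])
    lenList = len(csv_list)

    # create the final csv file by merging each remaining file on Time
    for i in range(lenList - 1):
        newcsv = pd.read_csv(csv_list[i + 1])
        csv_out = newcsv.merge(oldcsv, on=['Time'], how="outer", sort=True)
        oldcsv = csv_out

    # save the final csv file
    output_file = "../build/*.csv"
    oldcsv.to_csv(output_file, index=False)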

For the second problem: when I create the CSV files, some have 18 columns and some have 17 columns, but they should all have 18 columns.

E.g., file1 has columns a, b, c, d and file2 has columns a, b, c.

I need them to have the same number of columns, so I need to add an empty column to file2.

Yao Qiang

1 Answer

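Yes, you can change the order of the columns. The usecols parameter of read_csv selects which columns to read but ignores the order in which you list them, so to reorder you index the resulting DataFrame with the column names in the order you want. From the documentation: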

usecols : list-like or callable, default None

Return a subset of the columns. If list-like, all elements must either be positional (i.e. integer indices into the document columns) or strings that correspond to column names provided either by the user in names or inferred from the document header row(s). For example, a valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1, 0]. To instantiate a DataFrame from data with element order preserved use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

If callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to True. An example of a valid callable argument would be lambda x: x.upper() in ['AAA', 'BBB', 'DDD']. Using this parameter results in much faster parsing time and lower memory usage.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
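
A minimal sketch of the reordering step, assuming the merged data has a Time column plus columns a, b, c that came out in reverse order. The in-memory csv_text and the variable names are made up for illustration; in your case the DataFrame would be the merged oldcsv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a merged CSV whose headers came out as c, b, a.
csv_text = "c,b,a,Time\n3,2,1,0\n6,5,4,1\n"

# usecols only selects columns; the order in the file is kept as-is.
df = pd.read_csv(io.StringIO(csv_text), usecols=["a", "b", "c", "Time"])

# Keep Time first, then sort the remaining headers alphabetically.
ordered = ["Time"] + sorted(col for col in df.columns if col != "Time")
df_sorted = df[ordered]

print(list(df_sorted.columns))  # ['Time', 'a', 'b', 'c']
```

Calling to_csv on df_sorted with index=False would then write the headers in that order. If the headers are not plain letters (for example col2 and col10), alphabetical sorting may not give the order you expect, and you would need to spell out the list yourself.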

As for the empty column, you can add one by assigning a new column to the DataFrame (for example filled with NaN or empty strings) before writing it out.
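
A small sketch of two ways this could be done, assuming file1 and file2 have already been loaded as DataFrames. The frames below are hypothetical examples with made-up values, using the column names from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frames standing in for the two CSV files.
file1 = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
file2 = pd.DataFrame({"a": [5], "b": [6], "c": [7]})

# Option 1: add the missing column by hand, filled with NaN.
file2_with_d = file2.copy()
file2_with_d["d"] = np.nan

# Option 2: reindex against file1's columns; any column that
# file2 lacks is created and filled with NaN automatically.
file2_aligned = file2.reindex(columns=file1.columns)

print(list(file2_aligned.columns))  # ['a', 'b', 'c', 'd']
```

By default to_csv writes NaN as an empty field, so either frame ends up with an empty d column in the output file. The reindex variant is handy when you do not know in advance which column is missing.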

Alex W