How to correctly create a DataFrame out of a CSV file?

Question

I just started with programming and have some trouble to correctly import my CSV file.

To import it I use the following code:

data_fundamentals = open(path_fundamentals, newline= '')
reader_fundamentals = csv.reader(data_fundamentals)
header_fundamentals = next(reader_fundamentals)
fundamentals = [row for row in reader_fundamentals]

Then convert it into a DataFrame:

df_fundamentals = pd.DataFrame(fundamentals, columns= header_fundamentals)

Here comes my first problem: Out of the CSV file "fundamentals" I just need certain columns for my DataFrame. I started by inserting them all by hand, which of course is not very efficient. Do you have an easier way?

df_kennzahlen.insert(1, 'Fiscal Year' , df_fundamentals['fyear'])
df_kennzahlen.insert(2, 'Current Assets' , df_fundamentals['act'])
df_kennzahlen.insert(3, 'Net Income/Loss' , df_fundamentals['ni'])
df_kennzahlen.insert(4, 'Total Liabilities' , df_fundamentals['lt'])
df_kennzahlen.insert(5, 'Long-Term Debt' , df_fundamentals['dltp'])
df_kennzahlen.insert(6, 'Cash' , df_fundamentals['ch'])
df_kennzahlen.insert(7, 'Total Assets' , df_fundamentals['at'])
df_kennzahlen.insert(8, 'Trade Payables' , df_fundamentals['ap'])
df_kennzahlen.insert(9, 'R&D-Expenses' , df_fundamentals['xrd'])
df_kennzahlen.insert(10, 'Sales' , df_fundamentals['sale'])

The values in the DataFrame are numbers, but have the string data-type. To convert them I use the following code:

df_kennzahlen['Net Income/Loss'] = pd.to_numeric(df_kennzahlen['Net Income/Loss'], downcast='integer')
df_kennzahlen['Total Liabilities'] = pd.to_numeric(df_kennzahlen['Total Liabilities'], downcast='integer')
df_kennzahlen['Long-Term Debt'] = pd.to_numeric(df_kennzahlen['Long-Term Debt'], downcast='integer')
df_kennzahlen['Cash'] = pd.to_numeric(df_kennzahlen['Cash'], downcast='integer')
df_kennzahlen['Total Assets'] = pd.to_numeric(df_kennzahlen['Total Assets'], downcast='integer')
df_kennzahlen['Trade Payables'] = pd.to_numeric(df_kennzahlen['Trade Payables'], downcast='integer')
df_kennzahlen['R&D-Expenses'] = pd.to_numeric(df_kennzahlen['R&D-Expenses'], downcast='integer')
df_kennzahlen['Sales'] = pd.to_numeric(df_kennzahlen['Sales'], downcast='integer')

Again I have the same problem, it is not very efficient and the values in the DataFrame are not converted correctly. For example a 4680 is displayed as 0.4680 and 3235300 is shown as 323.530. Do you have any ideas how I can make the code more efficient and have the correct values in the DataFrame?

import pandas; df = pandas.read_csv("csv_file.csv", usecols=['col1','col2']) — Baban Gain, Jul 31 '20 at 18:20
Does this answer your question? [Import CSV file as a pandas DataFrame](https://stackoverflow.com/questions/14365542/import-csv-file-as-a-pandas-dataframe) — santiagoNublado, Jul 31 '20 at 18:23

MHK · Answer 1 · 2020-07-31T18:34:10.960

1

You can pass the columns that you need as a list via the usecols parameter

import pandas as pd
df=pd.read_csv(filename,header=0,usecols=['a','b'],converters={'a': str, 'b': str})

edited Jul 31 '20 at 18:34

answered Jul 31 '20 at 18:17

MHK

156
1
5

great, thanks. Do you also have a suggestions how I can convert the values? – Jul 31 '20 at 18:25
You can use the dtype or converters option in read_csv function of Pandas - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html – MHK Jul 31 '20 at 18:29

score 0 · Answer 2 · answered Jul 31 '20 at 18:28

With the pd.read_csv function, you can specify exactly how to read your CSV file. Specially, you can select columns (usecols param), parse date columns (parse_dates param), change the default separator (sep = ";", for instance), change the decimal and thousands separator (decimal = ",", thousands = ".", for instance). These 2 last items are speccially useful when working with non-default CSVs.

Please refer to the docs for the full list of parameters.

How to correctly create a DataFrame out of a CSV file?

2 Answers2