-1

I just started with programming and have some trouble to correctly import my CSV file.

To import it I use the following code:

data_fundamentals = open(path_fundamentals, newline= '')
reader_fundamentals = csv.reader(data_fundamentals)
header_fundamentals = next(reader_fundamentals)
fundamentals = [row for row in reader_fundamentals] 

Then convert it into a DataFrame:

df_fundamentals = pd.DataFrame(fundamentals, columns= header_fundamentals)

Here comes my first problem: Out of the CSV file "fundamentals" I just need certain columns for my DataFrame. I started by inserting them all by hand, which of course is not very efficient. Do you have an easier way?

df_kennzahlen.insert(1, 'Fiscal Year' , df_fundamentals['fyear'])
df_kennzahlen.insert(2, 'Current Assets' , df_fundamentals['act'])
df_kennzahlen.insert(3, 'Net Income/Loss' , df_fundamentals['ni'])
df_kennzahlen.insert(4, 'Total Liabilities' , df_fundamentals['lt'])
df_kennzahlen.insert(5, 'Long-Term Debt' , df_fundamentals['dltp'])
df_kennzahlen.insert(6, 'Cash' , df_fundamentals['ch'])
df_kennzahlen.insert(7, 'Total Assets' , df_fundamentals['at'])
df_kennzahlen.insert(8, 'Trade Payables' , df_fundamentals['ap'])
df_kennzahlen.insert(9, 'R&D-Expenses' , df_fundamentals['xrd'])
df_kennzahlen.insert(10, 'Sales' , df_fundamentals['sale'])

The values in the DataFrame are numbers, but have the string data-type. To convert them I use the following code:

df_kennzahlen['Net Income/Loss'] = pd.to_numeric(df_kennzahlen['Net Income/Loss'], downcast='integer')
df_kennzahlen['Total Liabilities'] = pd.to_numeric(df_kennzahlen['Total Liabilities'], downcast='integer')
df_kennzahlen['Long-Term Debt'] = pd.to_numeric(df_kennzahlen['Long-Term Debt'], downcast='integer')
df_kennzahlen['Cash'] = pd.to_numeric(df_kennzahlen['Cash'], downcast='integer')
df_kennzahlen['Total Assets'] = pd.to_numeric(df_kennzahlen['Total Assets'], downcast='integer')
df_kennzahlen['Trade Payables'] = pd.to_numeric(df_kennzahlen['Trade Payables'], downcast='integer')
df_kennzahlen['R&D-Expenses'] = pd.to_numeric(df_kennzahlen['R&D-Expenses'], downcast='integer')
df_kennzahlen['Sales'] = pd.to_numeric(df_kennzahlen['Sales'], downcast='integer') 

Again I have the same problem, it is not very efficient and the values in the DataFrame are not converted correctly. For example a 4680 is displayed as 0.4680 and 3235300 is shown as 323.530. Do you have any ideas how I can make the code more efficient and have the correct values in the DataFrame?

2 Answers2

1

You can pass the columns that you need as a list via the usecols parameter

import pandas as pd
df=pd.read_csv(filename,header=0,usecols=['a','b'],converters={'a': str, 'b': str})
MHK
  • 156
  • 1
  • 5
  • great, thanks. Do you also have a suggestions how I can convert the values? –  Jul 31 '20 at 18:25
  • You can use the dtype or converters option in read_csv function of Pandas - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html – MHK Jul 31 '20 at 18:29
0

With the pd.read_csv function, you can specify exactly how to read your CSV file. Specially, you can select columns (usecols param), parse date columns (parse_dates param), change the default separator (sep = ";", for instance), change the decimal and thousands separator (decimal = ",", thousands = ".", for instance). These 2 last items are speccially useful when working with non-default CSVs.

Please refer to the docs for the full list of parameters.

Danilo Filippo
  • 762
  • 1
  • 6
  • 12