2

I'm a complete newbie to this kind of programming.

I studied philosophy and economics, and I'm trying to learn Python to build a web crawler for my own investment strategy.

I'm from South Korea, so I'm quite nervous about writing in English here, but I'm trying to be brave! (Please excuse my English.)

[Image: screenshot of the DataFrame]

This is the DataFrame that I got from the website.

I'm crawling financial data and, as you can see, the numbers have commas in them.

Their dtype is object.

What I want to do is convert them to integers so I can do some math (sum, multiplication, etc.).

I searched (including Korean websites) and found a way to do it using column names, like this:

cols = ['col1', 'col2', ..., 'colN']

df[cols] = df[cols].replace({'\$': '', ',': ''}, regex=True)

But what I need is to do this regardless of the column names.

I need data for over 2,000 companies, and the column names differ from company to company.

I'd like to write code that does something like

"Delete ',' in every column, from col#0 to col#end"

Thanks in advance.

Ric S
Rock Lee

2 Answers

0

Based on this answer, you can get the list of column names, store it in a variable, and use that variable wherever you would otherwise write out the list of columns. There are a few other things to keep in mind as well. According to the documentation, replace is a DataFrame method that returns a new DataFrame by default, so assign the result back (for example df = df.replace(...)), otherwise the change is lost. Also, the number formatting might be visual only. Can you not work with the data as it is? A conversion might help, but it may not be an issue at all if you simply want to work with the data. Another idea would be converting the numbers to strings and replacing the commas with spaces, if need be. This answer might help you with that.
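As a rough illustration of the idea above, here is a minimal sketch that strips commas from every column without naming any of them and then converts the result to numbers. The sample data and the helper name strip_commas are made up for the example, and it assumes every column holds number-like strings, as in the question.

```python
import pandas as pd

# Hypothetical sample data; in the question the column names differ per company.
# dtype=object mirrors the "their types are object" situation described above.
df = pd.DataFrame(
    {
        "Revenue": ["1,234", "5,678"],
        "Net income": ["-120", "2,045"],
    },
    dtype=object,
)


def strip_commas(frame):
    """Remove '$' and ',' from every column, then convert each column to numbers.

    Hypothetical helper: assumes all columns contain number-like strings.
    """
    # Same replacement as in the question, applied to the whole frame
    # instead of a hand-written list of column names.
    cleaned = frame.replace({r"\$": "", ",": ""}, regex=True)
    # pd.to_numeric is applied column by column; it raises if a value
    # cannot be parsed, which makes unexpected text easy to spot.
    return cleaned.apply(pd.to_numeric)


result = strip_commas(df)
print(result.dtypes)
print(result.sum())
```

If only some columns need cleaning, the same replace call can be limited to a list such as list(df.columns)[1:] instead of the whole frame.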

Andre
  • As you said, it gave me just the list of column names, so I could use your suggestion to solve my problem. Thanks a lot! – Rock Lee Mar 19 '20 at 01:11
0

The first thing you can do is split the DataFrame's columns by dtype and apply the processing each group needs.

object_list = list(df.select_dtypes(include="object"))
float_list = list(df.select_dtypes(include="float64"))
int_list = list(df.select_dtypes(include="int64"))

Then replace whatever you need:

# regex=True is needed so that commas inside the strings are removed
df[object_list] = df[object_list].replace(",", "", regex=True)

# float64/int64 columns hold real numbers and never contain commas,
# so the round trip through str below is normally unnecessary
df[float_list] = df[float_list].astype(str)
df[float_list] = df[float_list].replace(",", "", regex=True)
df[float_list] = df[float_list].astype(float)

df[int_list] = df[int_list].astype(str)
df[int_list] = df[int_list].replace(",", "", regex=True)
df[int_list] = df[int_list].astype(int)
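
The dtype-based selection above could be combined with a numeric conversion roughly as follows. This is only a sketch under assumptions: the sample columns are invented, and an object column is converted only when every value parses as a number after the commas are removed, so text columns such as a company name stay untouched.

```python
import pandas as pd

# Hypothetical mixed frame: one text column, one scraped numeric column
# stored as strings, and one column that is already float.
df = pd.DataFrame(
    {
        "Company": pd.Series(["A Corp", "B Inc"], dtype=object),
        "Sales": pd.Series(["1,000", "2,500"], dtype=object),
        "Margin": [0.12, 0.30],
    }
)

# Only object columns can contain comma-formatted strings.
object_cols = list(df.select_dtypes(include="object"))

for col in object_cols:
    # Plain (non-regex) removal of the thousands separator.
    stripped = df[col].str.replace(",", "", regex=False)
    # errors="coerce" turns anything that is not a number into NaN.
    converted = pd.to_numeric(stripped, errors="coerce")
    # Keep the conversion only if every value was parsed successfully.
    if converted.notna().all():
        df[col] = converted

print(df.dtypes)
print(df["Sales"].sum())
```

The all-or-nothing check is a design choice for this sketch; a column with genuinely missing values would need a looser rule.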
Manish Chaudhary
  • Thanks a lot! My DataFrame is 'df_fs', so I followed your suggestion: object_list = list(df_fs.select_dtypes(include="object")). I found that it gives only the column names, as Andre said in the other answer. So I did what Andre suggested: df_fs[object_list] = df_fs[object_list].replace({'\$': '', ',': ''}, regex=True). That solved my problem! I'm not sure this is the best way, but I'm completely satisfied that it works. – Rock Lee Mar 19 '20 at 01:06