I have the following dataframe:

Index   city_code   date   sector   price
1   1           2010-01 A   50000
2   1           2010-01 B   100000
3   2           2010-01 A   150000
4   3           2010-01 A   322222
5   1           2010-01 C   124555
6   2           2010-01 C   30000
7   2           2010-01 B   20000
8   1           2010-02 A   45000
9   1           2010-02 B   120000
10  2           2010-02 A   30000
11  3           2010-02 A   1222400
12  1           2010-02 C   20000
13  2           2010-02 C   50000
14  2           2010-02 B   360000

I want to split the rows into separate dataframes, one per sector.

I tried to do this with the following code, but it does not work.

df = pd.read_csv('dataset.csv', sep=';')

area_list = pd.DataFrame(df['sector'].unique())
columns = df.columns
df_A = pd.DataFrame(columns=columns)
df_B = pd.DataFrame(columns=columns)
df_C = pd.DataFrame(columns=columns)

for i in area_list:
    x = df[df['sector'] == i]
    if i == 'A':
        df_A.append(x)
    elif i == 'B':
        df_B.append(x)
    elif i == 'C':
        df_C.append(x)

After the loop runs, the dataframes df_A, df_B and df_C are still empty. How can I fix this?

  • What have you tried to debug the problem? Why not check for these corner cases? – Nico Haase May 06 '19 at 07:47
  • But I want to create separate dataframes, each of which includes the values for every city. This solution doesn't solve my problem. – Hale May 06 '19 at 08:43

1 Answer

You can split the data by the values of the sector column with a boolean filter:

df_A = df[df['sector']=='A']
df_B = df[df['sector']=='B']
df_C = df[df['sector']=='C']

Or iterate over the groups that groupby produces:

for sector, df_sector in df.groupby('sector'):
    if (sector == 'A'):
        df_A = df_sector
    elif (sector == 'B'):
        df_B = df_sector
    elif (sector == 'C'):
        df_C = df_sector
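
If the set of sectors is not known in advance, the if/elif chain can be replaced by a dictionary keyed by sector. The following is only a minimal sketch: the small inline dataframe stands in for the real dataset.csv, and the name frames is made up for illustration.

```python
import pandas as pd

# Stand-in for the real data; the actual file is not shown in full.
df = pd.DataFrame({
    "city_code": [1, 1, 2, 3],
    "date": ["2010-01", "2010-01", "2010-01", "2010-01"],
    "sector": ["A", "B", "A", "A"],
    "price": [50000, 100000, 150000, 322222],
})

# groupby yields (key, sub-frame) pairs; collect them in a dict
# so that no variable has to be named per sector.
frames = {sector: group for sector, group in df.groupby("sector")}

df_A = frames["A"]  # all rows whose sector is 'A'
df_B = frames["B"]  # all rows whose sector is 'B'
```

A sector that does not occur in the data simply has no key in frames, so frames.get('C') returns None instead of leaving a variable undefined.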

The same groupby loop, run in an interactive session on a sample file:

>>> import pandas as pd
>>> df = pd.read_csv('dataset.csv')
>>> df.head()
   city        date sector        price
0     1  2010-01-01      A   53675300.0
1     1  2010-01-01      B   13415070.0
2     1  2010-01-01      C  474007000.0
3     1  2010-01-01      D  218028700.0
4     1  2010-01-01      E    2073598.0
>>> for sector, df_sector in df.groupby('sector'):
...     if (sector == 'A'):
...             df_A = df_sector
...     elif (sector == 'B'):
...             df_B = df_sector
...     elif (sector == 'C'):
...             df_C = df_sector
...     elif (sector == 'D'):
...             df_D = df_sector
...     else:
...             df_E = df_sector
... 
>>> df_A
   city        date sector       price
0     1  2010-01-01      A  53675300.0
>>> df_B
   city        date sector       price
1     1  2010-01-01      B  13415070.0
>>> df_C
   city        date sector        price
2     1  2010-01-01      C  474007000.0
>>> df_D
   city        date sector        price
3     1  2010-01-01      D  218028700.0
>>> df_E
   city        date sector      price
4     1  2010-01-01      E  2073598.0
>>> 
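
As for why the loop in the question leaves df_A, df_B and df_C empty, two things appear to be going on. Iterating over a DataFrame yields its column labels, not its values, so i is never 'A', 'B' or 'C'. And DataFrame.append returned a new frame instead of modifying the original (it was removed in pandas 2.0), so its result was discarded. A small sketch with made-up data, using pd.concat instead:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({"sector": ["A", "B", "A"], "price": [1, 2, 3]})

# Wrapping the unique values in a DataFrame and iterating over it
# yields the column labels (here the single label 0), not the sectors.
area_list = pd.DataFrame(df["sector"].unique())
labels_seen = list(area_list)

# Iterating over the array of unique values yields the sectors themselves.
sectors = list(df["sector"].unique())

# Collect the matching rows per sector, then build each frame with
# pd.concat and keep the returned object.
pieces = {s: [] for s in sectors}
for s in sectors:
    pieces[s].append(df[df["sector"] == s])

df_A = pd.concat(pieces["A"])
df_B = pd.concat(pieces["B"])
```

Here labels_seen is [0] while sectors is ['A', 'B'], which is why none of the branches in the original loop ever ran.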
  • I tried this but df_A, df_B etc. are empty dataframes. – Hale May 06 '19 at 08:57
  • If the data set is read correctly, the query should work. – ramazanbozkir May 06 '19 at 10:18
  • df = pd.read_csv('dataset.csv', sep=';'); df['price'] = df['price'].str.replace(',','.'); df['price'] = df['price'].astype(float); # the date column's dtype is object, so we convert it to datetime: df['date'] = pd.to_datetime(df['date']); # we drop the rows where sector == '0': df.drop(df[df['sector'] == '0'].index, inplace=True); df_A = df[df['sector']=='A']; df_B = df[df['sector']=='B']; df_C = df[df['sector']=='C']. After this, df_A, df_B, etc. show Type: DataFrame, Size: (0, 4), Column names: city, date, sector, price. In other words, df_A, df_B, etc. are empty DataFrames. – Hale May 06 '19 at 10:55
  • Could you share the data set as it was before being split by the sector attribute? – ramazanbozkir May 06 '19 at 11:06
  • 1) df = pd.read_csv('dataset.csv', sep=';') 2) df['price'] = df['price'].str.replace(',','.') 3) df['price'] = df['price'].astype(float) 4) df['date'] = pd.to_datetime(df['date']) – Hale May 06 '19 at 11:24
  • After you perform these steps, run df.head() and post the output. – ramazanbozkir May 06 '19 at 11:34
  • city date sector price 0 1 2010-01-01 A 5.367530e+07 1 1 2010-01-01 B 1.341507e+07 2 1 2010-01-01 C 4.740070e+08 3 1 2010-01-01 D 2.180287e+08 4 1 2010-01-01 E 2.073598e+06 – Hale May 06 '19 at 11:39
  • Only df_E is created. df_A, df_B, df_C are not created. – Hale May 06 '19 at 12:15
  • Can you share a screenshot of the entire code? – ramazanbozkir May 06 '19 at 12:19
  • How can I share a screenshot? – Hale May 06 '19 at 12:41