
I have reduced my original code to a much simplified version that still shows the main problem. The following code:

import pandas as pd

Sp = pd.DataFrame()
l1 = ['a', 'b', 'c']
for i in l1:
    Sp['col1'] = i

Gives me the result Sp as:

col1

I want col1 to contain the values a, b and c. Could anyone please explain why this is happening and how to fix it?

EDIT:

For every value in my list, I use it to open a different file via os (the file names are built from the list values). After reading the csv file, I compute values such as the mean, deviation etc. of its data and assign those values to Sp in further columns. My final Sp should look something like this:

col1    Mean    Median  Deviation
a       1       1.1     0.5
b       2       2.1     0.5
c       3       3.1     0.5
A.DS
  • @ Jezrael I am not using the loop just for assignment purpose. There are other operations too in every iteration of the loop. – A.DS Jun 29 '18 at 10:30

1 Answer


EDIT: If you need to create and process a DataFrame in each iteration, loop over the list and append each resulting DataFrame to a list of DataFrames. Finally, concat all the aggregated DataFrames together:

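import pandas as pd

dfs = []
l1 = ['a', 'b', 'c']
for i in l1:
    # hypothetical naming scheme; adjust to how your file names are built from the list value
    file = f'{i}.csv'
    df = pd.read_csv(file)
    df = df.groupby('col').agg({'col1':'mean', 'col2':'sum'})
    # other per-file processing goes here
    dfs.append(df)

# combine the per-file results into one DataFrame
Sp = pd.concat(dfs, ignore_index=True)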
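If the goal is the one-row-per-file summary table from the question, a variation on the same idea is to collect one dict of statistics per file and build the DataFrame once after the loop. This is only a minimal sketch under assumptions that are not in the question: the files are named after the list value plus '.csv', each has a numeric column called 'value', and the demo files with made-up numbers are written to a temporary folder just to make the example runnable.

```python
import os
import tempfile

import pandas as pd

l1 = ['a', 'b', 'c']

with tempfile.TemporaryDirectory() as folder:
    # Demo setup only: write a.csv, b.csv, c.csv with made-up numbers.
    # The column name 'value' is a hypothetical placeholder.
    for k, name in enumerate(l1, start=1):
        demo = pd.DataFrame({'value': [k, k + 1, k + 2]})
        demo.to_csv(os.path.join(folder, name + '.csv'), index=False)

    rows = []
    for name in l1:
        # Assumed naming scheme: <list value>.csv
        df = pd.read_csv(os.path.join(folder, name + '.csv'))
        # One dict per file; each dict becomes one row of the result.
        rows.append({
            'col1': name,
            'Mean': df['value'].mean(),
            'Median': df['value'].median(),
            'Deviation': df['value'].std(),
        })

# Build the DataFrame once, after the loop, instead of growing it row by row.
Sp = pd.DataFrame(rows, columns=['col1', 'Mean', 'Median', 'Deviation'])
print(Sp)
```

The loop stays, so any other per-file work can still happen inside it; only the construction of Sp moves outside.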

Old answer:

I think you need to call the DataFrame constructor with the list:

Sp = pd.DataFrame({'col1':l1})
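As for why the original loop leaves Sp empty: assigning a scalar to a column broadcasts it over the rows the DataFrame already has, and an empty DataFrame has none. The short sketch below (the variable name empty is only for illustration) contrasts the two approaches:

```python
import pandas as pd

l1 = ['a', 'b', 'c']

# A scalar is broadcast over the existing index. The index of an empty
# DataFrame has no rows, so the column is created but holds no values.
empty = pd.DataFrame()
for i in l1:
    empty['col1'] = i
print(len(empty))  # number of rows after the loop

# Passing the whole list creates one row per element.
Sp = pd.DataFrame({'col1': l1})
print(Sp)
```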

If you really need to fill it inside the loop, this works, but it is the slowest possible solution:

6) updating an empty frame a-single-row-at-a-time. I have seen this method used WAY too much. It is by far the slowest. It is probably common place (and reasonably fast for some python structures), but a DataFrame does a fair number of checks on indexing, so this will always be very slow to update a row at a time. Much better to create new structures and concat.

import pandas as pd

Sp = pd.DataFrame()
l1 = ['a', 'b', 'c']
for j, i in enumerate(l1):
    # setting with enlargement: adds one row per iteration (slow)
    Sp.loc[j, 'col1'] = i

print(Sp)
  col1
0    a
1    b
2    c
jezrael
  • Thanks. It is a large data set, so I wouldn't want it to slow down. Is there any other way to still use the loop? – A.DS Jun 29 '18 at 10:40
  • @A.DS - Is the loop necessary? Can you explain more? – jezrael Jun 29 '18 at 10:41
  • Use assign (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html), which assigns new columns to a DataFrame and returns a new object (a copy) with the new columns added to the original ones: Sp = Sp.assign(col1=l1) – Vikash Kumar Jun 29 '18 at 10:42
  • For every value in my list, I use it to connect to a different file using os. After picking up the csv file from there I take values such as mean, deviation etc. of the data from the file and assign those values to Sp in another column. – A.DS Jun 29 '18 at 10:43
  • @A.DS - I think I understand, please check edited answer. – jezrael Jun 29 '18 at 11:19