1

I have two problems with filling out a very large dataframe. There is a section of the picture. I want the 1000 in E and F to be pulled down to 26 and no further. In the same way I want the 2000 to be pulled up to -1 and down to the next 26. I thought I could do this with bfill and ffill, but unfortunately I don't know how...(picture1) enter image description here

Another problem is that columns occur in which the values from -1 to 26 do not contain any values in E and F. How can I delete or fill them with 0 so that no bfill or ffill makes wrong entries there? (picture2) enter image description here

import pandas as pd
import numpy as np

data = '/Users/Hanna/Desktop/Coding/Code.csv'


df_1 = pd.read_csv(data,usecols=["A",
                           "B",
                           "C",
                           "D",
                           "E",
                           "F",
                           ],nrows=75)


base_list =[-1,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]
df_c = pd.MultiIndex.from_product([
[4000074],
["SP000796746","SP001811642"],
[201824, 201828, 201832, 201835, 201837, 201839, 201845, 201850, 201910, 201918, 201922, 201926, 201909, 201916, 201918, 201920],
base_list],

names=["A", "B", "C", "D"]).to_frame(index=False)
df_3 = pd.merge(df_c, df_1, how='outer')

To understand it better, I have shortened the example a bit. Picture 3 shows how it looks like when it is filled and picture 4 shows it correctly filled enter image description here

enter image description here

Hanna
  • 87
  • 6
  • Show us some code, otherwise it is very hard to help you. – Sebi May 12 '20 at 08:23
  • And give some data in text in the question itself so that we can copy/paste it for tests. I am way too lazy to copy everything... – Serge Ballesta May 12 '20 at 08:29
  • unfortunately I just have the code for the tables, but do not know how to continue. – Hanna May 12 '20 at 08:31
  • I added the code, but it is not possible to share the csv – Hanna May 12 '20 at 08:38
  • please paste the data as text. [See here for tips on how to make a good question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Haleemur Ali May 13 '20 at 12:54

2 Answers2

0

could find the indexes where you have -1 and then slice/loop over the columns to fill.

just to create the sample data:

import pandas as pd
df = pd.DataFrame(columns=list('ABE'))
df['A']=list(range(-1, 26)) * 10

add random values at each section

import random 

for i in df.index:
    if i%27 == 0:
        df.loc[i,'B'] = random.random()
    else:
        df.loc[i, 'B'] = 0

find the indexes to slice over

indx = df[df['A'] == -1].index.values

fill out data in column "E"

for i, j in zip(indx[:-1], indx[1:]):
    df.loc[i:j-1, 'E'] = df.loc[i:j-1, 'B'].max()

    if j == indx[-1]:
        df.loc[j:, 'E'] = df.loc[j:, 'B'].max()
matman9
  • 390
  • 3
  • 17
  • But the values that have to be placed in the columns I do not know before. So somehow first I have to find out which are the outermost values and then pull them down to -1 and 26 – Hanna May 12 '20 at 08:49
  • Or maybe I got you wrong. If I add your code to mine below, unfortunately the columns are not filled out or do I have to adjust something? – Hanna May 12 '20 at 08:53
  • how do you know what the possible values are to be filled? – matman9 May 12 '20 at 09:15
  • Maybe I have expressed it wrong, but the values are entered into the table df_c by the df_1. So first I have to check which of the outer values are in a -1 to 26 row and then I have to fill it with these. – Hanna May 12 '20 at 09:28
  • So sometimes there is the value 2500 or 100 – Hanna May 12 '20 at 09:32
  • looks like you may be better working from column "C", guessing these are dates? for each date you find the value in column "E" or "F" and set the rest of the values for each date to match – matman9 May 12 '20 at 09:37
  • Yes these are data, but the same data occur again for other articles. Is it still possible. I just create a new Excel to show the problem better. – Hanna May 12 '20 at 09:43
  • I have sent two more pictures, it would be nice if you look again – Hanna May 12 '20 at 09:52
  • have combined the second and third methods, see the second one. it finds the indexes where you have -1, then slices over the dataframe and gives that slice the max value in column E in that slice – matman9 May 12 '20 at 09:59
  • It is not working, maybe I do something wrong. Nevertheless tank you very much for your work – Hanna May 12 '20 at 10:30
  • if you can get the code i have posted to work, you should just need to change df to be whatever you dataframe is called, and the columns to be the relevant ones, so my column A would be your D to get the indexes, then the E and B columns i have you would change to be both E or both F – matman9 May 12 '20 at 10:39
  • 1
    OHHH man, it is working! Wonderful!!! Thank you very much. Not for both columns but for one!! But probably it is a mistake from my side, so I will try do fill correct. otherwise I'll ask again. – Hanna May 12 '20 at 11:01
0

Assuming you have to find and fill values for a particular segment.

data = pd.read_csv('/Users/Hanna/Desktop/Coding/Code.csv')    
for i in range(0,data.shape[0],27):
        if i+27 < data.shape[0]:
            data.loc[i:i+27,'E'] = max(data['E'].iloc[i:i+27])
        else:
            data.loc[i:data.shape[0],'E'] = max(data['E'].iloc[i:data.shape[0]])

you can replace the max to whatever you want.

Wickkiey
  • 4,446
  • 2
  • 39
  • 46