Jupyter Notebook for csv file to select 3 window rolling

Question

I have this big data in csv file:

I manage to open this on Jupyter Notebook.
The data in csv example: 1 2 3 4 5 6 7 8 9 10
And I wanted to open in the notebook as '3 windows rolling' without doing any (sum,mean for example)
The output I want in the notebook are>>

Output dataset

as @jiangzui mentioned you can use `shift()`, First read csv to get first column, next use `shift` to get other columns `df["second"] = df["first].shift(-1)` `df["third"] = df["first].shift(-2)` — furas, May 08 '21 at 07:09

furas · Answer 1 · 2021-05-08T08:51:26.310

0

First open csv to get first column.

import pandas as pd

df = pd.read_csv("filename.csv")

I will use io only to simulate data from file

text = """first
1
2
3
4
5
6
7
8
9
10"""


import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))

Result

Next you can use shift to create other columns

df['second'] = df['first'].shift(-1)
df['third'] = df['first'].shift(-2)

Result

   first  second  third
0      1     2.0    3.0
1      2     3.0    4.0
2      3     4.0    5.0
3      4     5.0    6.0
4      5     6.0    7.0
5      6     7.0    8.0
6      7     8.0    9.0
7      8     9.0   10.0
8      9    10.0    NaN
9     10     NaN    NaN

At the end you can remove two last rows with NaN and convert all to integer

df = df[:-2].astype(int)

or if you don't have NaN in other places

df = df.dropna().astype(int)

Result:

   first  second  third
0      1       2      3
1      2       3      4
2      3       4      5
3      4       5      6
4      5       6      7
5      6       7      8
6      7       8      9
7      8       9     10

Minimal working code

text = """first
1
2
3
4
5
6
7
8
9
10"""

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['first'])
print(df)

df['second'] = df['first'].shift(-1) #, fill_value=0)
df['third'] = df['first'].shift(-2)
print(df)

#df = df.dropna().astype(int)
df = df[:-2].astype(int)
print(df)

EDIT:

The same using for-loop to create any number of columns

text = """col 1
1
2
3
4
5
6
7
8
9
10"""

import pandas as pd
import io

df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['col 1'])
print(df)

number = 5

for x in range(1, number+1):
    df[f'col {x+1}'] = df['col 1'].shift(-x)
print(df)

#df = df.dropna().astype(int)
df = df[:-number].astype(int)
print(df)

Result

   col 1
0      1
1      2
2      3
3      4
4      5
5      6
6      7
7      8
8      9
9     10

   col 1  col 2  col 3  col 4  col 5  col 6
0      1    2.0    3.0    4.0    5.0    6.0
1      2    3.0    4.0    5.0    6.0    7.0
2      3    4.0    5.0    6.0    7.0    8.0
3      4    5.0    6.0    7.0    8.0    9.0
4      5    6.0    7.0    8.0    9.0   10.0
5      6    7.0    8.0    9.0   10.0    NaN
6      7    8.0    9.0   10.0    NaN    NaN
7      8    9.0   10.0    NaN    NaN    NaN
8      9   10.0    NaN    NaN    NaN    NaN
9     10    NaN    NaN    NaN    NaN    NaN

   col 1  col 2  col 3  col 4  col 5  col 6
0      1      2      3      4      5      6
1      2      3      4      5      6      7
2      3      4      5      6      7      8
3      4      5      6      7      8      9
4      5      6      7      8      9     10

edited May 08 '21 at 08:51

answered May 08 '21 at 07:16

furas

134,197
12
106
148

However, what if I have the CSV file value for example: if the value of 1 2 3 4 5 6 7 8 9 10, is a random number? how should I execute it? @furas – NJM May 08 '21 at 08:27
As I wrote in answer - you should read it from CSV - I used `range()` only to simulate some data. – furas May 08 '21 at 08:44
now I used `read_csv()` with `io.StingIO()` to simulate some data - but you should use `read_csv(filename)` – furas May 08 '21 at 08:52
Should I use sliding window or rolling if my value from the previous csv file is: 1 2 3 4 5 6 7 8 9 10 and changes to >> for example 0.1 0.98 0.88 0.976? – NJM May 08 '21 at 09:31
it doesn't matter what values you have. But I saw your screenshot which shows values in row - so question is if you want to duplicate column or row. I assumed that you have values in column and you want to create new columns. But this method will not work if you want to create new rows. – furas May 08 '21 at 09:37
The column as you can see from my screenshot was 50, 60, 70.. this can be ignored actually, but if I have a value of 0.1 0.98 0.88 0.976 instead of 1 2 3 4, can it be done as what you answer previously? because the 1 2 3 4... is in sequence and it can use the range function, while the data I have in a row is different value? – NJM May 08 '21 at 09:51
as I said before - it doesn't matter what values you have. I used range only to generate some data for example - I could use `random` for this or put some numbers from my head (even `0.1 0.98 0.88 0.976`) but `range` was simpler to use. But now my answer use `read_csv` and it doesn't matter if you have file with `1,2,3` or `0.1 0.98 0.88 0.976` or even with strings `"hello, world, of, python`. Simply try to use it instead of asking if it will works with other values. – furas May 08 '21 at 10:15
Frankly, if you would put values `0.1 0.98 0.88 0.976` as text in your question then I would use them - I would simply copy-paste them - but you put as example `1 2 3` so I used `1 2 3` as example. Next time put some real values in question and then people will use this values in code. – furas May 08 '21 at 10:21
Thank you furas for the answer! My bad that I use the example in the wrong way :( – NJM May 08 '21 at 10:31
Hi furas i cant drop the last 2 rows with my existing data that contains float dataframes – NJM May 10 '21 at 03:08
it doesn't matter what values you have - it should be `df = df[:-2]` – furas May 10 '21 at 04:11
Hi furas, I wanted to include in column 4 either "0" or "1" if the last row of column 1 is less than 0.85 in my data frames. then my last 6 rows will be at "1" at column 4, and the rest of it will be 0. Whereby if my last row is more than 0.85, the rest of column 4 will at "0". – NJM May 15 '21 at 02:06
create new question on new page and you will have more space for descripton and example - before and after. I don't understand your problem. – furas May 15 '21 at 02:24
Ok I already posted a new one – NJM May 15 '21 at 10:12

Jupyter Notebook for csv file to select 3 window rolling

1 Answers1