
I have data like this. I am trying to create a rule based on domain names for my project: a new column named new_url, derived from the domain column. If the domain contains .cdn., new_url should be the string before .cdn.; otherwise the url_parser library should be called to parse the URL another way. The problem is that the CSV file I create (cleanurl.csv) has no new_url column. When I print the parsed URLs in the code I can see them, so the if and else branches are working. Could you help me, please?

enter image description here

import pandas as pd 
import url_parser
from url_parser import parse_url,get_url,get_base_url
import numpy as np 

df = pd.read_csv("C:\\Users\\myuser\\Desktop\\raw_data.csv", sep=';')

i=-1
for x in df['domain']:

    i=i+1
    print("*",x,"*") 

    if '.cdn.' in x:
        parsed_url=x.split('.cdn')[0]
        print(parsed_url)
        df.iloc[i]['new_url']=parsed_url
       
    else:
        parsed_url=get_url(x).domain +'.' + get_url(x).top_domain
        print(parsed_url)
        df.iloc[i]['new_url']=parsed_url

df.to_csv("C:\\Users\\myuser\\Desktop\\cleanurl.csv", sep=';')
medium-dimensional
asdfg
  • Please [do not post images of data](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors-when-asking-a-question): see [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). I have formatted your code because it lacked proper indentation. Please make sure the formatted code is reproducing the issue. – medium-dimensional Oct 26 '22 at 07:02

1 Answer


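Use .loc[row, 'column'] to create the new column. df.iloc[i]['new_url'] = ... is chained indexing: df.iloc[i] returns a separate Series for that row, so the assignment changes that temporary object and never reaches df. That is why the values print correctly but new_url is missing from the CSV. df.loc[idx, 'new_url'] = ... is a single indexing operation on df, and it creates the column if it does not exist yet.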

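for idx, x in df['domain'].items():
    if '.cdn.' in x:
        # Keep the part before '.cdn', as in the question's code
        parsed_url = x.split('.cdn')[0]
    else:
        # Fall back to url_parser: domain plus top-level domain
        parsed = get_url(x)
        parsed_url = parsed.domain + '.' + parsed.top_domain
    # Single .loc write on df itself; creates 'new_url' on first use
    df.loc[idx, 'new_url'] = parsed_url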
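
The loop can also be avoided by building the whole column in one assignment. The following is a minimal sketch, not the asker's exact setup: the sample data is made up, and `fallback_domain` is a hypothetical stand-in for the url_parser branch (it naively keeps the last two labels), used only so the example runs without that library.

```python
import pandas as pd

# Made-up sample rows standing in for raw_data.csv (assumption)
df = pd.DataFrame({"domain": ["img.cdn.example.com", "www.example.org"]})


def fallback_domain(host: str) -> str:
    """Hypothetical stand-in for get_url(x).domain + '.' + get_url(x).top_domain.

    Naively keeps the last two dot-separated labels of the host.
    """
    return ".".join(host.split(".")[-2:])


def clean_domain(host: str) -> str:
    """Apply the rule from the question to a single domain string."""
    if ".cdn." in host:
        # Keep the part before '.cdn', as in the question's code
        return host.split(".cdn")[0]
    return fallback_domain(host)


# Assigning the result of apply() creates the column directly on df
df["new_url"] = df["domain"].apply(clean_domain)

# With no path given, to_csv returns the CSV text; index=False omits the row index
csv_text = df.to_csv(sep=";", index=False)
print(csv_text)
```

In the real script, `fallback_domain` would be replaced by the url_parser call from the question, and `to_csv` would be given the output path. Passing `index=False` is optional; it only keeps the row index out of the file.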
Ynjxsjmh